How to cluster SAP ASCS/SCS with SIOS DataKeeper on VMware ESXi Servers

This article describes the steps you take to prepare the VMware infrastructure for installing and configuring a high-availability SAP ASCS/SCS instance on a Windows failover cluster by using SIOS DataKeeper as the replicated cluster storage.

Create the ASCS VMs

For the SAP ASCS/SCS cluster, deploy two VMs on different ESXi hosts.

Based on your deployment type, the host names and IP addresses of the scenario look like the following:

SAP deployment

Host name role                              Host name      Static IP address
1st cluster node ASCS/SCS cluster           pr1-ascs-10    10.0.0.4
2nd cluster node ASCS/SCS cluster           pr1-ascs-11    10.0.0.5
Cluster network name                        pr1clust       10.0.0.42
ASCS cluster network name                   pr1-ascscl     10.0.0.43
ERS cluster network name (only for ERS2)    pr1-erscl      10.0.0.44

On each VM add an additional virtual disk. We will later mirror these disks with DataKeeper and use them as part of our cluster.
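If you manage the hosts with VMware PowerCLI, attaching the extra disk can also be scripted. The following is only a minimal sketch, assuming PowerCLI is installed and already connected to your vCenter Server; the VM names come from the table above and the 128-GB size is illustrative:

PowerShell

# Attach an additional virtual disk to each cluster node (disk size is illustrative)
foreach ($vm in "pr1-ascs-10","pr1-ascs-11") {
    New-HardDisk -VM (Get-VM -Name $vm) -CapacityGB 128 -StorageFormat Thick
}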

Add the Windows VMs to the domain

After you assign static IP addresses to the virtual machines, add the virtual machines to the domain.
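The domain join can be done through the GUI or with PowerShell. A minimal sketch, assuming a domain named contoso.local (replace it with your own domain):

PowerShell

# Run on each cluster node; the domain name is illustrative
Add-Computer -DomainName contoso.local -Credential (Get-Credential) -Restart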

Install and configure Windows failover cluster

Install the Windows failover cluster feature

Run this command on one of the cluster nodes:

PowerShell


# Hostnames of the Win cluster for SAP ASCS/SCS
$SAPSID = "PR1"
$ClusterNodes = ("pr1-ascs-10","pr1-ascs-11")
$ClusterName = $SAPSID.ToLower() + "clust"

# Install Windows features.
# After the feature installs, manually reboot both nodes.
Invoke-Command $ClusterNodes {Install-WindowsFeature Failover-Clustering, FS-FileServer -IncludeAllSubFeature -IncludeManagementTools }

Once the feature installation has completed, reboot both cluster nodes.
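For example, from a management host (assuming PowerShell remoting is enabled on the nodes; -Wait requires PowerShell 3.0 or later):

PowerShell

# Reboot both nodes and wait until PowerShell remoting is available again
Restart-Computer -ComputerName $ClusterNodes -Wait -For PowerShell -Force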

Test and configure Windows failover cluster

PowerShell

# Hostnames of the Win cluster for SAP ASCS/SCS
$SAPSID = "PR1"
$ClusterNodes = ("pr1-ascs-10","pr1-ascs-11")
$ClusterName = $SAPSID.ToLower() + "clust"

# IP address for cluster network name
$ClusterStaticIPAddress = "10.0.0.42"

# Test and create the cluster
Test-Cluster -Node $ClusterNodes -Verbose
New-Cluster -Name $ClusterName -Node $ClusterNodes -StaticAddress $ClusterStaticIPAddress -Verbose

Configure cluster cloud quorum

If you use Windows Server 2016 or 2019, we recommend configuring Azure Cloud Witness as the cluster quorum.

Run this command on one of the cluster nodes:

PowerShell


$AzureStorageAccountName = "cloudquorumwitness"
Set-ClusterQuorum -CloudWitness -AccountName $AzureStorageAccountName -AccessKey <YourAzureStorageAccessKey> -Verbose

Alternatively, you can use a file share witness on a third server in your environment. For redundancy, this server should run on a third ESXi host.
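A minimal sketch of configuring the file share witness instead, assuming a prepared share named \\witness-srv\clusterwitness (the server and share names are illustrative):

PowerShell

# Configure a file share witness as the cluster quorum (share path is illustrative)
Set-ClusterQuorum -FileShareWitness \\witness-srv\clusterwitness -Verbose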

SIOS DataKeeper Cluster Edition for the SAP ASCS/SCS cluster share disk

Now, you have a working Windows Server failover clustering configuration. To install an SAP ASCS/SCS instance, you need a shared disk resource. One of the options is to use SIOS DataKeeper Cluster Edition.

Installing SIOS DataKeeper Cluster Edition for the SAP ASCS/SCS cluster share disk involves these tasks:

  • Install SIOS DataKeeper
  • Configure SIOS DataKeeper

Install SIOS DataKeeper

Install SIOS DataKeeper Cluster Edition on each node in the cluster. To create virtual shared storage with SIOS DataKeeper, create a synced mirror and then simulate cluster shared storage.

Before you install the SIOS software, create the DataKeeperSvc domain user.

Add the DataKeeperSvc domain user to the Local Administrator group on both cluster nodes.
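Both steps can be scripted. A sketch, assuming the ActiveDirectory PowerShell module is available for the user creation and that the domain is named CONTOSO (names and password handling are illustrative):

PowerShell

# Create the DataKeeper service account in the domain (requires the ActiveDirectory module)
New-ADUser -Name "DataKeeperSvc" -AccountPassword (Read-Host -AsSecureString "Password") -Enabled $true -PasswordNeverExpires $true

# Run on each cluster node: add the account to the local Administrators group
Add-LocalGroupMember -Group "Administrators" -Member "CONTOSO\DataKeeperSvc"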

  1. Install the SIOS software on both cluster nodes.
    Figure 31: First page of the SIOS DataKeeper installation
  2. In the dialog box, select Yes.
    Figure 32: DataKeeper informs you that a service will be disabled
  3. In the dialog box, we recommend that you select Domain or Server account.
    Figure 33: User selection for SIOS DataKeeper
  4. Enter the domain account username and password that you created for SIOS DataKeeper.
    Figure 34: Enter the domain user name and password for the SIOS DataKeeper installation
  5. Install the license key for your SIOS DataKeeper instance.
    Figure 35: Enter your SIOS DataKeeper license key
  6. When prompted, restart the virtual machine.

Configure SIOS DataKeeper

After you install SIOS DataKeeper on both nodes, start the configuration. The goal of the configuration is to have synchronous data replication between the additional disks that are attached to each of the virtual machines.

  1. Start the DataKeeper Management and Configuration tool, and then select Connect Server.
    Figure 36: SIOS DataKeeper Management and Configuration tool
  2. Enter the name or TCP/IP address of the first node the Management and Configuration tool should connect to, and, in a second step, the second node.
    Figure 37: Insert the name or TCP/IP address of the first node the Management and Configuration tool should connect to, and in a second step, the second node
  3. Create the replication job between the two nodes.
    Figure 38: Create a replication job
    A wizard guides you through the process of creating a replication job.
  4. Define the name of the replication job.
    Figure 39: Define the name of the replication job
    Define the base data for the node, which should be the current source node.
    Figure 40: Define the base data for the node, which should be the current source node
  5. Define the name, TCP/IP address, and disk volume of the target node.
    Figure 41: Define the name, TCP/IP address, and disk volume of the current target node
  6. Define the compression algorithms. In our example, we recommend that you compress the replication stream. Especially in resynchronization situations, compression dramatically reduces resynchronization time. Compression uses the CPU and RAM resources of a virtual machine; as the compression rate increases, so does the volume of CPU resources that are used. You can adjust this setting later.
  7. Another setting you need to check is whether the replication occurs asynchronously or synchronously. When you protect SAP ASCS/SCS configurations, you must use synchronous replication.
    Figure 42: Define replication details
  8. Define whether the volume that is replicated by the replication job should be represented to a Windows Server failover cluster configuration as a shared disk. For the SAP ASCS/SCS configuration, select Yes so that the Windows cluster sees the replicated volume as a shared disk that it can use as a cluster volume.
    Figure 43: Select Yes to set the replicated volume as a cluster volume
    After the volume is created, the DataKeeper Management and Configuration tool shows that the replication job is active.
    Figure 44: DataKeeper synchronous mirroring for the SAP ASCS/SCS share disk is active
    Failover Cluster Manager now shows the disk as a DataKeeper disk, as shown in Figure 45:
    Figure 45: Failover Cluster Manager shows the disk that DataKeeper replicated

We don't describe the DBMS setup in this article because setups vary depending on the DBMS you use. We assume that high-availability concerns with the DBMS are addressed with the functionalities that different DBMS vendors support.

The installation procedures of SAP NetWeaver ABAP systems, Java systems, and ABAP+Java systems are almost identical. The most significant difference is that an SAP ABAP system has one ASCS instance. The SAP Java system has one SCS instance. The SAP ABAP+Java system has one ASCS instance and one SCS instance running in the same Microsoft failover cluster group. Any installation differences for each SAP NetWeaver installation stack are explicitly mentioned. You can assume that the rest of the steps are the same.

Install SAP with a high-availability ASCS/SCS instance

Important

If you use SIOS to present a shared disk, don’t place your page file on the SIOS DataKeeper mirrored volumes. 

Installing SAP with a high-availability ASCS/SCS instance involves these tasks:

  • Create a virtual host name for the clustered SAP ASCS/SCS instance.
  • Install SAP on the first cluster node.
  • Modify the SAP profile of the ASCS/SCS instance.

Create a virtual host name for the clustered SAP ASCS/SCS instance

  1. In the Windows DNS manager, create a DNS entry for the virtual host name of the ASCS/SCS instance.
    Figure 1: Define the DNS entry for the SAP ASCS/SCS cluster virtual name and TCP/IP address
  2. If you are using the new SAP Enqueue Replication Server 2, which is also a clustered instance, then you need to reserve a virtual host name for ERS2 in DNS as well.
    Figure 1A: Define the DNS entry for the SAP ERS2 cluster virtual name and TCP/IP address
  3. To define the IP address that's assigned to the virtual host name, select DNS Manager > Domain.
    Figure 2: New virtual name and TCP/IP address for SAP ASCS/SCS cluster configuration
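If you prefer scripting the DNS entries, the A records can also be created with the DnsServer PowerShell module on the DNS server. A sketch, assuming a zone named contoso.local (replace with your zone); the host names and addresses come from the table at the beginning of this article:

PowerShell

# Create the A records for the clustered SAP virtual host names (zone name is illustrative)
Add-DnsServerResourceRecordA -ZoneName "contoso.local" -Name "pr1-ascscl" -IPv4Address "10.0.0.43"
Add-DnsServerResourceRecordA -ZoneName "contoso.local" -Name "pr1-erscl" -IPv4Address "10.0.0.44"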

Install SAP on the first cluster node

  1. Execute the first cluster node option on cluster node A. Select:
    • ABAP system: ASCS instance number 00
    • Java system: SCS instance number 01
    • ABAP+Java system: ASCS instance number 00 and SCS instance number 01
  2. Follow the installation procedure described by SAP. In the start installation option “First Cluster Node”, make sure to choose “Cluster Shared Disk” as the configuration option.

The SAP installation documentation describes how to install the first ASCS/SCS cluster node.

Modify the SAP profile of the ASCS/SCS instance

If you have Enqueue Replication Server 1, add SAP profile parameter enque/encni/set_so_keepalive as described below. The profile parameter prevents connections between SAP work processes and the enqueue server from closing when they are idle for too long. The SAP parameter is not required for ERS2.

  1. If you are using ERS1, add this profile parameter to the SAP ASCS/SCS instance profile:

enque/encni/set_so_keepalive = true

  2. For both ERS1 and ERS2, make sure that the keepalive OS parameters are set as described in SAP Note 1410736.
  3. To apply the SAP profile parameter changes, restart the SAP ASCS/SCS instance (one way to do this is sketched below).
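One way to restart the instance is with sapcontrol from the SAP kernel, run on the node that currently hosts the ASCS/SCS instance. A sketch, assuming ASCS instance number 00 from this example:

PowerShell

# Restart the ASCS instance so the profile change takes effect
sapcontrol -nr 00 -function RestartInstance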

Install the database instance

To install the database instance, follow the process that’s described in the SAP installation documentation.

Install the second cluster node

To install the second cluster node, follow the steps that are described in the SAP installation guide.

Install the SAP Primary Application Server

Install the Primary Application Server (PAS) instance <SID>-di-0 on the virtual machine that you’ve designated to host the PAS.

Install the SAP Additional Application Server

Install an SAP Additional Application Server (AAS) on all the virtual machines that you’ve designated to host an SAP Application Server instance.

Test the SAP ASCS/SCS instance failover

For the outlined failover tests, we assume that SAP ASCS is active on node A.

  1. Verify that the SAP system can successfully fail over from node A to node B. Choose one of these options to initiate a failover of the SAP cluster group from cluster node A to cluster node B:
    • Failover Cluster Manager
    • Failover Cluster PowerShell

PowerShell

$SAPSID = "PR1"     # SAP <SID>
$SAPClusterGroup = "SAP $SAPSID"
Move-ClusterGroup -Name $SAPClusterGroup

  2. Restart cluster node A within the Windows guest operating system. This initiates an automatic failover of the SAP <SID> cluster group from node A to node B.
  3. Restart cluster node A from vCenter. This initiates an automatic failover of the SAP <SID> cluster group from node A to node B.

After failover, verify that SIOS DataKeeper is replicating data from source volume drive S on cluster node B to target volume drive S on cluster node A.
Figure 9: SIOS DataKeeper replicates the local volume from cluster node B to cluster node A


Windows Server 8 Developer Preview will not support the Hyper-V Role while running on VMware Workstation…at least on my laptop

Unless someone knows a trick that I don’t, it doesn’t appear as if I will be able to test out some of the Hyper-V clustering features unless I identify some actual hardware for Windows 8. I had hoped that just maybe VMware Workstation 8 would be able to fool Windows 8 into thinking it was actually a physical server, but so far no dice. This article appears to indicate it will work if you have an Intel Nehalem or Intel Core i7 processor, but my two year old Intel Core 2 Duo T9500 doesn’t seem to be able to do the trick.

I added hypervisor.cpuid.v0 = "FALSE" to the virtual machine's config file and I changed the CPU settings to use Intel VT-x/EPT.

But when I try to enable the Hyper-V role, it still fails.

Maybe it is time to invest in a new laptop?


How to Install Service Packs into a Cluster while also Minimizing Planned Downtime

I answer this question often enough that I thought I should probably put a link to it in my blog.

http://support.microsoft.com/default.aspx/kb/174799?p=1

This article tells you everything you need to know. However, what you may not realize is that by following the instructions in the article you are minimizing the amount of planned downtime while also giving yourself the opportunity to “test” the update on one node before you upgrade both nodes. If the upgrade does not go well on the first node, at least the application is still running on the second node until you can figure out what went wrong.
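On newer Windows Server versions, the same rolling pattern can be sketched with the failover clustering cmdlets (the KB article describes the classic procedure; node and group names here are illustrative):

PowerShell

# Drain workloads off node A, patch it, then fail back and test
Suspend-ClusterNode -Name "node-a" -Drain
# ...install the service pack on node A, reboot, and verify...
Resume-ClusterNode -Name "node-a"
Move-ClusterGroup -Name "MyClusterGroup" -Node "node-a"
# Repeat the same steps for node B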

This is just one of the side benefits that you get when you cluster at the application layer vs. clustering at the hypervisor layer. If this were simply a VM in an availability group, you would have to schedule downtime to complete the application upgrade and hope that it all went well as the only failback is to restore the VM from backup. As I discussed in earlier articles, there is a benefit to clustering at the hypervisor level, but you have to understand what you are giving up as well.


Are VMware’s vSphere Disaster Recovery Options Really Better than Microsoft’s options for Hyper-V?

Every time I read a blog post or open a magazine article about virtualization and disaster recovery, I see the same thing: VMware has a more robust DR solution than Microsoft. Well, I'd like to challenge that assumption. From where I sit, this is actually one of the areas where Microsoft has a major competitive advantage at the moment. Here is how I see it.

VMware Site Recovery Manager

This is an optional add-on that rides on the back of array-based replication solutions. While the recovery point objective is good due to the array-based replication, the RTO is measured in hours, not minutes. Add in the fact that moving back to the primary data center is a very manual procedure that basically requires you to re-create your jobs in the opposite direction, and the complete end-to-end recovery operation of failover and failback could take the better part of a day or longer.

Microsoft Multi-Site Cluster

Virtual machine HA clustering is included with the free version of Hyper-V Server 2008 R2, as well as with Windows Server 2008 Enterprise and Datacenter editions. Multi-site clusters require array-based or host-based replication solutions that integrate with Windows Server Failover Clustering. With a multi-site cluster, failover is measured in minutes (just about the time it takes to start a VM) and can be used with array-based replication solutions such as EMC SRDF CE or HP MSA CLX, or the much less expensive host-based replication solutions such as SteelEye DataKeeper Cluster Edition.

Not only is failover quick with Hyper-V multi-site clusters, measured in just a few minutes, but failback is quick and seamless as well. Add in support for Live Migration or Quick Migration across data centers, and I think this is one area where Microsoft actually has a much more robust solution than VMware. Maybe it does not include automated DR tests, but when you consider that you can fail over and fail back all in under 10 minutes, maybe an actual DR test performed monthly would give you a much better indication of what to expect in an actual disaster?

If you want a Hyper-V solution more like SRM, there is an option there as well: Citrix Essentials for Hyper-V. But much like SRM, it is an optional add-on feature and really doesn't even match the RPO and RTO features that you can achieve with basic multi-site clusters for Hyper-V.

What do you think? Am I wrong or is there something I just don’t get? From my view, Hyper-V is heads and shoulders above vSphere in terms of disaster recovery features.


Will Windows Failover Clusters (MSCS/WSFC) Become Obsolete?

I was recently asked whether MSCS/WSFC will become obsolete due to 3rd party HA solutions. I think there will always be a market for 3rd party HA solutions, but many of the enhancements delivered with Windows Server 2008 have reduced the need to explore alternate HA solutions. I think the greater threat to MSCS/WSFC is HA solutions provided by the virtualization vendors, such as Microsoft's Hyper-V failover clusters (which actually use WSFC) and VMware HA. These solutions provided by the virtualization platform protect against host failure, although they currently do not have visibility into the application that is running within the VM.

The real question is what kind of failure do you want to protect against? If physical server failure is your primary concern, then in some cases where MSCS may have previously been deployed, you will see Hyper-V Clusters or VMware HA being deployed instead.  In other cases where MSCS/WSFC may have seemed like overkill or was incompatible with the OS or application, you will instead see clustered VMs being deployed because it is easy to install and it supports all applications and operating systems. The mere fact that more workloads will be running per physical server will make it imperative to have some kind of clustering solution so that the failure of a single server does not bring down your entire infrastructure. In many cases, this clustering solution will be provided by the virtualization vendor.

Hyper-V Clusters and VMware HA are easy to implement and have a broad range of support, as the protected VM can be running any OS or application. The tradeoff is that you lose the application-level monitoring included with MSCS/WSFC. There will always be a class of applications that need application awareness, so MSCS/WSFC or other HA solutions that manage application availability will always be needed to ensure that the application is available, not just the server itself. With that being said, MSCS/WSFC will not become obsolete, but you will see it deployed alongside other cluster solutions provided by the hypervisor vendors.


Hyper-V Live Migration across Data Centers

There has recently been a lot of press heralding VMware's limited support for vMotion across data centers, or "long-distance vMotion" as I have seen it called. The details of the solution can be found on Cisco's website. While I think that is just great, I'd like to remind people that Microsoft Hyper-V has this same functionality today, with far fewer requirements and restrictions than VMware's long-distance vMotion.

Where VMware has VMware HA, vMotion, and Site Recovery Manager (SRM) to take care of virtual machine availability, Microsoft provides the same functionality with Windows Server Failover Clustering, and in fact in some cases goes beyond what VMware can provide in terms of virtual machine availability, as I described in a previous post.

What I'd like to focus on today is Microsoft's competitive offering to "long-distance vMotion". To achieve the same functionality in Hyper-V, you simply deploy a multi-site Hyper-V cluster using Windows Server Failover Clustering and your favorite host-based or storage-based replication solution that is certified to work in a Windows Server 2008 multi-site cluster. By doing this, you can use your existing network infrastructure and your existing storage infrastructure to do Live Migrations across data centers. The requirements really are the same as for any multi-site cluster, except I would recommend that you span your subnets to avoid the client reconnection issues that occur when moving a virtual machine to a new subnet, as clients could cache the old IP address until the TTL expires.

A demonstration video of Live Migration across data centers using Windows Server 2008 R2 Hyper-V and SteelEye DataKeeper Cluster Edition can be seen here.


Making sense of virtualization availability options

With the recent release of Microsoft Windows Server 2008 R2 and vSphere 4.0, I thought it was a good time to review some of the options available when considering the availability of your virtual servers and the applications running on them. I also will take this opportunity to describe some of the features that enable virtual machine availability. I have grouped these features into their functional roles to help highlight their purpose.

Planned Downtime

Live Migration and VMware’s VMotion are both solutions that allow an administrator to move a virtual machine from one physical server to another with no perceivable downtime. The key thing to remember about this technology is that in order to move a virtual machine from one server to another without any downtime, the move must be a planned event. The reason that it must be a planned event is that the virtual machine’s memory must be synchronized between the servers before the actual switchover can occur. This is true of both Microsoft’s and VMware’s solutions. Also keep in mind that both of these technologies require the use of shared storage to hold the virtual hard disks (VMDK and VHD files), which limits Live Migration and VMotion to local area networks. This also means that any downtime planned for the storage array must be handled in a different way if you want to limit the impact to your virtual machines.

Unplanned Downtime

Microsoft’s Windows Server Failover Clustering and VMware’s High Availability (HA) are the solutions that are available to protect virtual machines in the event of unplanned downtime. Both solutions are similar in that they monitor virtual machines for availability and in the case of a failure the VMs are moved to the standby node. This recovery process requires that the machines be rebooted since there was no time to sync the memory before failover.

Disaster Recovery

How do I recover my virtual machines in the event of a complete site loss? The good news is that virtualization makes this process a whole lot easier, since a virtual machine is just a file that can be picked up and moved to another server. Up to this point, VMware and Microsoft are pretty similar in their availability features and functionality, but here is where Microsoft really shines. VMware offers Site Recovery Manager, which is a fine product but is limited in support to only SRM-certified array-based replication solutions. Also, the failover and failback process is not trivial and can take the better part of a day to do a complete round trip from the DR site back to the primary data center. It does have some nice features like DR testing, but in my experience Microsoft has a much better solution when it comes to disaster recovery.

Microsoft's Hyper-V DR solution is Windows Server Failover Clustering in a multi-site cluster configuration (see video demonstration). In this configuration the performance and behavior are the same as a local area cluster, yet it can span data centers. What this means is that you can actually move your virtual machines across data centers with little to no perceivable downtime. Failback is the same process: just point and click to move the virtual machine resource back to the primary data center. While there is no built-in "DR testing", I think it is preferable to be able to do an actual DR test in just a minute or two with no perceivable downtime. The other thing I like about WSFC multi-site clusters is that the replication options include not only array-based replication vendors but also host-based replication vendors. This gives you a wide range of replication solutions in all price ranges and does not require that you upgrade your existing storage infrastructure.

Fault Tolerance

Fault tolerance basically eliminates the need to reboot a virtual machine in the event of an unexpected failure. VMware has the edge here in that it offers VMware FT. There are a few other 3rd party hardware and software vendors that play in this space as well. There are plenty of limitations and requirements when it comes to implementing FT systems, but if you need to ensure that a hardware component failure results in zero downtime vs. the minute or two it takes to boot up a VM in a standard HA configuration, then this is an option that you may want to consider. You probably want to make sure that your existing servers are already chock full of hot-standby CPUs, RAM, power supplies, etc., and that you have redundant paths to the network and storage; otherwise you may be throwing good money after bad. Fault tolerance is great for protection from hardware failures, but what happens if your application or the virtual machine's operating system is behaving badly? That is when you need application-level clustering, as described below.

Application Availability

Everything I have discussed up to this point really only takes into consideration the health of your physical servers and your virtual machines as a whole. This is all well and good, however, what happens if your virtual machine blue screens? Or what if that latest SQL service pack broke your application? In those cases, none of these solutions are going to do you one bit of good. For those most critical applications, you really must cluster at the application layer. What this means is that you must look into clustering solutions that run within the OS on the virtual machine vs. within the hypervisor. In the Microsoft world this means MSCS/WSFC or 3rd party clustering solutions. Your storage options, when clustering within the virtual machine, are limited in scope to either iSCSI targets or host-based replication solutions. A demonstration of SQL Server being clustered within a Hyper-V VM using SteelEye DataKeeper Cluster Edition is available here. Currently, VMware really does not have a solution to this problem and would defer to solutions that run within the virtual machine for application layer monitoring.

Summary

With the advent of virtualization, it is really not a question of if you need availability, but more of a question of what availability option will help meet your SLA and/or DR requirements. I hope that this information helps you make sense of the availability options available to you.
