How to Cluster MaxDB on Windows in the Cloud #Azure #AWS #GCP #SAP

Recently I have had a number of customers looking for a high availability solution for MaxDB on Windows in the cloud. Some customers have been in Azure and some in AWS. But regardless of the cloud platform, they all eventually find the post in the SAP Community WIKI that describes the process.

https://wiki.scn.sap.com/wiki/display/MaxDB/HowTo+-+Embed+SAP+MaxDB+in+MSCS

The challenge with this post in a cloud environment is that there is no shared storage (SAN) available in the Azure, AWS or GCP that allows you to build a traditional shared storage cluster. The beauty of HA in the cloud is that cluster nodes typically reside miles away from each other in another data center, AKA, availability zone (AZ). So even if shared storage was available, it wouldn’t make a lot of sense since it would have to reside in a single AZ, defeating the purpose of HA all together.

However, there is an answer. SIOS DataKeeper, a SANless clustering solution from SIOS technology, allows locally attached storage to be used in a Windows Server Failover Cluster, eliminating the need for a SAN. Instead, SIOS keeps locally attached disk in sync using synchronous block level replication technology and presents this storage to WSFC as a clustered disk resource called a DataKeeper volume.

DataKeeper and MaxDB
Typical 2-node WSFC across Availability Zones with a 3rd node in a different Region

As far as the cluster is concerned, a DataKeeper Volume cluster resource looks like a shared disk, but instead of controlling disk locking (SCSI reservations), it controls the mirror direction. So in every sense of the word it is still a true WSFC, except it uses locally attached storage instead of shared storage. The locally attached storage can be anything from EBS block device to Azure premium disk, or even a local Storage Space with multiple disks stripped together. As long as Windows sees an NTFS formatted volume with a drive letter and the volume size is the same on each instance it can be used in the cluster.

This type of cluster is commonly known as a SANless cluster and has been around for many years enabling geo-clusters and clusters where shared storage was not available. Database admins also love it as it enables them to use local high speed storage devices like PCIe flash or SSD drives, yet still use WSFC for high availability.

SIOS also supports asynchronous replication, so if you want to add a node in a different geographic location for disaster recovery you can build a 3-node cluster with 2 nodes in the same region but different fault domains and a 3rd node in an entirely different region, or maybe even back on-prem for disaster recovery options. Or, if you are in Azure you can leverage Azure SIte Recovery (ASR) for disaster recovery as SIOS DataKeeper is compatible for ASR.

Both WSFC and SIOS DataKeeper are very dependent upon IP addresses staying the same, so for ASR configurations you will want to make sure you retain your IP address upon failover as described here.

https://docs.microsoft.com/en-us/azure/site-recovery/site-recovery-retain-ip-azure-vm-failover

SIOS is no stranger to high availability and disaster recovery for SAP. The SIOS Protection Suite for Linux is a SAP Certified HA solution for SAP and SAP HANA and SIOS DataKeeper is the preferred HA/DR solution for SAP ASCS on Windows in cloud environments. Providing an HA/DR solution for MaxDB on Azure further solidifies SIOS as the SAP high availability experts.

If you have questions about high availability for SAP, Hana, MaxDB, SQL Server in Azure, AWS, GCP or any other platform leave me a comment or reach me on Twitter @daveberm

How to Cluster MaxDB on Windows in the Cloud #Azure #AWS #GCP #SAP

Step-by-Step: Configuring a File Server Cluster in Azure that Spans Availability Zones

In this post we will detail the specific steps required to deploy a 2-node File Server Failover Cluster that spans the new Availability Zones a single region of Azure. I will assume you are familiar with basic Azure concepts as well as basic Failover Cluster concepts and will focus this article on what is unique about deploying a File Server Failover Cluster in Azure across Availability Zones.  If your Azure region doesn’t support Availability Zones yet you will have to use Fault Domains instead as described in an earlier post.

With DataKeeper Cluster Edition you are able to take the locally attached Managed Disks, whether it is Premium or Standard Disks, and replicate those disks either synchronously, asynchronously or a mix or both, between two or more cluster nodes. In addition, a DataKeeper Volume resource is registered in Windows Server Failover Clustering which takes the place of a Physical Disk resource. Instead of controlling SCSI-3 reservations like a Physical Disk Resource, the DataKeeper Volume controls the mirror direction, ensuring the active node is always the source of the mirror. As far as Failover Clustering is concerned, it looks, feels and smells like a Physical Disk and is used the same way Physical Disk Resource would be used.

Pre-requisites

  • You have used the Azure Portal before and are comfortable deploying virtual machines in Azure IaaS.
  • Have obtained a license or eval license of SIOS DataKeeper

Deploying a File Server Failover Cluster Instance using the Azure Portal

To build a 2-node File Server Failover Cluster Instance in Azure, we are going to assume you have a basic Virtual Network based on Azure Resource Manager and you have at least one virtual machine up and running and configured as a Domain Controller. Once you have a Virtual Network and a Domain configured, you are going to provision two new virtual machines which will act as the two nodes in our cluster.

Our environment will look like this:

DC1 – Our Domain Controller and File Share Witness
SQL1 and SQL2 – The two nodes of our File Server Cluster. Don’t let the names confuse you, we are building a File Server Cluster in this guide. In my next post I will demonstrate a SQL Server cluster configuration.

Provisioning the two cluster nodes

Using the Azure Portal, we will provision both SQL1 and SQL2 exactly the same way.  There are numerous options to choose from including instance size, storage options, etc. This guide is not meant to be an exhaustive guide to deploying Servers in Azure as there are some really good resources out there and more published every day. However, there are a few key things to keep in mind when creating your instances, especially in a clustered environment.

Availability Zones – It is important that both SQL1, SQL2 reside in different Availability Zones. For the sake of this guide we will assume you are using Windows 2016 and will use a Cloud Witness for the Cluster Quorum. If you use Windows 2012 R2 or Windows Server 2008 R2 instead of Windows 2016 you will instead need to configure a File Share Witness in the 3rd Availability Zone as Cloud Witness was not introduced until Windows Server 2016.

By putting the cluster nodes in different Availability Zones we are ensuring that each cluster node resides in a different Azure datacenter in the same region. Leveraging Availability Zones rather than the older Fault Domains isolates you from the types of outages that occured just a few weeks ago that brought down the entire South Central region for multiple days.

Availability Zones
Be sure to add each cluster node to a different Availability Zone. If you leverage a File Share Witness it should reside in the 3rd Availability Zone.

Static IP Address

Once each VM is provisioned, you will want to go into the setting and change the settings so that the IP address is Static. We do not want the IP address of our cluster nodes to change.

Static IP
Make sure each cluster node uses a static IP

Storage

As far as Storage is concerned, you will want to consult Performance best practices for SQL Server in Azure Virtual Machines. In any case, you will minimally need to add at least one additional Managed Disk to each of your cluster nodes. DataKeeper can use Basic Disk, Premium Storage or even multiple disks striped together in a local Storage Space. If you do want to use a local Storage Space just be aware that you should create the Storage Space BEFORE you do any cluster configuration due to a known issue with Failover Clustering and local Storage Spaces. All disks should be formatted NTFS.

Create the Cluster

Assuming both cluster nodes (SQL1 and SQL2) have been provisioned as described above and added to your existing domain, we are ready to create the cluster. Before we create the cluster, there are a few Features that need to be enabled. These features are .Net Framework 3.5 and Failover Clustering. These features need to be enabled on both cluster nodes. You will also need to enable the FIle Server Role.

6
Enable both .Net Framework 3.5 and Failover Clustering features and the File Server on both cluster nodes

Once that role and those features have been enabled, you are ready to build your cluster. Most of the steps I’m about to show you can be performed both via PowerShell and the GUI. However, I’m going to recommend that for this very first step you use PowerShell to create your cluster. If you choose to use the Failover Cluster Manager GUI to create the cluster you will find that you wind up with the cluster being issued a duplicate IP address.

Without going into great detail, what you will find is that Azure VMs have to use DHCP. By specifying a “Static IP” when we create the VM in the Azure portal all we did was create sort of a DHCP reservation. It is not exactly a DHCP reservation because a true DHCP reservation would remove that IP address from the DHCP pool. Instead, this specifying a Static IP in the Azure portal simply means that if that IP address is still available when the VM requests it, Azure will issue that IP to it. However, if your VM is offline and another host comes online in that same subnet it very well could be issued that same IP address.

There is another strange side effect to the way Azure has implemented DHCP. When creating a cluster with the Windows Server Failover Cluster GUI, there is not option to specify a cluster IP address. Instead it relies on DHCP to obtain an address. The strange thing is, DHCP will issue a duplicate IP address, usually the same IP address as the host requesting a new IP address. The cluster install will complete, but you may have some strange errors and you may need to run the Windows Server Failover Cluster GUI from a different node in order to get it to run. Once you get it to run you will need to change the core cluster IP address to an address that is not currently in use on the network.

You can avoid that whole mess by simply creating the cluster via Powershell and specifying the cluster IP address as part of the PowerShell command to create the cluster.

You can create the cluster using the New-Cluster command as follows:

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.100 -NoStorage

After the cluster creation completes, you will also want to run the cluster validation by running the following command. You should expect to see some warnings about storage and network, but that is expected in Azure and you can ignore those warnings. If any errors are reported you will need to address those before you move on.

Test-Cluster

Create a Quorum Witness

if you are running Windows 2016 or 2019 you will need to create a Cloud Witness for the cluster quorum. If you are running Windows Server 2012 R2 or 2008 R2 you will need to create a File Share Witness. The detailed instruction on witness creation can be found here.

Install DataKeeper

After the cluster is created it is time to install DataKeeper. It is important to install DataKeeper after the initial cluster is created so the custom cluster resource type can be registered with the cluster. If you installed DataKeeper before the cluster is created you will simply need to run the install again and do a repair installation.

8
Install DataKeeper after the cluster is created

During the installation you can take all of the default options.  The service account you use must be a domain account and be in the local administrators group on each node in the cluster.

9
The service account must be a domain account that is in the Local Admins group on each node

Once DataKeeper is installed and licensed on each node you will need to reboot the servers.

Create the DataKeeper Volume Resource

To create the DataKeeper Volume Resource you will need to start the DataKeeper UI and connect to both of the servers.
10Connect to SQL1
11

Connect to SQL2
12

Once you are connected to each server, you are ready to create your DataKeeper Volume. Right click on Jobs and choose “Create Job”
13

Give the Job a name and description.
14

Choose your source server, IP and volume. The IP address is whether the replication traffic will travel.
15

Choose your target server.
16

Choose your options. For our purposes where the two VMs are in the same geographic region we will choose synchronous replication. For longer distance replication you will want to use asynchronous and enable some compression.
17

By clicking yes at the last pop-up you will register a new DataKeeper Volume Resource in Available Storage in Failover Clustering.
18

You will see the new DataKeeper Volume Resource in Available Storage.
19

Create the File Server Cluster Resource

To create the File Server Cluster Resource we will use Powershell once again rather than the Failover Cluster interface. The reason being is that once again because the virtual machines are configured to use DHCP, the GUI based wizard will not prompt us to enter a cluster IP address and instead will issue a duplicate IP address. To avoid this we will use a simple powershell command to create the FIle Server Cluster Resource and specify the IP Address

Add-ClusterFileServerRole -Storage "DataKeeper Volume E" -Name FS2 -StaticAddress 10.0.0.101

Make note of the IP address you specify here. It must be a unique IP address on your network. We will use this same IP address later when we create our Internal Load Balancer.

Create the Internal Load Balancer

Here is where failover clustering in Azure is different than traditional infrastructures. The Azure network stack does not support gratuitous ARPS, so clients cannot connect directly to the cluster IP address. Instead, clients connect to an internal load balancer and are redirected to the active cluster node. What we need to do is create an internal load balancer. This can all be done through the Azure Portal as shown below.

You can use an Public Load Balancer if your client connects over the public internet, but assuming your clients reside in the same vNet, we will create an Internal Load Balancer. The important thing to take note of here is that the Virtual Network is the same as the network where your cluster nodes reside. Also, the Private IP address that you specify will be EXACTLY the same as the address you used to create the File Server Cluster Resource. Also, because we are using Availability Zones we will be creating a Zone Redundant Standard Load Balancer as shown in the picture below.

Load Balancer

After the Internal Load Balancer (ILB) is created, you will need to edit it. The first thing we will do is to add a backend pool. Through this process you will choose the two cluster nodes.

Backend Pools

The next thing we will do is add a Probe. The probe we add will probe Port 59999. This probe determines which node is active in our cluster.
probe

And then finally, we need a load balancing rule to redirect the SMB traffic, TCP port 445 The important thing to notice in the screenshot below is the Direct Server Return is Enabled. Make sure you make that change.

rules

Fix the File Server IP Resource

The final step in the configuration is to run the following PowerShell script on one of your cluster nodes. This will allow the Cluster IP Address to respond to the ILB probes and ensure that there is no IP address conflict between the Cluster IP Address and the ILB. Please take note; you will need to edit this script to fit your environment. The subnet mask is set to 255.255.255.255, this is not a mistake, leave it as is. This creates a host specific route to avoid IP address conflicts with the ILB.

# Define variables
$ClusterNetworkName = “” 
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 of higher to find the name)
$IPResourceName = “” 
# the IP Address resource name 
$ILBIP = “” 
# the IP Address of the Internal Load Balancer (ILB)
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this: 
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

Creating File Shares

You will find that using the File Share Wizard in Failover Cluster Manager does not work. Instead, you will simply create the file shares in Windows Explorer on the active node. Failover clustering automatically picks up those shares and puts them in the cluster.

Note that the”Continuous Availability” option of a file share is not supported in this configuration.

Conclusion

You should now have a functioning File Server Failover Cluster in Azure that spans Availability Zones. If you have ANY problems, please reach out to me on Twitter @daveberm and I will be glad to assist. If you need a DataKeeper evaluation key fill out the form at http://us.sios.com/clustersyourway/cta/14-day-trial and SIOS will send an evaluation key sent out to you.

Step-by-Step: Configuring a File Server Cluster in Azure that Spans Availability Zones

Azure Outage Post-Mortem – Part 1

The first official Post-Mortems are starting to come out of Microsoft in regards to the Azure Outage that happened last week. While this first post-mortem addresses the Azure DevOps outage specifically (previously known as Visual Studio Team Service, or VSTS), it gives us some additional insight into the breadth and depth of the outage, confirms the cause of the outage, and gives us some insight into the challenges Microsoft faced in getting things back online quickly. It also hints at some some features/functionality Microsoft may consider pursuing to handle this situation better in the future.

As I mentioned in my previous article, features such as the new Availability Zones being rolled out in Azure, might have minimized the impact of this outage. In the post-mortem, Microsoft confirms what I previously said.

The primary solution we are pursuing to improve handling datacenter failures is Availability Zones, and we are exploring the feasibility of asynchronous replication.

Until Availability Zones are rolled out across more regions the only disaster recovery options you have are cross-region, hybrid-cloud or even cross-cloud asynchronous replication. Software based #SANless clustering solutions available today will enable such configurations, providing a very robust RTO and RPO, even when replicating great distances.

When you use SaaS/PaaS solutions you are really depending on the Cloud Service Provider (CSPs) to have an iron clad HA/DR solution in place. In this case, it seems as if a pretty significant deficiency was exposed and we can only hope that it leads all CSPs to take a hard look at their SaaS/PaaS offerings and address any HA/DR gaps that might exist. Until then, it is incumbent upon the consumer to understand the risks and do what they can to mitigate the risks of extended outages, or just choose not to use PaaS/SaaS until the risks are addressed.

The post-mortem really gets to the root of the issue…what do you value more, RTO or RPO?

I fundamentally do not want to decide for customers whether or not to accept data loss. I’ve had customers tell me they would take data loss to get a large team productive again quickly, and other customers have told me they do not want any data loss and would wait on recovery for however long that took.

It will be impossible for a CSP to make that decision for a customer. I can’t see a CSP ever deciding to lose customer data, unless the original data is just completely lost and unrecoverable. In that case, a near real-time async replica is about as good as you are going to get in terms of RPO in an unexpected failure.

However, was this outage really unexpected and without warning? Modern satellite imagery and improvements in weather forecasting probably gave fair warning that there was going to be significant weather related events in the area.

With hurricane Florence bearing down on the Southeast US as I write this post, I certainly hope if your data center is in the path of the hurricane you are taking proactive measures to gracefully move your workloads out of the impacted region. The benefit of a proactive disaster recovery vs a reactive disaster recovery are numerous, including no data loss, ample time to address unexpected issues, and managing human resources such that employees can worry about taking care of their families, rather than spending the night at a keyboard trying to put the pieces back together again.

Again, enacting a proactive disaster recovery would be a hard decision for a CSP to make on behalf of all their customers, as planned migrations across regions will incur some amount of downtime. This decision will have to be put in the hands of the customer.

Slide 2.png
Hurricane Florence Satellite Image taken from the new GOES-16 Satellite, courtesy of Tropical Tidbits

So what can you do to protect your business critical applications and data? As I discussed in my previous article, cross-region, cross-cloud or hybrid-cloud models with software based #SANless cluster solutions are going to go a long way to address your HA/DR concerns, with an excellent RTO and RPO for cloud based IaaS deployments. Instead of application specific solutions, software based, block level volume replication solutions such SIOS DataKeeper and SIOS Protection Suite replicate all data, providing a data protection solution for both Linux and Windows platforms.

My oldest son just started his undergrad degree in Meteorology at Rutgers University. Can you imagine a day when artificial intelligence (AI) and machine learning (ML) will be used to consume weather related data from NOAA to trigger a planned disaster recovery migration, two days before the storm strikes? I think I just found a perfect topic for his Master’s thesis. Or better yet, have him and his smart friends at the WeatherWatcher LLC get funding for a tech startup that applies AI and ML to weather related data to control proactive disaster recovery events.

I think we are just at the cusp of  IT analytics solutions that apply advanced machine-learning technology to cut the time and effort you need to ensure delivery of your critical application services. SIOS iQ is one of the solutions leading the way in that field.

Batten down the hatches and get ready, Hurricane season is just starting and we are already in for a wild ride. If you would like to discuss your HA/DR strategy reach out to me on Twitter @daveberm.

Azure Outage Post-Mortem – Part 1

Help! I can’t connect to my SQL Server multi-subnet failover cluster

I get that kind of call or email from customers all the time. I have a generic response as follows…

This has everything you need to know.

They don’t go into great detail about what to do if your connection does not support multisubnetfailover=true. If your connection does NOT support that parameter, then set registerallprovidersip to false and cleanup DNS. That procedure is described best here.
I figure I get this question often enough I probably should just flesh out my response a bit, hence the reason for this post.
In general people just aren’t aware of how multi-subnet failover clusters work. Multi-subnet failover clustering support was added in Windows Server 2012 with the addition of the “OR” technology when defining cluster resource dependencies. This allowed people to allow a Cluster Name resource to be dependent upon IP Address x.x.x.x OR IP Address y.y.y.y.
x.x.x.x would be an a cluster IP resource valid in Subnet A and y.y.y.y would be a cluster IP address valid in Subnet B. Only one address will be online at any given time, whichever address was valid for the subnet the resource was currently running on.
Microsoft SQL Server started supporting this concept starting with SQL Server 2012 with both failover cluster instances (FCI) using 3-party SANless clustering solutions like SIOS DataKeeper and SQL Server Always On Availability Groups.
By default if you create a SQL Server multi-subnet failover cluster the cluster should be automatically configured optimally, including setting up the two IP addresses, adding two A records to DNS and setting the registerallprovidersIP to true. However, on the client end you need to tell it that you are connecting to a multi-subnet failover cluster, otherwise the connection won’t be made.

Configuring the client

Configuring the client is done by adding multisubnetfailover=true to the connection string. This Microsoft documentation is a great resource, but if you just search for multisubnetfailover=true you will find a lot of information about that setting.
However, not every application will support adding that to the connection string. If you find yourself in that situation you should ask your application vendor to add support for that or show you how to do it.
However, all is not lost if you find yourself in that situation. You will want to change the behavior of the cluster so that upon failover DNS is update so that the single A record associated with the cluster client access point is updated with the new IP address. This is in lieu of having two A records in DNS, one with each cluster IP address, which is the default behavior in an multi-subnet cluster.
This article reference SharePoint, you can ignore that, the rest of the article is pretty well written to describe the process you should follow.
The highlights of that article are as follows…
Get-ClusterResource “[Network Name]” | Set-ClusterParameter RegisterAllProvidersIP 0
After restarting the cluster-name-object (basically restarting the role) & cleaning up all “A” records manually (clean-up isn’t done automatically) we can see our old A-records are still in DNS so we’ll need to delete those manually.
In addition to those steps I’d advise you to reduce the TTL on the HostRecordTTL as described in this article.
The highlight of that article is as follows.
PS C:\> Get-ClusterResource -Name cluster1FS | Set-ClusterParameter -Name HostRecordTTL -Value 300
With a Value of 300 you could potentially be waiting up to 5 minutes for your clients to reconnect after a failover, or even longer if if have a large Active Directory infrastructure and AD replication takes some time to update all the DNS servers across your infrastructure.
You are going to want to figure out what the optimal TTL is to facilitate quick client reconnections without over burdening your DNS servers with a bunch of DNS Lookup requests.
This type of configuration is common in disaster recovery configurations where your DR site is in a different subnet. It is also very common in HA deployments in AWS because different Availability Zones are in different subnets.
Let me know if you have any questions. You can always reach me on Twitter @daveberm
Help! I can’t connect to my SQL Server multi-subnet failover cluster

Deploying a Highly Available File Server in Azure IaaS (ARM) with SIOS DataKeeper

In this post we will detail the specific steps required to deploy a 2-node File Server Failover Cluster in a single region of Azure using Azure Resource Manager. I will assume you are familiar with basic Azure concepts as well as basic Failover Cluster concepts and will focus this article on what is unique about deploying a File Server Failover Cluster in Azure.

With DataKeeper Cluster Edition you are able to take the locally attached storage, whether it is Premium or Standard Disks, and replicate those disks either synchronously, asynchronously or a mix or both, between two or more cluster nodes. In addition, a DataKeeper Volume resource is registered in Windows Server Failover Clustering which takes the place of a Physical Disk resource. Instead of controlling SCSI-3 reservations like a Physical Disk Resource, the DataKeeper Volume controls the mirror direction, ensuring the active node is always the source of the mirror. As far as Failover Clustering is concerned, it looks, feels and smells like a Physical Disk and is used the same way Physical Disk Resource would be used.

Pre-requisites

  • You have used the Azure Portal before and are comfortable deploying virtual machines in Azure IaaS.
  • Have obtained a license or eval license of SIOS DataKeeper

Deploying a File Server Failover Cluster Instance using the Azure Portal

To build a 2-node File Server Failover Cluster Instance in Azure, we are going to assume you have a basic Virtual Network based on Azure Resource Manager and you have at least one virtual machine up and running and configured as a Domain Controller. Once you have a Virtual Network and a Domain configured, you are going to provision two new virtual machines which will act as the two nodes in our cluster.

Our environment will look like this:

DC1 – Our Domain Controller and File Share Witness
SQL1 and SQL2 – The two nodes of our File Server Cluster

Provisioning the two cluster nodes (SQL1 and SQL2)

Using the Azure Portal, we will provision both SQL1 and SQL2 exactly the same way. There are numerous options to choose from including instance size, storage options, etc. This guide is not meant to be an exhaustive guide to deploying Servers in Azure as there are some really good resources out there and more published every day. However, there are a few key things to keep in mind when creating your instances, especially in a clustered environment.

Availability Set – It is important that both SQL1, SQL2 AND DC1 reside in the same availability set. By putting them in the same Availability Set we are ensuring that each cluster node and the file share witness reside in a different Fault Domain and Update Domain. This helps guarantee that during both planned maintenance and unplanned maintenance the cluster will continue to be able to maintain quorum and avoid downtime.

3
Figure 3 – Be sure to add both cluster nodes and the file share witness to the same Availability Set

Static IP Address

Once each VM is provisioned, you will want to go into the setting and change the settings so that the IP address is Static. We do not want the IP address of our cluster nodes to change.

4
Figure 4 – Make sure each cluster node uses a static IP

Storage

As far as Storage is concerned, you will want to consult Performance best practices for SQL Server in Azure Virtual Machines. In any case, you will minimally need to add at least one additional disk to each of your cluster nodes. DataKeeper can use Basic Disk, Premium Storage or even Storage Pools consisting of multiple disks in a storage pool. Just be sure to add the same amount of storage to each cluster node and configure it identically. Also, be sure to use a different storage account for each virtual machine to ensure that a problem with one Storage Account does not impact both virtual machines at the same time.

5
Figure 5 – make sure to add additional storage to each cluster node

Create the Cluster

Assuming both cluster nodes (SQL1 and SQL2) have been provisioned as described above and added to your existing domain, we are ready to create the cluster. Before we create the cluster, there are a few Features that need to be enabled. These features are .Net Framework 3.5 and Failover Clustering. These features need to be enabled on both cluster nodes. You will also need to enable the FIle Server Role.

6
Figure 6 – enable both .Net Framework 3.5 and Failover Clustering features and the File Server on both cluster nodes

Once that role and those features have been enabled, you are ready to build your cluster. Most of the steps I’m about to show you can be performed both via PowerShell and the GUI. However, I’m going to recommend that for this very first step you use PowerShell to create your cluster. If you choose to use the Failover Cluster Manager GUI to create the cluster you will find that you wind up with the cluster being issues a duplicate IP address.

Without going into great detail, what you will find is that Azure VMs have to use DHCP. By specifying a “Static IP” when we create the VM in the Azure portal all we did was create sort of a DHCP reservation. It is not exactly a DHCP reservation because a true DHCP reservation would remove that IP address from the DHCP pool. Instead, this specifying a Static IP in the Azure portal simply means that if that IP address is still available when the VM requests it, Azure will issue that IP to it. However, if your VM is offline and another host comes online in that same subnet it very well could be issued that same IP address.

There is another strange side effect to the way Azure has implemented DHCP. When creating a cluster with the Windows Server Failover Cluster GUI when hosts use DHCP (which they have to), there is not option to specify a cluster IP address. Instead it relies on DHCP to obtain an address. The strange thing is, DHCP will issue a duplicate IP address, usually the same IP address as the host requesting a new IP address. The cluster will usually complete, but you may have some strange errors and you may need to run the Windows Server Failover Cluster GUI from a different node in order to get it to run. Once you get it to run you will want to change the cluster IP address to an address that is not currently in use on the network.

You can avoid that whole mess by simply creating the cluster via Powershell and specifying the cluster IP address as part of the PowerShell command to create the cluster.

You can create the cluster using the New-Cluster command as follows:

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.101 -NoStorage

After the cluster creation completes, you will also want to run the cluster validation by running the following command:

Test-Cluster

7
Figure 7 – The output of the cluster creation and the cluster validation commands

Create File Share Witness

Because there is no shared storage, you will need to create a file share witness on another server in the same Availability Set as the two cluster nodes. By putting it in the same availability set you can be sure that you only lose one vote from your quorum at any given time. If you are unsure how to create a File Share Witness you can review this article http://www.howtonetworking.com/server/cluster12.htm. In my demo I put the file share witness on domain controller. I have published an exhaustive explanation of cluster quorums at https://blogs.msdn.microsoft.com/microsoft_press/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2/

Install DataKeeper

After the cluster is created it is time to install DataKeeper. It is important to install DataKeeper after the initial cluster is created so the custom cluster resource type can be registered with the cluster. If you installed DataKeeper before the cluster is created you will simply need to run the install again and do a repair installation.

8
Figure 8 – Install DataKeeper after the cluster is created

During the installation you can take all of the default options.  The service account you use must be a domain account and be in the local administrators group on each node in the cluster.

9
Figure 9 – the service account must be a domain account that is in the Local Admins group on each node

Once DataKeeper is installed and licensed on each node you will need to reboot the servers.

Create the DataKeeper Volume Resource

To create the DataKeeper Volume Resource you will need to start the DataKeeper UI and connect to both of the servers.
10Connect to SQL1
11

Connect to SQL2
12

Once you are connected to each server, you are ready to create your DataKeeper Volume. Right click on Jobs and choose “Create Job”
13

Give the Job a name and description.
14

Choose your source server, IP and volume. The IP address is whether the replication traffic will travel.
15

Choose your target server.
16

Choose your options. For our purposes where the two VMs are in the same geographic region we will choose synchronous replication. For longer distance replication you will want to use asynchronous and enable some compression.
17

By clicking yes at the last pop-up you will register a new DataKeeper Volume Resource in Available Storage in Failover Clustering.
18

You will see the new DataKeeper Volume Resource in Available Storage.
19

Create the File Server Cluster Resource

To create the File Server Cluster Resource we will use Powershell once again rather than the Failover Cluster interface. The reason being is that once again because the virtual machines are configured to use DHCP, the GUI based wizard will not prompt us to enter a cluster IP address and instead will issue a duplicate IP address. To avoid this we will use a simple powershell command to create the FIle Server Cluster Resource and specify the IP Address

Add-ClusterFileServerRole -Storage "DataKeeper Volume E" -Name FS2 -StaticAddress 10.0.0.201

Make note of the IP address you specify here. It must be a unique IP address on your network. We will use this same IP address later when we create our Internal Load Balancer.

Create the Internal Load Balancer

Here is where failover clustering in Azure is different than traditional infrastructures. The Azure network stack does not support gratuitous ARPS, so clients cannot connect directly to the cluster IP address. Instead, clients connect to an internal load balancer and are redirected to the active cluster node. What we need to do is create an internal load balancer. This can all be done through the Azure Portal as shown below.

First, create a new Load Balancer
30

You can use an Public Load Balancer if your client connects over the public internet, but assuming your clients reside in the same vNet, we will create an Internal Load Balancer. The important thing to take note of here is that the Virtual Network is the same as the network where your cluster nodes reside. Also, the Private IP address that you specify will be EXACTLY the same as the address you used to create the SQL Cluster Resource.
31

After the Internal Load Balancer (ILB) is created, you will need to edit it. The first thing we will do is to add a backend pool. Through this process you will choose the Availability Set where your SQL Cluster VMs reside. However, when you choose the actual VMs to add to the Backend Pool, be sure you do not choose your file share witness. We do not want to redirect SQL traffic to your file share witness.
32
33

The next thing we will do is add a Probe. The probe we add will probe Port 59999. This probe determines which node is active in our cluster.
34

And then finally, we need a load balancing rule to redirect the SMB traffic, TCP port 445 The important thing to notice in the screenshot below is the Direct Server Return is Enabled. Make sure you make that change.

445_ilb

Fix the File Server IP Resource

The final step in the configuration is to run the following PowerShell script on one of your cluster nodes. This will allow the Cluster IP Address to respond to the ILB probes and ensure that there is no IP address conflict between the Cluster IP Address and the ILB. Please take note; you will need to edit this script to fit your environment. The subnet mask is set to 255.255.255.255, this is not a mistake, leave it as is. This creates a host specific route to avoid IP address conflicts with the ILB.

# Define variables
$ClusterNetworkName = “” 
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 of higher to find the name)
$IPResourceName = “” 
# the IP Address resource name 
$ILBIP = “” 
# the IP Address of the Internal Load Balancer (ILB)
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this: 
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

Creating File Shares

You will find that using the File Share Wizard in Failover Cluster Manager does not work. Instead, you will simply create the file shares in Windows Explorer on the active node. Failover clustering automatically picks up those shares and puts them in the cluster.

Note that the”Continuous Availability” option of a file share is not supported in this configuration.

Conclusion

You should now have a functioning File Server Failover Cluster. If you have ANY problems, please reach out to me on Twitter @daveberm and I will be glad to assist. If you need a DataKeeper evaluation key fill out the form at http://us.sios.com/clustersyourway/cta/14-day-trial and SIOS will send an evaluation key sent out to you.

Deploying a Highly Available File Server in Azure IaaS (ARM) with SIOS DataKeeper

Deploying Microsoft SQL Server 2014 Failover Clusters in #Azure Resource Manager (ARM)

In this post we will detail the specific steps required to deploy a 2-node SQL Server Failover Cluster in a single region of Azure using Azure Resource Manager. I will assume you are familiar with basic Azure concepts as well as basic SQL Server Failover Cluster concepts and will focus this article on what is unique about deploying a SQL Server Failover Cluster in Azure Resource Manager. If you are still using Azure Classic and need to deploy a SQL Server Failover Cluster in Classic you should read my article “STEP-BY-STEP: HOW TO CONFIGURE A SQL SERVER FAILOVER CLUSTER INSTANCE (FCI) IN MICROSOFT AZURE IAAS #SQLSERVER #AZURE #SANLESS

Before we begin, you should familiarize yourself with the Windows Azure Article, High availability and disaster recovery for SQL Server in Azure Virtual Machines. In that article all of the HA options are outlined, including AlwaysOn AG, Database Mirroring, Log Shipping, Backup and Restore and finally Failover Cluster Instances. Assuming you have dismissed those other options due to the costs associated with Enterprise Edition of SQL Server or lack of features, we are focusing on the final option – SQL Server AlwaysOn Failover Cluster Instance (FCI).

As you read that article it becomes clear that the lack of cluster aware shared storage in Azure is an obstacle in deploying SQL Server Failover clusters. However, there are a few alternatives described in that article. We will focus on using SIOS DataKeeper, to provide the storage to be used in the cluster.

1Figure 1 Microsoft’s support policy for SQL Server Failover Clusters
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-classic-sql-dr/

With DataKeeper Cluster Edition you are able to take the locally attached storage, whether it is Premium or Standard Disks, and replicate those disks either synchronously, asynchronous or a mix or both, between two or more cluster nodes. In addition, a DataKeeper Volume resource is registered in Windows Server Failover Clustering which takes the place of a Physical Disk resource. Instead of controlling SCSI-3 reservations like a Physical Disk Resource, the DataKeeper Volume controls the mirror direction, ensuring the active node is always the source of the mirror. As far as SQL Server and Failover Clustering is concerned, it looks, feels and smells like a Physical Disk and is used the same way Physical Disk Resource would be used.

Pre-requisites

The Easy Way to do a Proof-of-Concept

If you are familiar with Azure Resource Manager you know one of the great new features is the ability to use Deployment Templates to rapidly deploy applications consisting of interrelated Azure resources. Many of these templates are developed by Microsoft and are readily available in their community on Github as “Quickstart Templates”. Community members are also free to extend templates or to publish their own templates on GitHub. One such template entitled “SQL Server 2014 AlwaysOn Failover Cluster Instance with SIOS DataKeeper Azure Deployment Template” published by SIOS Technology completely automates the process of deploying a 2-node SQL Server FCI into a new Active Directory Domain.

To deploy this template it is as easy as clicking on the “Deploy to Azure” button in the template.

2
Figure 2- Visit https://github.com/SIOSDataKeeper/SIOSDataKeeper-SQL-Cluster to rapidly provision a 2-node SQL cluster

Deploying a SQL Server Failover Cluster Instance using the Azure Portal

While the automated Azure deployment template is a quick and easy way to get a 2-node SQL Server FCI upon and running quickly, there are some limitations. For one, it uses a 180 Day evaluation version of SQL Server, so you can’t use it in production unless you upgrade the SQL eval licenses. Also, it builds an entirely new AD domain so if you want to integrate with your existing domain you are going to have to build it manually.

To build a 2-node SQL Server Failover Cluster Instance in Azure, we are going to assume you have a basic Virtual Network based on Azure Resource Manager (not Azure Classic) and you have at least one virtual machine up and running and configured as a Domain Controller. Once you have a Virtual Network and a Domain configured, you are going to provision two new virtual machines which will act as the two nodes in our cluster.

Our environment will look like this:

DC1 – Our Domain Controller and File Share Witness
SQL1 and SQL2 – The two nodes of our SQL Server Cluster

Provisioning the two cluster nodes (SQL1 and SQL2)

Using the Azure Portal, we will provision both SQL1 and SQL2 exactly the same way. There are numerous options to choose from including instance size, storage options, etc. This guide is not meant to be an exhaustive guide to deploying SQL Server in Azure as there are some really good resources out there and more published every day. However, there are a few key things to keep in mind when creating your instances, especially in a clustered environment.

Availability Set – It is important that both SQL1, SQL2 AND DC1 reside in the same availability set. By putting them in the same Availability Set we are ensuring that each cluster node and the file share witness reside in a different Fault Domain and Update Domain. This helps guarantee that during both planned maintenance and unplanned maintenance the cluster will continue to be able to maintain quorum and avoid downtime.

3
Figure 3 – Be sure to add both cluster nodes and the file share witness to the same Availability Set

Static IP Address

Once each VM is provisioned, you will want to go into the setting and change the settings so that the IP address is Static. We do not want the IP address of our cluster nodes to change.

4
Figure 4 – Make sure each cluster node uses a static IP

Storage

As far as Storage is concerned, you will want to consult Performance best practices for SQL Server in Azure Virtual Machines. In any case, you will minimally need to add at least one additional disk to each of your cluster nodes. DataKeeper can use Basic Disk, Premium Storage or even Storage Pools consisting of multiple disks in a storage pool. Just be sure to add the same amount of storage to each cluster node and configure it identically.

5
Figure 5 – make sure to add additional storage to each cluster node

Create the Cluster

Assuming both cluster nodes (SQL1 and SQL2) have been provisioned as described above and added to your existing domain, we are ready to create the cluster. Before we create the cluster, there are a few Features that need to be enabled. These features are .Net Framework 3.5 and Failover Clustering. These features need to be enabled on both cluster nodes.

6
Figure 6 – enable both .Net Framework 3.5 and Failover Clustering features on both cluster nodes

Once those features have been enabled, you are ready to build your cluster. Most of the steps I’m about to show you can be performed both via PowerShell and the GUI. However, I’m going to recommend that for this very first step you use PowerShell to create your cluster. If you choose to use the Failover Cluster Manager GUI to create the cluster you will find that you wind up with the cluster being issues a duplicate IP address.

Without going into great detail, what you will find is that Azure VMs have to use DHCP. By specifying a “Static IP” when we create the VM in the Azure portal all we did was create sort of a DHCP reservation. It is not exactly a DHCP reservation because a true DHCP reservation would remove that IP address from the DHCP pool. Instead, this specifying a Static IP in the Azure portal simply means that if that IP address is still available when the VM requests it, Azure will issue that IP to it. However, if your VM is offline and another host comes online in that same subnet it very well could be issued that same IP address.

There is another strange side effect to the way Azure has implemented DHCP. When creating a cluster with the Windows Server Failover Cluster GUI when hosts use DHCP (which they have to), there is not option to specify a cluster IP address. Instead it relies on DHCP to obtain an address. The strange thing is, DHCP will issue a duplicate IP address, usually the same IP address as the host requesting a new IP address. The cluster will usually complete, but you may have some strange errors and you may need to run the Windows Server Failover Cluster GUI from a different node in order to get it to run. Once you get it to run you will want to change the cluster IP address to an address that is not currently in use on the network.

You can avoid that whole mess by simply creating the cluster via Powershell and specifying the cluster IP address as part of the PowerShell command to create the cluster.

You can create the cluster using the New-Cluster command as follows:

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.101 -NoStorage

After the cluster creation completes, you will also want to run the cluster validation by running the following command:

Test-Cluster

7
Figure 7 – The output of the cluster creation and the cluster validation commands

Create File Share Witness

Because there is no shared storage, you will need to create a file share witness on another server in the same Availability Set as the two cluster nodes. By putting it in the same availability set you can be sure that you only lose one vote from your quorum at any given time. If you are unsure how to create a File Share Witness you can review this article http://www.howtonetworking.com/server/cluster12.htm. In my demo I put the file share witness on domain controller. I have published an exhaustive explanation of cluster quorums at https://blogs.msdn.microsoft.com/microsoft_press/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2/

Install DataKeeper

After the cluster is created it is time to install DataKeeper. It is important to install DataKeeper after the initial cluster is created so the custom cluster resource type can be registered with the cluster. If you installed DataKeeper before the cluster is created you will simply need to run the install again and do a repair installation.

8
Figure 8 – Install DataKeeper after the cluster is created

During the installation you can take all of the default options.  The service account you use must be a domain account and be in the local administrators group on each node in the cluster.

9
Figure 9 – the service account must be a domain account that is in the Local Admins group on each node

Once DataKeeper is installed and licensed on each node you will need to reboot the servers.

Create the DataKeeper Volume Resource

To create the DataKeeper Volume Resource you will need to start the DataKeeper UI and connect to both of the servers.
10Connect to SQL1
11

Connect to SQL2
12

Once you are connected to each server, you are ready to create your DataKeeper Volume. Right click on Jobs and choose “Create Job”
13

Give the Job a name and description.
14

Choose your source server, IP and volume. The IP address is whether the replication traffic will travel.
15

Choose your target server.
16

Choose your options. For our purposes where the two VMs are in the same geographic region we will choose synchronous replication. For longer distance replication you will want to use asynchronous and enable some compression.
17

By clicking yes at the last pop-up you will register a new DataKeeper Volume Resource in Available Storage in Failover Clustering.
18

You will see the new DataKeeper Volume Resource in Available Storage.
19

 

Install the first cluster node

You are now ready to install your first node. The cluster installation will proceed just like any other SQL cluster that you have ever built. I have not copied ever screen shot, just a few to guide you along the way.
20212223

You see that the DataKeeper Volume Resource is recognized as an available disk resource, just as if it were a shared disk.
24

Make note of the IP address you select here. It must be a unique IP address on your network. We will use this same IP address later when we create our Internal Load Balancer.
25

Add the second node

After the first node installs successfully, you will start the installation on the second node using the “Add node to a SQL Server failover cluster” option. Once again, the install is pretty straight forward, just use standard best practices as you would any other SQL cluster installation.
26272829

Create the Internal Load Balancer

Here is where failover clustering in Azure is different than traditional infrastructures. The Azure network stack does not support gratuitous ARPS, so clients cannot connect directly to the cluster IP address. Instead, clients connect to an internal load balancer and are redirected to the active cluster node. What we need to do is create an internal load balancer. This can all be done through the Azure Portal as shown below.

First, create a new Load Balancer
30

You can use an Public Load Balancer if your client connects over the public internet, but assuming your clients reside in the same vNet, we will create an Internal Load Balancer. The important thing to take note of here is that the Virtual Network is the same as the network where your cluster nodes reside. Also, the Private IP address that you specify will be EXACTLY the same as the address you used to create the SQL Cluster Resource.
31

After the Internal Load Balancer (ILB) is created, you will need to edit it. The first thing we will do is to add a backend pool. Through this process you will choose the Availability Set where your SQL Cluster VMs reside. However, when you choose the actual VMs to add to the Backend Pool, be sure you do not choose your file share witness. We do not want to redirect SQL traffic to your file share witness.
32
33

The next thing we will do is add a Probe. The probe we add will probe Port 59999. This probe determines which node is active in our cluster.
34

And then finally, we need a load balancing rule to redirect the SQL Server traffic. In our example we used a Default Instance of SQL which uses port 1433. You may also want to add rules for 1434 or others depending upon your applications requirements. The important thing to notice in the screen shot below is the Direct Server Return is Enabled. Make sure you make that change.
35

Fix the SQL Server IP Resource

The final step in the configuration is to run the following PowerShell script on one of your cluster nodes. This will allow the Cluster IP Address to respond to the ILB probes and ensure that there is no IP address conflict between the Cluster IP Address and the ILB. Please take note; you will need to edit this script to fit your environment. The subnet mask is set to 255.255.255.255, this is not a mistake, leave it as is. This creates a host specific route to avoid IP address conflicts with the ILB.

# Define variables
$ClusterNetworkName = “” 
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 of higher to find the name)
$IPResourceName = “” 
# the IP Address resource name 
$ILBIP = “” 
# the IP Address of the Internal Load Balancer (ILB)
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this: 
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

Conclusion

You should now have a functioning SQL Server Failover Cluster Instance. If you have ANY problems, please reach out to me on Twitter @daveberm and I will be glad to assist. If you need a DataKeeper evaluation key fill out the form at http://us.sios.com/clustersyourway/cta/14-day-trial and SIOS will send an evaluation key sent out to you.

Deploying Microsoft SQL Server 2014 Failover Clusters in #Azure Resource Manager (ARM)

Step-by-Step: How to configure a SQL Server Failover Cluster Instance (FCI) in Microsoft Azure IaaS #SQLServer #Azure #SANLess

7/19/2016 Update – The steps below describe a deployment in Azure “Classic”. If you are deploying a SQL Cluster in Azure Resource Manager (ARM) then you should see my aerticle here. https://clusteringformeremortals.com/2016/04/23/deploying-microsoft-sql-server-2014-failover-clusters-in-azure-resource-manager-arm/

Before we begin, we are going to make some assumptions that you are at least slightly familiar with failover clustering and Microsoft Azure and have already signed up for an Azure account. Throughout this Step-by-Step guide we will refer to additional resources for additional reading. Included in this guide are screen shots and code examples. Azure is a rapidly developing product, so your experience may be different than that described, but you should be able to adapt and adjust as needed. I will attempt to keep this article up to date my adding additional comments as time progresses. The new Azure Portal is still in the Preview stage as of the writing of this article, so we will use the currently supported portal along with PowerShell in all of our examples.

At a high level, these are the following steps that need to be taken in order to create a highly available SQL Server deployment on Azure IaaS. If you already have a functioning domain in Azure IaaS you can skip items 1-3.

We will take a closer look at each of these steps below.

Overview

These instructions assume you want to create a highly available SQL Server deployment entirely within one Azure region. It is entirely possible to configure SQL Server clusters that span different geographic regions within Azure, or even Hybrid Cloud configurations that span from on premise to the Azure Cloud or visa-versa. It is not my intent to cover those types of configurations in this document. Instead, the configuration I will focus on the configuration is illustrated in in Figure 1.

Figure 1 – SQL Server Failover Cluster in Azure

This article will describe how to create a cluster that spans two different Fault Domains and Update Domains within an Azure region. Spanning different Fault Domains eliminates downtime associated with Unplanned Downtime. Spanning different Update Domains eliminates failures associated with Planned Downtime.

For additional overview information, you may want to watch the webinar I did on SQLTIPS that discusses this topic in detail. It can be viewed at http://www.mssqltips.com/sql-server-video/360/highly-available-sql-server-cluster-deployments-in-azure-iaas/

Create your Virtual Network

In order for this to work, you will need to have all of your VMs within a Virtual Network. Creating a Virtual Network is pretty straight forward. The screen shots below should help guide you through the process.

At this point I like to add the Google DNS Server address of 8.8.8.8. I have experienced weird connectivity issues when trying to download updates from Microsoft when using their default DNS servers. After we have downloaded all of the updates that these servers need we will come back and replace the DNS server IP address with the IP address of our AD controller. But for now, add 8.8.8.8 and all of your VMs provisioned in this Virtual Network will receive this as a DNS server via the DHCP service. This forum post describes the problem I have experienced without adding this DNS server entry. Before adding all of your servers to the domain I have found that you need to delete this 8.8.8.8 address and replace it with the IP address of the first domain controller that you create.

You will see that I created one subnet in this Virtual Network and labeled it Public. Later on when we create our VMs we will use the Public network. While Azure recently added support for multiple NICs per VM, I have found that adding multiple subnets and NICs to an Azure VM can be problematic. The main problem is that each NIC is automatically assigned a Gateway address, which can causes routing problems due to multiple Gateways being defined on the same server.

It will take a few minutes for your Virtual Network to be created.

Create a Cloud Service

Your VMs will all reside in the same “Cloud Service”. Good luck finding the definition of an Azure “Cloud Service”, since Azure overall is a “Cloud Service”. However, this is a very specific think specific to Azure IaaS that you need to create before you start deploying VMs. The screen shots below will walk you through the process.

Make sure you put the Cloud Service in the same Region as your Virtual Network.

Create a Storage Account

Before you begin provisioning VMs you must create a Storage Account. Follow the steps below to create a storage account.

Make sure you create a Storage Account in the same Location as your Virtual Network

Create your Azure VMs and Storage

If you have not downloaded and installed Azure PowerShell yet, do that now. Also, make sure you set your default subscription and CurrentStorageAccountName.

We will start with provisioning the first VM which will become the Domain Controller (DC). In our example, we will also use the DC as a file share witness, so we will create an Availability Set that will include the Domain Controller and the two nodes in the cluster. The following is an example script which will create the VM and assign it a “Static Address”.

$AVSet=”SQLHA”

$InstanceSize=”Large”

$VMName=”DC1″

$AdminName=” myadminaccount”

$AdminPassword=”mypassword”

$PrimarySubnet=”Public”

$PrimaryIP=”10.0.0.100″

$CloudService=”SQLFailover”

$VirtualNetwork=”Azure-East”

$ImageName=”a699494373c04fc0bc8f2bb1389d6106__Windows-Server-2012-R2-201412.01-en.us-127GB.vhd”

$image = Get-AzureVMImage -ImageName $ImageName

$vm = New-AzureVMConfig -Name $VMName -InstanceSize $InstanceSize -Image $image.ImageName –AvailabilitySetName $AVSet

Add-AzureProvisioningConfig –VM $vm -Windows -AdminUserName $AdminName -Password $AdminPassword

Set-AzureSubnet -SubnetNames $PrimarySubnet -VM $vm

Set-AzureStaticVNetIP -IPAddress $PrimaryIP -VM $vm

New-AzureVM -ServiceName $CloudService –VNetName $VirtualNetwork –VM $vm

Tech Note – I say “Static IP Address”, but it really just creates a DHCP “Request”. I call it a DHCP “Request” and not “Reservation” because it really is only a best effort request. If this server is offline and someone starts a new server, the DHCP server could hand out this address to someone else, making it unavailable when this server is turned on.

Once you create your 1st VM you are ready to create the two SQL VMs used in the cluster. You will see that I tried to make the script easy to use by allowing you to specify the different variables. I highlighted the variables you need to change for each VM.

$AVSet=”SQLHA”

$InstanceSize=”Large”

$VMName=”SQL1″

$AdminName=”myadminaccount”

$AdminPassword=”P@55w0rd”

$PrimarySubnet=”Public”

$PrimaryIP=”10.0.0.101″

$CloudService=”SQLFailover”

$VirtualNetwork=”Azure-East”

$ImageName=”a699494373c04fc0bc8f2bb1389d6106__Windows-Server-2012-R2-201412.01-en.us-127GB.vhd”

$image = Get-AzureVMImage -ImageName $ImageName

$vm = New-AzureVMConfig -Name $VMName -InstanceSize $InstanceSize -Image $image.ImageName –AvailabilitySetName $AVSet

Add-AzureProvisioningConfig –VM $vm -Windows -AdminUserName $AdminName -Password $AdminPassword

Set-AzureSubnet -SubnetNames $PrimarySubnet -VM $vm

Set-AzureStaticVNetIP -IPAddress $PrimaryIP -VM $vm

New-AzureVM -ServiceName $CloudService –VNetName $VirtualNetwork –VM $vm

Run the script once again to provision the 2nd cluster node

$AVSet=”SQLHA”

$InstanceSize=”Large”

$VMName=”SQL2″

$AdminName=” myadminaccount”

$AdminPassword=”mypassword”

$PrimarySubnet=”Public”

$PrimaryIP=”10.0.0.102″

$CloudService=”SQLFailover”

$VirtualNetwork=”Azure-East”

$ImageName=”a699494373c04fc0bc8f2bb1389d6106__Windows-Server-2012-R2-201412.01-en.us-127GB.vhd”

$image = Get-AzureVMImage -ImageName $ImageName

$vm = New-AzureVMConfig -Name $VMName -InstanceSize $InstanceSize -Image $image.ImageName –AvailabilitySetName $AVSet

Add-AzureProvisioningConfig –VM $vm -Windows -AdminUserName $AdminName -Password $AdminPassword

Set-AzureSubnet -SubnetNames $PrimarySubnet -VM $vm

Set-AzureStaticVNetIP -IPAddress $PrimaryIP -VM $vm

New-AzureVM -ServiceName $CloudService –VNetName $VirtualNetwork –VM $vm

You see that each of these VMs are all placed in the same Availability Set, which I called “SQLHA”. By placing the VMs in the same Availability Set you take advantage of Fault Domains and Update Domains as described here. http://blogs.technet.com/b/yungchou/archive/2011/05/16/window-azure-fault-domain-and-update-domain-explained-for-it-pros.aspx

Once you have created your VMs your Azure Portal should look like this.

Words of Wisdom about Fault Domains

Fault Domains are a great concept; however Microsoft dropped the ball by not guaranteeing (As of Jan 2015) that you will always get three fault domains per Availability Set. In fact, most of the time I only get two Fault Domains. If you wind up with just two Fault Domains you will want to consider putting your File Share Witness in a different region just to be 100% sure that you don’t have a majority of your cluster votes sitting in the same rack. Once Windows Server 10 is GA this will no longer be a problem as you will be able to use a Cloud Witness instead of a File Share Witness. If you would like to see three Fault Domains be the standard, follow this link and VOTE for that idea on Azure idea website.

Configure Active Directory

First we will connect to DC1 via RDP and enable active directory. Use the “Connect” button to download the RDP connection to DC1. Use the username and password that you specified when you created your Azure VM. Promote DC1 to a Domain Controller.

Insider Tip – I have also found that DNS resolution works best if you remove all DNS forwarders on the DNS server and just use root hints. Azure can sometime have problem resolving Microsoft web properties if you use their DNS servers are forwarders.

Figure 2 – Remove all forwarders for reliable name resolution

Create a Cluster

Once you have configured DC1 as a Domain Controller, you will connect to SQL1 and SQL2 and add them to the domain. However, before you do that, you will need to change the DNS Server of the Virtual Network to that of the DC1 Server (10.0.0.100) and reboot both SQL1 and SQL2. Once SQL1 and SQL2 have 100.0.0.100 as their DNS Server you will be able to join the domain.

Once you are joined to the domain you will have to complete steps illustrated below to create a SQL Server Failover Cluster Instance (FCI).

First, enable .Net 3.5 Framework on each node.

If you find that .Net Framework cannot be installed, refer to my tip about DNS.

Enable Failover Cluster

Now that .Net 3.5 is enabled, you will then need to enable the Failover Cluster Feature on both SQL1 and SQL2.

Validation

Once the cluster feature is enabled, you will need to create the cluster. The first step is to run cluster Validation. I am assuming you are familiar with clustering, so I will just post a few of the screen shots and make note of things you need to consider.

The Validation will complete, but there will be some warnings. Most of the warnings will be around storage. You can ignore those as we will be using replicated storage rather than a shared disk. Also, you may get a warning about the network. You can also ignore that warning as we know that Azure has network redundancy built in at the physical layer.

Create Cluster Access Point

11/24/2015 UPDATE – I have found that creating a cluster via Powershell avoids all this issues described in the GUI steps show below because you can specify the IP Address of the Cluster as part of the creation process. The two PowerShell commands below replace all the steps show in the GUI screen shots that follow in this section. Make sure the StaticIaddress parameter

Test-Cluster –Node Server1, Server2

New-Cluster –Name MyCluster –Node Server1, Server2 –StaticAddress 10.0.0.200

If you ran the Powershell Script above then you can skip the rest of this section and jump right to the next section on creating the file share witness.

I would advise creating the Click Finish to start the cluster creation process. First choose a name for the cluster.

You will see that there are some warnings if you click View Report. You can ignore the warning as we will be creating a File Share Witness.

You may get the following message after the cluster creates. “The action ‘Validate Configuration…’ did not complete.

Fix Cluster Access Point IP Address

The underlying issue here is that the cluster is not resolving the cluster name properly due to an IP address conflict. What you will find is that Azure DHCP actually gives out a duplicate IP address to the cluster computer object that you just created. This is just one of the weird Azure idiosyncrasies that you will have to deal with as shown below.

You may need to open the Failover Cluster GUI on SQL2 in order to connect. Once you are able to connect to the cluster you will see that the cluster grabbed the same IP address as one of the cluster nodes. This of course causes IP address conflicts.

What we need to do is change the 10.0.0.102 IP address to another IP address not used in this subnet.

You will see I picked 10.0.0.200 as my address. This address is NOT reserved in the DHCP scope as there is currently no way to control the DHCP scope or add reservations. I just pick an address at the upper end of the DHCP scope and make sure that I don’t provision enough VMs within this subnet to ever reach that IP address.

Now that the Cluster IP address is fixed you will be able to connect to the cluster using Failover Cluster Manager from either node.

Create File Share Witness

Next we will create a File Share Witness for the cluster quorum. For a complete description of cluster quorums read my blog post on MSDN press, http://blogs.msdn.com/b/microsoft_press/archive/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2.aspx

The file share witness will be created on the Domain Controller. Essentially you need to create a file share on DC1 and give read/write permissions to the cluster computer account “sioscluster”. Make sure to make these changes to both the Share and Security permissions as shown below.

The following steps are done on DC1.

Create a new folder.

Make sure you search for Computer objects and pick the cluster computer object name, in our case, SIOSCLUSTER

Make sure you give it Change permissions.

You also need to change the Security to allow the cluster computer object Modify permissions on the folder.

Once you create the shared folder, you will add the File Share Witness using the Windows Server Failover Cluster interface on either of the nodes as shown below.

Install DataKeeper

DataKeeper Cluster Edition from SIOS Technology is needed in order to provide the replication and cluster integration that will allow you to build a failover cluster instance without shared storage. First, you will install DataKeeper Cluster Edition on both of the nodes of your cluster. Run through the setup as shown below.

For demonstration purposes I used the domain administrator account. The only requirement is that the user account used is in the local administrator group in each server.

Create a DataKeeper Volume Resource

After you install the software on each cluster node (SQL1 and SQL2) you are ready to create you first replicated volume resource. Launch the DataKeeper GUI on either node and follow the steps below to create a DataKeeper Volume Resource.

After you connect to both servers, click on the Server Overview Report. It should look like the following.

You’ll notice that you are connected to both servers, but there are no volumes listed. Next we will need to add additional storage to each cluster node. Do this through the Azure portal as shown below.

After you have added the additional volume to each VM and created a formatted partition, you DataKeeper GUI should look like this.

You are now ready to launch the Create a Job Wizard and create the DataKeeper Volume resource as shown below.

Create the job and give it a name and optional description.

Install SQL into the Cluster

Now that you have the cluster configured and a DataKeeper Volume in Available Storage, you are ready to begin the SQL Server Cluster Installation. This process is exactly the same as if you were installing a SQL Server Failover Cluster Instance using shared storage. Since the DataKeeper Replicated Volume resource is a Storage Class resource, failover clustering treats it like a Physical Disk resource. Follow the steps pictured below to install SQL Server into the cluster.

You can use SQL Server 2014 Standard Edition to build a 2-node Failover Cluster. In this scenario DataKeeper can also replicate Data to a 3rd node, but that node cannot be part of the cluster. If you want to create a 3+ node cluster you will need to use SQL Server 2014 Enterprise Edition. Earlier versions of SQL work perfectly fine as well. I have tested SQL 2008 through SQL 2014.

Before clicking Next, click on the Data Directories tab.

Once SQL is installed on the first node, you will then need to run the installation on the second node.

Create an Internal Load Balancer

Once the cluster is configured, you will need to create the internal load balancer(ILB) which will be used for all client access. Clients that connect to SQL Server will need to connect to the ILB instead of connecting directly to the cluster IP address. If you were to try to connect to the cluster directly at this point you would see that you cannot connect to the cluster from any remote system. Even SQL Server Management Studio will not be able to connect to the cluster directly at this point.

Run this Powershell Command from your local desktop to create your Internal Load Balancer (ILB).

# Define variables

$IP = “10.0.0.201” # IP address you want your Internal Load Balancer to use, this should be the same address as your SQL Server Cluster IP Address

$svc=”SQLFailover” # The name of your cloud service

$vmname1=”sql1″ #The name of the VM that is your first cluster node

$epname1=”sql1″ #This is the name you want to assign to the endpoint associated with first cluster node, use anything you like

$vmname2=”sql2″ #The name of the VM that is your second cluster node

$epname2=”sql2″ #This is the name you want to assign to the endpoint associated with second cluster node, use anything you like

$lbsetname=”ilbsetsqlha” #use whatever name you like, this name is insignificant

$prot=”tcp”

$locport=1433

$pubport=1433

$probeport=59999

$ilbname=”sqlcluster” #this is the name your clients connect to, it should coincide with you SQL cluster Name Resource

$subnetname=”Public” #the name of the Azure subnet where you want the internal load balancer to live

# Add Internal Load Balancer to the service

Add-AzureInternalLoadBalancer -InternalLoadBalancerName $ilbname -SubnetName $subnetname -ServiceName $svc –StaticVNetIPAddress $IP

# Add load balanced endpoint to the primary cluster node

Get-AzureVM -ServiceName $svc -Name $vmname1 | Add-AzureEndpoint -Name $epname1 -LBSetName $lbsetname -Protocol $prot -LocalPort $locport -PublicPort $pubport -ProbePort $probeport -ProbeProtocol tcp -ProbeIntervalInSeconds 10 –DirectServerReturn $true -InternalLoadBalancerName $ilbname | Update-AzureVM

# Add load balanced endpoint to the secondary cluster node

Get-AzureVM -ServiceName $svc -Name $vmname2 | Add-AzureEndpoint -Name $epname2 -LBSetName $lbsetname -Protocol $prot -LocalPort $locport -PublicPort $pubport -ProbePort $probeport -ProbeProtocol tcp -ProbeIntervalInSeconds 10 –DirectServerReturn $true -InternalLoadBalancerName $ilbname | Update-AzureVM

Assuming the script ran as planned, you should see the following output.

Update the Client Listener

Once the internal load balancer is created we will need to run a Powershell script on SQL1 to update the SQL Server Cluster IP address. The script references the Cluster Network name and the IP Resource Name. The pictures below show you were to find both of these names in Failover Cluster Manager.

The script below should be run on one of the cluster nodes. Make sure to launch Powershell ISE using Run as Administrator.

# This script should be run on the primary cluster node after the internal load balancer is created

# Define variables

$ClusterNetworkName = “Cluster Network 1” # the cluster network name

$IPResourceName = “SQL IP Address 1 (sqlcluster)” # the IP Address resource name

$CloudServiceIP = “10.0.0.201” # IP address of your Internal Load Balancer

Import-Module FailoverClusters

# If you are using Windows 2012 or higher, use the Get-Cluster Resource command. If you are using Windows 2008 R2, use the cluster res command which is commented out.

Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{“Address”=”$CloudServiceIP”;”ProbePort”=”59999″;SubnetMask=”255.255.255.255″;”Network”=”$ClusterNetworkName”;”OverrideAddressMatch”=1;”EnableDhcp”=0}

# cluster res $IPResourceName /priv enabledhcp=0 overrideaddressmatch=1 address=$CloudServiceIP probeport=59999 subnetmask=255.255.255.255

Assuming your script ran as expected, the output should look like below. You see that in order for the changes to be applied, you will need to bring your cluster resource offline once and then bring it online.

Firewall

Open TCP port 59999, 1433 and 1434 are open on the firewall of each server.

Summary

Now that the cluster is created you can connect to the SQL Failover Cluster Instance via the Internal Load Balancer, using the name sqlcluster or directly to 10.0.0.201.

If you have any questions at all, I will be glad to give you a helping hand. Tweet me @daveberm and I’ll be sure to get in touch with you.

Step-by-Step: How to configure a SQL Server Failover Cluster Instance (FCI) in Microsoft Azure IaaS #SQLServer #Azure #SANLess