New Azure ILB feature allows you to build a multi-instance SQL Server Failover Cluster in #Azure

At Microsoft Ignite this past September, Microsoft made some announcements around Azure. One of these announcements was the general availability of multiple VIPs on internal load balancers. Why is this so important to a SQL Server DBA? Well, up until now, if you wanted to deploy highly available SQL Server in Azure you were limited to a single SQL Server FCI per cluster or a single Availability Group listener.

This limitation forced you to deploy a new cluster for each instance of SQL Server you wanted to protect in a Failover Cluster. It also forced you to group all of your databases into a single Availability Group if you wanted automatic failover and client redirection in your AlwaysOn AG configuration.

Those restrictions have now been lifted with these new ILB features. In this post I am going to walk you through the process of deploying a SQL Server FCI in Azure that contains two SQL Server instances. In a future post I will walk you through the same process for SQL Server AlwaysOn AG.

Let’s start with a basic, single-instance SQL Server FCI in Azure, as I describe in my post Deploying Microsoft SQL Server 2014 Failover Clusters in Azure Resource Manager.

That post describes the process of creating the cluster, using DataKeeper to create the replicated volume resources used in the cluster, creating the Internal Load Balancer (ILB) and then fixing the SQL Server Cluster IP Resource to work with the ILB. If you want to skip that process and jumpstart your configuration, you can always use the Azure Deployment Template that creates a 2-Node SQL Server FCI using SIOS DataKeeper.

Assuming you now have a basic two node SQL Server FCI, the steps to add a 2nd named instance are as follows:

  1. Create another DataKeeper Volume Resource on another volume that is not currently being used. You may need to add additional disks to your Azure instance if you have no available volumes. As part of this volume creation process the new DataKeeper Volume resource will be registered in Available Storage in the cluster. Refer to the article referenced earlier for the details.
  2. Install a named instance of SQL Server on the first node, specifying the DataKeeper Volume that we just created as the storage location.
  3. “Add a node” to the cluster on the second node.
  4. Lock down the port number of this new named instance to a port that is not in use. In my example I use port 1440 (one way to script this is shown just after this list).
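
The static port is normally set in SQL Server Configuration Manager (TCP/IP properties, IPAll, TCP Port), but it can also be scripted through the SQL Server WMI provider. The following is only a sketch: the instance name SQLNAMED, the cluster resource name and the assembly version are assumptions you will need to adjust for your own install.

# Sketch – assumes a named instance called SQLNAMED and port 1440; adjust names and versions for your environment
# Load the SQL Server WMI management assembly (version shown is the SQL Server 2014 build)
Add-Type -AssemblyName "Microsoft.SqlServer.SqlWmiManagement, Version=12.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"
$wmi = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer
$tcp = $wmi.ServerInstances['SQLNAMED'].ServerProtocols['Tcp']
$ipAll = $tcp.IPAddresses['IPAll']
# Clear dynamic ports and pin the static port to 1440
$ipAll.IPAddressProperties['TcpDynamicPorts'].Value = ''
$ipAll.IPAddressProperties['TcpPort'].Value = '1440'
$tcp.Alter()
# On an FCI, cycle the SQL Server cluster resource (not the Windows service directly) so the change takes effect;
# the resource name below is hypothetical
Stop-ClusterResource 'SQL Server (SQLNAMED)'
Start-ClusterResource 'SQL Server (SQLNAMED)'

Run this on the node that currently owns the instance, and after a failover confirm the instance is still listening on the same port on the other node.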

Next we have to adjust the ILB to redirect traffic to this second instance. Here are the steps you need to follow:

Add a frontend IP address that is identical to the SQL cluster IP address you used for the second instance of SQL Server as shown below.

Figure – Add a frontend IP address for the second instance

Next, we will need to add another probe since the two instances could be running on different servers. As shown below, I added a probe on port 59998 (instead of the usual 59999). We will need to make sure the new rules reference this probe. We will also need to remember that port number, since we will need it when we update the IP address associated with this instance during the last step of this process.

Figure – Add a health probe on port 59998

Now we need to add two new rules to the ILB to direct traffic destined for this 2nd instance of SQL. Of course we need a rule to redirect TCP port 1440 (the port I used for the named instance of SQL), but because we are now using named instances we will also need a rule for the SQL Server Browser Service, UDP port 1434.

In the picture below depicting the rule for the SQL Server Browser Service, take note that the Frontend IP Address references the new frontend IP address (10.0.0.201) and that UDP port 1434 is used for both the Port and the Backend Port. In the Backend Pool you will need to specify the two servers in the cluster, and finally make sure you choose the new health probe we just created.

Figure – Load balancing rule for the SQL Server Browser Service (UDP 1434)

We will now add a rule for TCP 1440. As shown in the picture below, add a new rule for TCP port 1440, or whatever port you locked down for the named instance of SQL Server. Again, be sure to choose the new frontend IP address and the new health probe (59998). Also, make sure Floating IP (direct server return) is enabled.

Figure – Load balancing rule for TCP 1440 with Floating IP (direct server return) enabled
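
If you prefer to script these ILB changes instead of clicking through the portal, the same configuration can be done with the AzureRM PowerShell module that was current when this was written. This is only a sketch; the load balancer, resource group, virtual network, subnet and backend pool names are placeholders you will need to swap for your own.

# Sketch only – all resource names below are placeholders
$lb = Get-AzureRmLoadBalancer -Name "SQLILB" -ResourceGroupName "SQLRG"
$vnet = Get-AzureRmVirtualNetwork -Name "SQLVNET" -ResourceGroupName "SQLRG"
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "SQLSubnet" -VirtualNetwork $vnet

# Add the second frontend IP (10.0.0.201) and the new probe on 59998, then save
$lb | Add-AzureRmLoadBalancerFrontendIpConfig -Name "SQL2Frontend" -PrivateIpAddress "10.0.0.201" -Subnet $subnet
$lb | Add-AzureRmLoadBalancerProbeConfig -Name "SQL2Probe" -Protocol Tcp -Port 59998 -IntervalInSeconds 5 -ProbeCount 2
$lb | Set-AzureRmLoadBalancer

# Re-read the load balancer and add the two rules (TCP 1440 with Floating IP, and UDP 1434 for the Browser Service)
$lb = Get-AzureRmLoadBalancer -Name "SQLILB" -ResourceGroupName "SQLRG"
$fe = Get-AzureRmLoadBalancerFrontendIpConfig -LoadBalancer $lb -Name "SQL2Frontend"
$probe = Get-AzureRmLoadBalancerProbeConfig -LoadBalancer $lb -Name "SQL2Probe"
$pool = Get-AzureRmLoadBalancerBackendAddressPoolConfig -LoadBalancer $lb -Name "SQLBackendPool"
$lb | Add-AzureRmLoadBalancerRuleConfig -Name "SQL2-TCP1440" -FrontendIpConfiguration $fe -BackendAddressPool $pool -Probe $probe -Protocol Tcp -FrontendPort 1440 -BackendPort 1440 -EnableFloatingIP
$lb | Add-AzureRmLoadBalancerRuleConfig -Name "SQL2-Browser" -FrontendIpConfiguration $fe -BackendAddressPool $pool -Probe $probe -Protocol Udp -FrontendPort 1434 -BackendPort 1434
$lb | Set-AzureRmLoadBalancer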

Now that the load balancer is configured, the final step is to run the PowerShell script to update the new Cluster IP address associated with this 2nd instance of SQL Server. This PowerShell script only needs to be run on one of the cluster nodes.

# Define variables
$ClusterNetworkName = ""
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)
$IPResourceName = ""
# the IP Address resource name of the second instance of SQL Server
$ILBIP = ""
# the IP Address of the second instance of SQL, which should be the same as the new frontend IP address

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59998;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}

# If you are using Windows Server 2008 R2 use this:
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59998 subnetmask=255.255.255.255
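
For reference, filled in for this walkthrough the variables might look like the following. These values are hypothetical; verify the real names in your cluster with Get-ClusterNetwork and Get-ClusterResource.

# Hypothetical example values – confirm yours with Get-ClusterNetwork and Get-ClusterResource
$ClusterNetworkName = "Cluster Network 1"
$IPResourceName = "SQL IP Address 2 (SQLNAMED)"
$ILBIP = "10.0.0.201"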

You now have a fully functional multi-instance SQL Server FCI in Azure. Let me know if you have any questions.


Understanding the Windows Server Failover Cluster Quorum in Windows Server 2016

Before we look at the new quorum features in Windows Server 2016, I think it is important to know where we came from. In my previous post Understanding the Windows Server Failover Cluster Quorum in Windows Server 2012 R2 I went into some great detail regarding the history and evolution of the cluster quorum. I suggest you review that post to understand how the quorum works in Windows Server 2012 R2 and how the new features of Windows Server 2016 are going to make your cluster deployments even more resilient.

Cloud Witness

Let’s start with my favorite new feature, Cloud Witness. A Cloud Witness allows you to leverage Azure Blob Storage to act as a witness for your cluster, in place of a Disk Witness or File Share Witness. The configuration of a Cloud Witness is extremely easy and, in my experience, costs next to nothing to host in Azure. The only downside is that the cluster nodes will need to be able to communicate over the internet with your Azure Blob Storage. Very often cluster nodes are forbidden to communicate over the public internet, so you will need to coordinate with your security team if you want to enable a Cloud Witness.
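
For reference, once you have created a storage account, configuring the Cloud Witness is a single PowerShell command (it can also be done in the Configure Cluster Quorum wizard). The storage account name and access key below are placeholders.

# Sketch – replace the storage account name and access key with your own
Set-ClusterQuorum -CloudWitness -AccountName "mycloudwitnessstorage" -AccessKey "<storage-account-access-key>"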

There are many compelling reasons for using a Cloud Witness, but for me it makes most sense in three very specific environments: Failover Cluster in Azure, Branch Office Clusters, and Multisite Clusters. Let’s take a look at each of these scenarios to see how a Cloud Witness can help.


Figure 1 – The cloud witness storage account should always be configured as Locally Redundant Storage (LRS)

If you are moving to Azure (or really any cloud provider), you will want to make sure your deployments are highly available. If you are talking about SQL Server, File Servers, SAP or other workloads traditionally clustered with Windows Server Failover Clustering, you will need to use either a File Share Witness or a Cloud Witness, since a Disk Witness is not possible in Azure. With Windows Server 2012 R2 or Windows Server 2008 R2, you will need to use a File Share Witness. With Windows Server 2016 it is now possible to use a Cloud Witness. The advantage of a Cloud Witness is that you don’t have to maintain another Windows instance in Azure to host the File Share. Instead, Microsoft allows you to leverage Blob Storage. This gives you a less expensive solution, one that is much easier to manage, and one that is more resilient.

When looking at cluster deployments in branch offices, cost and maintenance are always considerations. If you think about a retail chain with hundreds or thousands of locations, having a SAN in each location can be cost prohibitive. If you consider that each location might need to run a two-node Hyper-V cluster on an S2D hyper-converged configuration or a 3rd-party replication solution to host a number of virtual machines, a Cloud Witness helps the business avoid the cost of adding an additional physical server in each location to act as a File Share Witness, or the cost of adding a SAN to each location.

And finally, when deploying a multisite cluster, the Cloud Witness eliminates the need for a 3rd data center to host the File Share Witness. Before the introduction of the Cloud Witness, best practice dictated that the File Share Witness reside in a 3rd location. Access to a 3rd datacenter just to host a file share witness was not always feasible and certainly introduced another layer of complexity. By using a Cloud Witness you eliminate the need to maintain a 3rd location, and access to the witness is done over the public internet, minimizing the network requirements as well.

Site Awareness

When building a multisite cluster, there has always been another common problem: there was no way to ensure that failover always preferred the local site. While you could specify Preferred Owners, that setting is commonly misunderstood; administrators may not realize that even servers they did not list as Preferred Owners are automatically appended to the end of the Preferred Owners list maintained by the cluster. The result of this misunderstanding is that although you may have listed only the local servers as Preferred Owners, a cluster resource could still fail over to the DR site even though there is a perfectly good node available in the local site. Obviously this is not what you expect, and using Site Awareness eliminates this problem moving forward.

Site Awareness fixes this problem by always preferring the local site when deciding which node should bring a workload online. Under normal circumstances a clustered workload will always fail over to a local node; only if you have a complete site outage will one of the DR nodes come online. The same holds true once you are running in the DR site: if the workload was running on a node in the DR site, the cluster will recover it on another server in the DR site. Site Awareness will always prefer a local node.
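
Site Awareness is configured with the new fault domain cmdlets. The sketch below assumes two sites named Primary and DR and nodes named SQL1 through SQL4; substitute your own names.

# Sketch – site and node names are assumptions
New-ClusterFaultDomain -Type Site -Name "Primary" -Description "Primary datacenter"
New-ClusterFaultDomain -Type Site -Name "DR" -Description "Disaster recovery site"
# Assign each node to its site
Set-ClusterFaultDomain -Name "SQL1" -Parent "Primary"
Set-ClusterFaultDomain -Name "SQL2" -Parent "Primary"
Set-ClusterFaultDomain -Name "SQL3" -Parent "DR"
Set-ClusterFaultDomain -Name "SQL4" -Parent "DR"
# Optionally tell the cluster which site to prefer
(Get-Cluster).PreferredSite = "Primary"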

Fault Domains

Building upon site awareness is Fault Domains. Fault Domains go a step further and let you define Node, Chassis, and Rack locations in addition to Site. Fault Domains have three benefits: Storage Affinity in a stretch cluster, increased Storage Spaces resiliency, and enhanced Health Service alerts that include metadata about the location of the resources raising the alarm. Storage Affinity will help ensure that your cluster workloads and storage are running in the same location. You certainly wouldn’t want your VM reading and writing data that is sitting on a CSV in a different city if you can help it.

However, I think the biggest winner here is the Storage Spaces Direct (S2D) scenario. S2D will leverage the information you provide about your cluster nodes’ location (Site, Rack, Chassis) to ensure that the multiple copies of data that are written for redundancy all live in different Fault Domains. This helps ensure that data placement is optimized so that the failure of a single Node, Chassis, Rack or Site does not bring down your entire S2D deployment. Cosmos Darwin has an excellent video on Channel 9 that explains this concept in great detail.
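
The same cmdlets describe the rest of the hierarchy that S2D uses when it decides where to place copies of data. Another rough sketch; the rack and chassis names are purely illustrative and this builds on the site names used above.

# Sketch – rack, chassis and node names are illustrative only
New-ClusterFaultDomain -Type Rack -Name "Rack1" -Location "Row A"
New-ClusterFaultDomain -Type Chassis -Name "Chassis1" -Location "Rack1, top slot"
# Nest the fault domains: rack under site, chassis under rack, node under chassis
Set-ClusterFaultDomain -Name "Rack1" -Parent "Primary"
Set-ClusterFaultDomain -Name "Chassis1" -Parent "Rack1"
Set-ClusterFaultDomain -Name "SQL1" -Parent "Chassis1"
# Review the resulting hierarchy
Get-ClusterFaultDomain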

Summary

Windows Server 2016 adds several new enhancements to the cluster quorum that will provide some immediate benefits to your cluster deployments. In addition to the enhancements that impact the cluster quorum, you will also want to check out some of the other great new cluster enhancements like rolling system upgrade, Virtual Machine Resiliency, Workgroup and Multi-Domain Clusters and others.


Azure Auto Shutdown for Virtual Machines

Like many of you, I try to make my Azure MSDN subscription credits stretch the entire month. I’m typically just building labs to try out new features or to demonstrate SQL Server Failover Clusters in Azure. A lot of the time I am testing some pretty large instance sizes with plenty of premium storage. As you can imagine, you can burn through $150 pretty quickly with a few GS5 instances running.

I try to be mindful and shut down or destroy instances once I am done with them, but occasionally I’ll get pulled away on other business, only to log in the next day and see my credit has been exhausted because I forgot to turn off the VMs.

I’m happy to see that it is now very easy to configure an automatic Shutdown of instances right in the Azure Portal.

Figure – Configuring auto shutdown for a virtual machine in the Azure Portal

Keep in mind however that this just shuts down the instance. If you have premium storage attached to it you will continue to pay for the Storage even if the instance is shut down.


MS SQL Server v.Next on Linux with Replication and High Availability #Azure #Cloud #Linux

With Microsoft’s recent release of the first public preview of MS SQL Server running on Linux, I wondered what they would do for high availability. Knowing how tightly coupled AlwaysOn Availability Groups and Failover Clustering are to the Windows operating system, I was pretty certain they would not be options, and I was correct.

Well, the people over at LinuxClustering.Net answered my question on how to provide high availability failover clusters for MS SQL Server v.Next on Linux with this great Step by Step article.

http://www.linuxclustering.net/2016/11/18/step-by-step-sql-server-v-next-for-linux-public-preview-high-availability-azure/

Not only that, they did it all in Azure which we know can be tricky given some of the network limitations.

Figure – SQL Server cluster resource dependencies created

I’d be curious to know if you are excited about SQL Server on Linux or if you think it is just a little science experiment. If you are excited, what does SQL Server on Linux bring to the table that open source databases don’t? If you like SQL Server that much why not just run it on Windows?

I’m not being facetious here, I honestly want to know what excites you about SQL Server on Linux. I’m looking forward to your comments.


Deploying a Highly Available File Server in Azure IaaS (ARM) with SIOS DataKeeper

In this post we will detail the specific steps required to deploy a 2-node File Server Failover Cluster in a single region of Azure using Azure Resource Manager. I will assume you are familiar with basic Azure concepts as well as basic Failover Cluster concepts and will focus this article on what is unique about deploying a File Server Failover Cluster in Azure.

With DataKeeper Cluster Edition you are able to take the locally attached storage, whether it is Premium or Standard Disks, and replicate those disks either synchronously, asynchronously, or a mix of both, between two or more cluster nodes. In addition, a DataKeeper Volume resource is registered in Windows Server Failover Clustering which takes the place of a Physical Disk resource. Instead of controlling SCSI-3 reservations like a Physical Disk Resource, the DataKeeper Volume controls the mirror direction, ensuring the active node is always the source of the mirror. As far as Failover Clustering is concerned, it looks, feels and smells like a Physical Disk and is used the same way a Physical Disk Resource would be used.

Pre-requisites

  • You have used the Azure Portal before and are comfortable deploying virtual machines in Azure IaaS.
  • You have obtained a license or evaluation license of SIOS DataKeeper

Deploying a File Server Failover Cluster Instance using the Azure Portal

To build a 2-node File Server Failover Cluster Instance in Azure, we are going to assume you have a basic Virtual Network based on Azure Resource Manager and you have at least one virtual machine up and running and configured as a Domain Controller. Once you have a Virtual Network and a Domain configured, you are going to provision two new virtual machines which will act as the two nodes in our cluster.

Our environment will look like this:

DC1 – Our Domain Controller and File Share Witness
SQL1 and SQL2 – The two nodes of our File Server Cluster

Provisioning the two cluster nodes (SQL1 and SQL2)

Using the Azure Portal, we will provision both SQL1 and SQL2 exactly the same way. There are numerous options to choose from including instance size, storage options, etc. This guide is not meant to be an exhaustive guide to deploying Servers in Azure as there are some really good resources out there and more published every day. However, there are a few key things to keep in mind when creating your instances, especially in a clustered environment.

Availability Set – It is important that both SQL1, SQL2 AND DC1 reside in the same availability set. By putting them in the same Availability Set we are ensuring that each cluster node and the file share witness reside in a different Fault Domain and Update Domain. This helps guarantee that during both planned maintenance and unplanned maintenance the cluster will continue to be able to maintain quorum and avoid downtime.

Figure 3 – Be sure to add both cluster nodes and the file share witness to the same Availability Set
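
If you are scripting your deployment, the availability set is just another ARM resource. A minimal sketch with the AzureRM module of the time, using placeholder names; you then select this availability set when you provision DC1, SQL1 and SQL2.

# Sketch – resource group, name and location are placeholders
New-AzureRmAvailabilitySet -ResourceGroupName "ClusterRG" -Name "ClusterAS" -Location "East US"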

Static IP Address

Once each VM is provisioned, you will want to go into the network settings and change the IP address assignment to Static. We do not want the IP address of our cluster nodes to change.

Figure 4 – Make sure each cluster node uses a static IP
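
If you would rather script this than use the portal, the change amounts to flipping the NIC’s private IP allocation method to Static. A sketch with placeholder NIC and resource group names:

# Sketch – NIC and resource group names are placeholders
$nic = Get-AzureRmNetworkInterface -Name "sql1-nic" -ResourceGroupName "ClusterRG"
$nic.IpConfigurations[0].PrivateIpAllocationMethod = "Static"
Set-AzureRmNetworkInterface -NetworkInterface $nic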

Storage

As far as storage is concerned, you will want to consult Performance best practices for SQL Server in Azure Virtual Machines. In any case, you will minimally need to add at least one additional disk to each of your cluster nodes. DataKeeper can use Basic Disks, Premium Storage, or even Storage Pools consisting of multiple disks. Just be sure to add the same amount of storage to each cluster node and configure it identically. Also, be sure to use a different storage account for each virtual machine to ensure that a problem with one storage account does not impact both virtual machines at the same time.

Figure 5 – make sure to add additional storage to each cluster node

Create the Cluster

Assuming both cluster nodes (SQL1 and SQL2) have been provisioned as described above and added to your existing domain, we are ready to create the cluster. Before we create the cluster, there are a few features that need to be enabled: .Net Framework 3.5 and Failover Clustering. These features need to be enabled on both cluster nodes. You will also need to enable the File Server role.

Figure 6 – Enable both the .Net Framework 3.5 and Failover Clustering features and the File Server role on both cluster nodes

Once that role and those features have been enabled, you are ready to build your cluster. Most of the steps I’m about to show you can be performed both via PowerShell and the GUI. However, I’m going to recommend that for this very first step you use PowerShell to create your cluster. If you choose to use the Failover Cluster Manager GUI to create the cluster, you will find that the cluster winds up being issued a duplicate IP address.

Without going into great detail, what you will find is that Azure VMs have to use DHCP. By specifying a “Static IP” when we create the VM in the Azure portal, all we did was create sort of a DHCP reservation. It is not exactly a DHCP reservation because a true DHCP reservation would remove that IP address from the DHCP pool. Instead, specifying a Static IP in the Azure portal simply means that if that IP address is still available when the VM requests it, Azure will issue that IP to it. However, if your VM is offline and another host comes online in that same subnet, it very well could be issued that same IP address.

There is another strange side effect to the way Azure has implemented DHCP. When creating a cluster with the Windows Server Failover Cluster GUI when hosts use DHCP (which they have to), there is no option to specify a cluster IP address. Instead it relies on DHCP to obtain an address. The strange thing is, DHCP will issue a duplicate IP address, usually the same IP address as the host requesting a new IP address. The cluster creation will usually complete, but you may have some strange errors and you may need to run the Windows Server Failover Cluster GUI from a different node in order to get it to run. Once you get it to run, you will want to change the cluster IP address to an address that is not currently in use on the network.

You can avoid that whole mess by simply creating the cluster via Powershell and specifying the cluster IP address as part of the PowerShell command to create the cluster.

You can create the cluster using the New-Cluster command as follows:

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.101 -NoStorage

After the cluster creation completes, you will also want to run the cluster validation by running the following command:

Test-Cluster

Figure 7 – The output of the cluster creation and the cluster validation commands

Create File Share Witness

Because there is no shared storage, you will need to create a file share witness on another server in the same Availability Set as the two cluster nodes. By putting it in the same availability set, you can be sure that you only lose one vote from your quorum at any given time. If you are unsure how to create a File Share Witness you can review this article: http://www.howtonetworking.com/server/cluster12.htm. In my demo I put the file share witness on the domain controller. I have published an exhaustive explanation of cluster quorums at https://blogs.msdn.microsoft.com/microsoft_press/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2/
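
If you prefer to script the witness rather than use the Configure Cluster Quorum wizard, it is a single command once the share exists and the cluster computer account has rights to it. The share path below is a placeholder.

# Sketch – the witness share path is a placeholder
Set-ClusterQuorum -NodeAndFileShareMajority "\\DC1\FSW"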

Install DataKeeper

After the cluster is created it is time to install DataKeeper. It is important to install DataKeeper after the initial cluster is created so the custom cluster resource type can be registered with the cluster. If you installed DataKeeper before the cluster is created you will simply need to run the install again and do a repair installation.

Figure 8 – Install DataKeeper after the cluster is created

During the installation you can take all of the default options.  The service account you use must be a domain account and be in the local administrators group on each node in the cluster.

Figure 9 – the service account must be a domain account that is in the Local Admins group on each node

Once DataKeeper is installed and licensed on each node you will need to reboot the servers.

Create the DataKeeper Volume Resource

To create the DataKeeper Volume Resource you will need to start the DataKeeper UI and connect to both of the servers.
Connect to SQL1.

Connect to SQL2.

Once you are connected to each server, you are ready to create your DataKeeper Volume. Right click on Jobs and choose “Create Job”.

Give the Job a name and description.

Choose your source server, IP and volume. The IP address you select is the one the replication traffic will travel over.

Choose your target server.

Choose your options. For our purposes where the two VMs are in the same geographic region we will choose synchronous replication. For longer distance replication you will want to use asynchronous and enable some compression.

By clicking yes at the last pop-up you will register a new DataKeeper Volume Resource in Available Storage in Failover Clustering.

You will see the new DataKeeper Volume Resource in Available Storage.

Create the File Server Cluster Resource

To create the File Server cluster resource we will use PowerShell once again, rather than the Failover Cluster Manager interface. The reason is that, once again, because the virtual machines are configured to use DHCP, the GUI-based wizard will not prompt us to enter a cluster IP address and instead will issue a duplicate IP address. To avoid this we will use a simple PowerShell command to create the File Server cluster resource and specify the IP address:

Add-ClusterFileServerRole -Storage "DataKeeper Volume E" -Name FS2 -StaticAddress 10.0.0.201

Make note of the IP address you specify here. It must be a unique IP address on your network. We will use this same IP address later when we create our Internal Load Balancer.

Create the Internal Load Balancer

Here is where failover clustering in Azure is different than traditional infrastructures. The Azure network stack does not support gratuitous ARPs, so clients cannot connect directly to the cluster IP address. Instead, clients connect to an internal load balancer and are redirected to the active cluster node. What we need to do is create an internal load balancer. This can all be done through the Azure Portal as shown below.

First, create a new Load Balancer

You can use a Public Load Balancer if your clients connect over the public internet, but assuming your clients reside in the same vNet, we will create an Internal Load Balancer. The important thing to take note of here is that the Virtual Network is the same as the network where your cluster nodes reside. Also, the Private IP address that you specify must be EXACTLY the same as the address you used to create the File Server cluster resource.

After the Internal Load Balancer (ILB) is created, you will need to edit it. The first thing we will do is add a backend pool. Through this process you will choose the Availability Set where your cluster VMs reside. However, when you choose the actual VMs to add to the Backend Pool, be sure you do not choose your file share witness. We do not want to redirect traffic to the file share witness.

The next thing we will do is add a Probe. The probe we add will probe Port 59999. This probe determines which node is active in our cluster.

And then finally, we need a load balancing rule to redirect the SMB traffic on TCP port 445. The important thing to notice in the screenshot below is that Direct Server Return is Enabled. Make sure you make that change.

Figure – Load balancing rule for SMB (TCP 445) with Direct Server Return enabled

Fix the File Server IP Resource

The final step in the configuration is to run the following PowerShell script on one of your cluster nodes. This will allow the Cluster IP Address to respond to the ILB probes and ensure that there is no IP address conflict between the Cluster IP Address and the ILB. Please take note: you will need to edit this script to fit your environment. The subnet mask is set to 255.255.255.255; this is not a mistake, leave it as is. This creates a host-specific route to avoid IP address conflicts with the ILB.

# Define variables
$ClusterNetworkName = ""
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)
$IPResourceName = ""
# the IP Address resource name
$ILBIP = ""
# the IP Address of the Internal Load Balancer (ILB)
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this: 
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

Creating File Shares

You will find that using the File Share Wizard in Failover Cluster Manager does not work. Instead, you will simply create the file shares in Windows Explorer on the active node. Failover clustering automatically picks up those shares and puts them in the cluster.
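
Creating the share on the active node can be scripted as well. A minimal sketch, assuming the replicated DataKeeper volume is E: and using an example folder, share name and permission that you would replace with your own:

# Sketch – folder path, share name and permissions are examples only; run on the active node
New-Item -Path "E:\Shares\Data" -ItemType Directory -Force
New-SmbShare -Name "Data" -Path "E:\Shares\Data" -FullAccess "MYDOMAIN\Domain Users"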

Note that the “Continuous Availability” option of a file share is not supported in this configuration.

Conclusion

You should now have a functioning File Server Failover Cluster. If you have ANY problems, please reach out to me on Twitter @daveberm and I will be glad to assist. If you need a DataKeeper evaluation key, fill out the form at http://us.sios.com/clustersyourway/cta/14-day-trial and SIOS will send an evaluation key out to you.


SQL Server 2016 Support for Distributed Transactions with Always On Availability Groups

One of the most promising new features of SQL Server 2016 is the support of distributed transactions with Always On Availability Groups. They did make some improvements in that regard, but it is not yet fully supported.


Example of a Distributed Transaction
Source – SQL Server 2016 DTC Support In Availability Groups

In SQL Server 2016, Distributed Transactions are only supported if the transaction is distributed across multiple instances of SQL Server. It is NOT supported if the transaction is distributed between different databases within the same instance of SQL Server. So in the picture above, if the databases are on separate SQL instances it will work, but not if the databases reside on the same instance which is more likely.

If you require distributed transaction support between different databases within the same SQL Server instance and you want high availability you still must use a traditional SQL Server Always On Failover Cluster or a SANLess Cluster using DataKeeper.


Deploying Microsoft SQL Server 2014 Failover Clusters in #Azure Resource Manager (ARM)

In this post we will detail the specific steps required to deploy a 2-node SQL Server Failover Cluster in a single region of Azure using Azure Resource Manager. I will assume you are familiar with basic Azure concepts as well as basic SQL Server Failover Cluster concepts and will focus this article on what is unique about deploying a SQL Server Failover Cluster in Azure Resource Manager. If you are still using Azure Classic and need to deploy a SQL Server Failover Cluster in Classic, you should read my article “STEP-BY-STEP: HOW TO CONFIGURE A SQL SERVER FAILOVER CLUSTER INSTANCE (FCI) IN MICROSOFT AZURE IAAS #SQLSERVER #AZURE #SANLESS”.

Before we begin, you should familiarize yourself with the Windows Azure Article, High availability and disaster recovery for SQL Server in Azure Virtual Machines. In that article all of the HA options are outlined, including AlwaysOn AG, Database Mirroring, Log Shipping, Backup and Restore and finally Failover Cluster Instances. Assuming you have dismissed those other options due to the costs associated with Enterprise Edition of SQL Server or lack of features, we are focusing on the final option – SQL Server AlwaysOn Failover Cluster Instance (FCI).

As you read that article it becomes clear that the lack of cluster aware shared storage in Azure is an obstacle in deploying SQL Server Failover clusters. However, there are a few alternatives described in that article. We will focus on using SIOS DataKeeper, to provide the storage to be used in the cluster.

Figure 1 – Microsoft’s support policy for SQL Server Failover Clusters
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-classic-sql-dr/

With DataKeeper Cluster Edition you are able to take the locally attached storage, whether it is Premium or Standard Disks, and replicate those disks either synchronously, asynchronously, or a mix of both, between two or more cluster nodes. In addition, a DataKeeper Volume resource is registered in Windows Server Failover Clustering which takes the place of a Physical Disk resource. Instead of controlling SCSI-3 reservations like a Physical Disk Resource, the DataKeeper Volume controls the mirror direction, ensuring the active node is always the source of the mirror. As far as SQL Server and Failover Clustering are concerned, it looks, feels and smells like a Physical Disk and is used the same way a Physical Disk Resource would be used.

Pre-requisites

The Easy Way to do a Proof-of-Concept

If you are familiar with Azure Resource Manager you know one of the great new features is the ability to use Deployment Templates to rapidly deploy applications consisting of interrelated Azure resources. Many of these templates are developed by Microsoft and are readily available in their community on Github as “Quickstart Templates”. Community members are also free to extend templates or to publish their own templates on GitHub. One such template entitled “SQL Server 2014 AlwaysOn Failover Cluster Instance with SIOS DataKeeper Azure Deployment Template” published by SIOS Technology completely automates the process of deploying a 2-node SQL Server FCI into a new Active Directory Domain.

To deploy this template it is as easy as clicking on the “Deploy to Azure” button in the template.

Figure 2- Visit https://github.com/SIOSDataKeeper/SIOSDataKeeper-SQL-Cluster to rapidly provision a 2-node SQL cluster

Deploying a SQL Server Failover Cluster Instance using the Azure Portal

While the automated Azure deployment template is a quick and easy way to get a 2-node SQL Server FCI up and running quickly, there are some limitations. For one, it uses a 180-day evaluation version of SQL Server, so you can’t use it in production unless you upgrade the SQL eval licenses. Also, it builds an entirely new AD domain, so if you want to integrate with your existing domain you are going to have to build it manually.

To build a 2-node SQL Server Failover Cluster Instance in Azure, we are going to assume you have a basic Virtual Network based on Azure Resource Manager (not Azure Classic) and you have at least one virtual machine up and running and configured as a Domain Controller. Once you have a Virtual Network and a Domain configured, you are going to provision two new virtual machines which will act as the two nodes in our cluster.

Our environment will look like this:

DC1 – Our Domain Controller and File Share Witness
SQL1 and SQL2 – The two nodes of our SQL Server Cluster

Provisioning the two cluster nodes (SQL1 and SQL2)

Using the Azure Portal, we will provision both SQL1 and SQL2 exactly the same way. There are numerous options to choose from including instance size, storage options, etc. This guide is not meant to be an exhaustive guide to deploying SQL Server in Azure as there are some really good resources out there and more published every day. However, there are a few key things to keep in mind when creating your instances, especially in a clustered environment.

Availability Set – It is important that both SQL1, SQL2 AND DC1 reside in the same availability set. By putting them in the same Availability Set we are ensuring that each cluster node and the file share witness reside in a different Fault Domain and Update Domain. This helps guarantee that during both planned maintenance and unplanned maintenance the cluster will continue to be able to maintain quorum and avoid downtime.

Figure 3 – Be sure to add both cluster nodes and the file share witness to the same Availability Set

Static IP Address

Once each VM is provisioned, you will want to go into the network settings and change the IP address assignment to Static. We do not want the IP address of our cluster nodes to change.

Figure 4 – Make sure each cluster node uses a static IP

Storage

As far as storage is concerned, you will want to consult Performance best practices for SQL Server in Azure Virtual Machines. In any case, you will minimally need to add at least one additional disk to each of your cluster nodes. DataKeeper can use Basic Disks, Premium Storage, or even Storage Pools consisting of multiple disks. Just be sure to add the same amount of storage to each cluster node and configure it identically.

Figure 5 – make sure to add additional storage to each cluster node

Create the Cluster

Assuming both cluster nodes (SQL1 and SQL2) have been provisioned as described above and added to your existing domain, we are ready to create the cluster. Before we create the cluster, there are a few Features that need to be enabled. These features are .Net Framework 3.5 and Failover Clustering. These features need to be enabled on both cluster nodes.

Figure 6 – enable both .Net Framework 3.5 and Failover Clustering features on both cluster nodes

Once those features have been enabled, you are ready to build your cluster. Most of the steps I’m about to show you can be performed both via PowerShell and the GUI. However, I’m going to recommend that for this very first step you use PowerShell to create your cluster. If you choose to use the Failover Cluster Manager GUI to create the cluster, you will find that the cluster winds up being issued a duplicate IP address.

Without going into great detail, what you will find is that Azure VMs have to use DHCP. By specifying a “Static IP” when we create the VM in the Azure portal, all we did was create sort of a DHCP reservation. It is not exactly a DHCP reservation because a true DHCP reservation would remove that IP address from the DHCP pool. Instead, specifying a Static IP in the Azure portal simply means that if that IP address is still available when the VM requests it, Azure will issue that IP to it. However, if your VM is offline and another host comes online in that same subnet, it very well could be issued that same IP address.

There is another strange side effect to the way Azure has implemented DHCP. When creating a cluster with the Windows Server Failover Cluster GUI when hosts use DHCP (which they have to), there is no option to specify a cluster IP address. Instead it relies on DHCP to obtain an address. The strange thing is, DHCP will issue a duplicate IP address, usually the same IP address as the host requesting a new IP address. The cluster creation will usually complete, but you may have some strange errors and you may need to run the Windows Server Failover Cluster GUI from a different node in order to get it to run. Once you get it to run, you will want to change the cluster IP address to an address that is not currently in use on the network.

You can avoid that whole mess by simply creating the cluster via Powershell and specifying the cluster IP address as part of the PowerShell command to create the cluster.

You can create the cluster using the New-Cluster command as follows:

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.101 -NoStorage

After the cluster creation completes, you will also want to run the cluster validation by running the following command:

Test-Cluster

Figure 7 – The output of the cluster creation and the cluster validation commands

Create File Share Witness

Because there is no shared storage, you will need to create a file share witness on another server in the same Availability Set as the two cluster nodes. By putting it in the same availability set, you can be sure that you only lose one vote from your quorum at any given time. If you are unsure how to create a File Share Witness you can review this article: http://www.howtonetworking.com/server/cluster12.htm. In my demo I put the file share witness on the domain controller. I have published an exhaustive explanation of cluster quorums at https://blogs.msdn.microsoft.com/microsoft_press/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2/

Install DataKeeper

After the cluster is created it is time to install DataKeeper. It is important to install DataKeeper after the initial cluster is created so the custom cluster resource type can be registered with the cluster. If you installed DataKeeper before the cluster is created you will simply need to run the install again and do a repair installation.

Figure 8 – Install DataKeeper after the cluster is created

During the installation you can take all of the default options.  The service account you use must be a domain account and be in the local administrators group on each node in the cluster.

Figure 9 – the service account must be a domain account that is in the Local Admins group on each node

Once DataKeeper is installed and licensed on each node you will need to reboot the servers.

Create the DataKeeper Volume Resource

To create the DataKeeper Volume Resource you will need to start the DataKeeper UI and connect to both of the servers.
Connect to SQL1.

Connect to SQL2.

Once you are connected to each server, you are ready to create your DataKeeper Volume. Right click on Jobs and choose “Create Job”.

Give the Job a name and description.

Choose your source server, IP and volume. The IP address you select is the one the replication traffic will travel over.

Choose your target server.

Choose your options. For our purposes where the two VMs are in the same geographic region we will choose synchronous replication. For longer distance replication you will want to use asynchronous and enable some compression.

By clicking yes at the last pop-up you will register a new DataKeeper Volume Resource in Available Storage in Failover Clustering.

You will see the new DataKeeper Volume Resource in Available Storage.

 

Install the first cluster node

You are now ready to install your first node. The cluster installation will proceed just like any other SQL cluster that you have ever built. I have not copied every screenshot, just a few to guide you along the way.

You see that the DataKeeper Volume Resource is recognized as an available disk resource, just as if it were a shared disk.

Make note of the IP address you select here. It must be a unique IP address on your network. We will use this same IP address later when we create our Internal Load Balancer.

Add the second node

After the first node installs successfully, you will start the installation on the second node using the “Add node to a SQL Server failover cluster” option. Once again, the install is pretty straightforward; just use standard best practices as you would with any other SQL cluster installation.

Create the Internal Load Balancer

Here is where failover clustering in Azure is different than traditional infrastructures. The Azure network stack does not support gratuitous ARPs, so clients cannot connect directly to the cluster IP address. Instead, clients connect to an internal load balancer and are redirected to the active cluster node. What we need to do is create an internal load balancer. This can all be done through the Azure Portal as shown below.

First, create a new Load Balancer

You can use a Public Load Balancer if your clients connect over the public internet, but assuming your clients reside in the same vNet, we will create an Internal Load Balancer. The important thing to take note of here is that the Virtual Network is the same as the network where your cluster nodes reside. Also, the Private IP address that you specify must be EXACTLY the same as the address you used to create the SQL Cluster Resource.

After the Internal Load Balancer (ILB) is created, you will need to edit it. The first thing we will do is to add a backend pool. Through this process you will choose the Availability Set where your SQL Cluster VMs reside. However, when you choose the actual VMs to add to the Backend Pool, be sure you do not choose your file share witness. We do not want to redirect SQL traffic to your file share witness.

The next thing we will do is add a Probe. The probe we add will probe Port 59999. This probe determines which node is active in our cluster.

And then finally, we need a load balancing rule to redirect the SQL Server traffic. In our example we used a default instance of SQL, which uses port 1433. You may also want to add rules for 1434 or others depending upon your application’s requirements. The important thing to notice in the screenshot below is that Direct Server Return is Enabled. Make sure you make that change.

Fix the SQL Server IP Resource

The final step in the configuration is to run the following PowerShell script on one of your cluster nodes. This will allow the Cluster IP Address to respond to the ILB probes and ensure that there is no IP address conflict between the Cluster IP Address and the ILB. Please take note: you will need to edit this script to fit your environment. The subnet mask is set to 255.255.255.255; this is not a mistake, leave it as is. This creates a host-specific route to avoid IP address conflicts with the ILB.

# Define variables
$ClusterNetworkName = ""
# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)
$IPResourceName = ""
# the IP Address resource name
$ILBIP = ""
# the IP Address of the Internal Load Balancer (ILB)
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this: 
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255
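
After the script runs (the parameter change takes effect the next time the IP resource is brought online), a quick sanity check is to test the SQL port through the ILB from another VM in the same vNet. The name sqlcluster1 below is a placeholder for whatever network name you gave your SQL Server FCI.

# Run from a client VM in the same vNet – "sqlcluster1" is a placeholder for your SQL network name
Test-NetConnection -ComputerName "sqlcluster1" -Port 1433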

Conclusion

You should now have a functioning SQL Server Failover Cluster Instance. If you have ANY problems, please reach out to me on Twitter @daveberm and I will be glad to assist. If you need a DataKeeper evaluation key, fill out the form at http://us.sios.com/clustersyourway/cta/14-day-trial and SIOS will send an evaluation key out to you.
