High Availability Options for Microsoft SQL Server in the Google Cloud

I was recently interviewed by VMblog about high availability options for SQL Server. You can check out the interview here http://vmblog.com/

For the step by step guide I previously published, check it out here https://clusteringformeremortals.com/2018/01/10/how-to-build-a-sanless-sql-server-failover-cluster-instance-in-google-cloud-platform/

High Availability Options for Microsoft SQL Server in the Google Cloud

STORAGE SPACES DIRECT (S2D) FOR SQL SERVER FAILOVER CLUSTER INSTANCES (FCI)?

With the introduction of Windows Server 2016 Datacenter Edition a new feature called Storage Spaces Direct (S2D) was introduced. At a very high level, this solution allows you to pool together locally attached storage and present it to the cluster as a CSV for use in a Scale Out File Server, which can then be accessed over SMB 3 and used to hold cluster data such as Hyper-V VMDK files. This can also be configured in a hyper-converged (HCI) fashion such that the application and data can all run on the same set of servers.  This is a grossly over-simplified description, but for details, you will want to look here.

 

Storage Spaces Direct StackImage taken from https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-direct-overview

The main use case targeted is hyper-converged infrastructure for Hyper-V deployments. However, there are other use cases, including leveraging this SMB storage to store SQL Server Data to be used in a SQL Server Failover Cluster Instance

Why would anyone want to do that? Well, for starters you can now build a highly available 2-node SQL Server Failover Cluster Instance (FCI) with SQL Server Standard Edition, without the need for shared storage. Previously, if you wanted HA without a SAN you pretty much were driven to buy SQL Server Enterprise Edition and make use of Always On Availability Groups or purchase SIOS DataKeeper and leverage the 3rd party solution which lets you build SANless clusters with any version of Windows or SQL Server. SQL Server Enterprise Edition can really drive up the cost of your project, especially if you were only buying it for the Availability Groups feature.

In addition to the cost associated with Availability Groups, there are a number of other technical reasons why you might prefer a Failover Cluster over an AG. Application compatibility, instance vs. database level protection, large number of databases, DTC support, trained staff, etc., are just some of the technical reasons why you may want to stick with a Failover Cluster Instance.

Microsoft lists both the SIOS DataKeeper solution and the S2D solution as two of the supported solutions for SQL Server FCI in their documentation here.

s2d

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-high-availability-dr

When comparing the two solutions, you have to take into account that SIOS has been allowing you to build SANless Clusters since 1999, while the S2D solution is still in its infancy.  Having said that, there are bound to be some areas where S2D has some catching up to do, or simply features that they will never support simply due to the limitations with the technology.

Have a look at the following table for an overview of some of the things you should consider before you choose your SANless cluster solution.

S2D_vs_DKCE

If we go through this chart, we see that SIOS DataKeeper clearly has some significant advantages. For one, DataKeeper supports a much wider range of platforms, going all the way back to Windows Server 2008 R2 and SQL Server 2008 R2. The S2D solution only supports the latest releases of Windows and SQL Server 2016/2017. S2D also requires the  Datacenter Edition of Windows, which can add significantly to the cost of your deployment. In addition, SIOS delivers the ONLY HA/DR solution for SQL Server on Linux that works both on-prem and in the cloud.

But beyond the cost and platform limitations, I think the most glaring gap comes when we start to consider disaster recovery options for your SANless cluster. Allan Hirt, SQL Server Cluster guru and fellow Microsoft Cloud and Datacenter Management MVP, recently posted about this S2D limitation. In his article Revisiting Storage Spaces Direct and SQL Server FCIs  Allan points out that due to the lack of support for stretching S2D clusters across sites or including an S2D based cluster as a leg in an Always On Availability Group, the best option for DR in the S2D scenario is log shipping!

Don’t get me wrong, log shipping has been around forever and will probably be around long after I’m gone, but that is taking a HUGE step backwards when we think about all the disaster recovery solutions we have become accustomed to, like multi-site clusters, Availability Groups, etc.

In contrast, the SIOS DataKeeper solution fully supports Always On Availability Groups, and better yet – it can allow you to stretch your FCI across sites to give you the best HA/DR solution you could hope to achieve in terms of RTO/RPO. In an Azure environment, DataKeeper also support Azure Site Recovery (ASR), giving you even more options for disaster recovery.

The rest of this chart is pretty self explanatory. It basically consist of a list hardware, storage and networking requirements that must be met before you can deploy an S2D cluster. An exhaustive list of S2D requirements is maintained here.  https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-direct-hardware-requirements

The SIOS DataKeeper solution is much more lenient. It supports any locally attached storage and as long as the hardware passes cluster validation, it is a supported cluster configuration. The block level replication solution has been working great ever since 1 Gbps was considered a fast LAN and a T1 WAN connection was considered a luxury.

SANless clustering is particularly interesting for cloud deployments. The cloud does not offer traditional shared storage options for clusters. So for users in the middle of a “lift and shift” to the cloud that want to take their clusters with them they must look at alternate storage solutions. For cloud deployments, SIOS is certified for AzureAWS and Google and available in the relevant cloud marketplace. While there doesn’t appear to be anything blocking deployment of S2D based clusters in Azure or Google, there is a conspicuous lack of documentation or supportability statements from Microsoft for those platforms.

SIOS DataKeeper has been doing this since 1999. SIOS has heard all the feature requests, uncovered all the bugs, and has a rock solid solution for SANless clusters that is time tested and proven. While Microsoft S2D is a promising technology, as a 1st generation product I would wait until the dust settles and some of the feature gap closes before I would consider it for my business critical applications.

STORAGE SPACES DIRECT (S2D) FOR SQL SERVER FAILOVER CLUSTER INSTANCES (FCI)?

How to Build a SANless SQL Server Failover Cluster Instance in Google Cloud Platform

If you are going to host SQL Server on the Google Cloud Platform (GCP) you will want to make sure it is highly available. One of the best and most economical ways to do that is to build a SQL Server Failover Cluster Instance (FCI). Since SQL Server Standard Edition supports Failover Clustering, we can avoid the cost associated with SQL Server Enterprise Edition which is required for Always On Availability Groups. In addition, SQL Server Failover Clustering is a much more robust solution as it protects the entire instance of SQL Server, has no limitations in terms of DTC (Distributed Transaction Coordinator) support and is easier to manage. Plus, it supports earlier versions of SQL Server that you may still have, such as SQL 2012 through the latest SQL 2017. Unfortunately, SQL 2008 R2 is not supported due to the lack of support for cross-subnet failover.

Traditionally, SQL Server FCI requires that you have a SAN or some type of shared storage device. In the cloud, there is no cluster-aware shared storage. In place of a SAN, we will build a SANless cluster using SIOS DataKeeper Cluster Edition (DKCE). DKCE uses block-level replication to ensure that the locally attached storage on each instance remains in sync with one other. It also integrates with Windows Server Failover Clustering through its own storage class resource called a DataKeeper Volume which takes the place of the physical disk resource. As far as the cluster is concerned the SIOS DataKeeper volume looks like a physical disk, but instead of controlling SCSI reservations, it controls the mirror direction, ensuring that only the active server writes to the disk and that the passive server(s) receive all the changes either synchronously or asynchronously.

In this guide, we will walk through the steps to build a two-node failover cluster between two instances in the same region, but in different Zones, within the GCP as shown in Figure 1.

Google Cloud Diagram

Download the entire white paper at https://us.sios.com/san-sanless-clusters-resources/white-paper-build-sql-server-failover-cluster-gcp/

How to Build a SANless SQL Server Failover Cluster Instance in Google Cloud Platform

MS SQL Server v.Next on Linux with Replication and High Availability #Azure #Cloud #Linux

With Microsoft’s recent release of the first public preview of MS SQL Server running on Linux, I wondered what they would do for high availability. Knowing how tightly coupled AlwaysOn Availability Groups and Failover Clustering is to the Windows operating system I was pretty certain they would not be options and I was correct.

Well, the people over at LinuxClustering.Net answered my question on how to provide high availability failover clusters for MS SQL Server v.Next on Linux with this great Step by Step article.

http://www.linuxclustering.net/2016/11/18/step-by-step-sql-server-v-next-for-linux-public-preview-high-availability-azure/

Not only that, they did it all in Azure which we know can be tricky given some of the network limitations.

sql-dependencies-created

I’d be curious to know if you are excited about SQL Server on Linux or if you think it is just a little science experiment. If you are excited, what does SQL Server on Linux bring to the table that open source databases don’t? If you like SQL Server that much why not just run it on Windows?

I’m not being facetious here, I honestly want to know what excites you about SQL Server on Linux. I’m looking forward to your comments.

MS SQL Server v.Next on Linux with Replication and High Availability #Azure #Cloud #Linux

SQL Server 2016 Support for Distributed Transactions with Always On Availability Groups

One of the most promising new features of SQL Server 2016 is the the support of distributed transaction with Always On Availability Groups. They did make some improvements in that regard, but it is not yet fully supported.

DTC

Example of a Distribute Transaction
Source – SQL Server 2016 DTC Support In Availability Groups

In SQL Server 2016, Distributed Transactions are only supported if the transaction is distributed across multiple instances of SQL Server. It is NOT supported if the transaction is distributed between different databases within the same instance of SQL Server. So in the picture above, if the databases are on separate SQL instances it will work, but not if the databases reside on the same instance which is more likely.

If you require distributed transaction support between different databases within the same SQL Server instance and you want high availability you still must use a traditional SQL Server Always On Failover Cluster or a SANLess Cluster using DataKeeper.

SQL Server 2016 Support for Distributed Transactions with Always On Availability Groups

Replicating a 2-node SQL Server 2012/2014 Standard Edition Cluster to a 3rd Server for Disaster Recovery

Many people have found themselves settling for SQL Server Standard Edition due to the cost of SQL Server Enterprise Edition. SQL Server Standard Edition has many of the same features, but has a few limitations. One limitation is that it does not support AlwaysOn Availability Groups. Also, it only supports two nodes in a cluster. With Database Mirroring being deprecated and only supporting synchronous replication in Standard Edition, you really have limited disaster recovery options.

One of those options is SIOS DataKeeper Cluster Edition. DataKeeper will work with your existing shared storage cluster and allow you to extend it to a 3rd node using either synchronous or asynchronous replication. If you are using SQL Server Enterprise you can simply add that 3rd node as another cluster member and you have a true multisite cluster. However, since we are talking about SQL Server Standard Edition you can’t add a 3rd node directly to the cluster. The good news is that DataKeeper will allow you to replicate data to a 3rd node so your data is protected.

Recovery in the event of a disaster simply means you are going to use DataKeeper to bring that 3rd node online as the source of the mirror and then use SQL Server Management Studio to mount the databases that are on the replicated volumes. You clients will also need to be redirected to this 3rd node, but it is a very cost effective solution with an excellent RPO and reasonable RTO.

The SIOS documentation talks about how to do this, but I have summarized the steps recently for one of my clients.

Configuration

  • Stop the SQL Resource
  • Remove the Physical Disk Resource From The SQL Cluster Resource
  • Remove the Physical Disk from Available Storage
  • Online Physical Disk on SECONDARY server, add the drive letter (if not there)
  • Run emcmd . setconfiguration <drive letter> 256
    and Reboot Secondary Server. This will cause the SECONDARY server to block access to the E drive which is important because you don’t want two servers having access to the E drive at the same time if you can avoid it.
  • Online the disk on PRIMARY server
  • Add the Drive letter if needed
  • Create a DataKeeper Mirror from Primary to DR
    You may have to wait a minute for the E drive to appear available in the DataKeeper Server Overview Report on all the servers before you can create the mirror properly. If done properly you will create a mirror from PRIMARY to DR and as part of that process DataKeeper will ask you about the SECONDARY server which shares the volume you are replicating.

In the event of a disaster….

On DR Node

  • Run EMCMD . switchovervolume <drive letter>
  • The first time make sure the SQL Service account has read/write access to all data and log files. You WILL have to explicitly grant this access the very first time you try to mount the databases.
  • Use SQL Management Studio to mount the databases
  • Redirect all clients to the server in the DR site, or better yet have the applications that reside in the DR site pre-configured to point to the SQL Server instance in the DR site.

After disaster is over

  • Power the servers (PRIMAY, SECONDARY) in the main site back on
  • Wait for mirror to reach mirroring state
  • Determine which node was previous source (run PowerShell as an administrator)
    get-clusterresource -Name “<DataKeeper Volume Resource name>” | get-clusterparameter
  • Make sure no DataKeeper Volume Resources are online in the cluster
  • Start the DataKeeper GUI on one cluster node. Resolve any split brain conditions (most likely there are none) ensuring the DR node is selected as the source during any split-brain recovery procedures
  • On the node that was reported as the previous source run EMCMD . switchovervolume <drive letter>
  • Bring SQL Server online in Failover Cluster Manager

The above steps assume you have SIOS DataKeeper Cluster Edition installed on all three servers (PRIMARY, SECONDARY, DR) and that PRIMARY and SECONDARY are a two node shared storage cluster and you are replicating data to DR which is just a standalone SQL Server instance (not part of the cluster) with just local attached storage. The DR Server will have a volume(s) that is the same size and drive letter as the shared cluster volume(s). This works rather well and will even let you replicate to a target that is in the cloud if you don’t have your own DR site configured.

You can also build the same configuration using all replicated storage if you want to eliminate the SAN completely.

Here is a nice short video that illustrates the some of the possible configurations. http://videos.us.sios.com/medias/aula05u2fl

Replicating a 2-node SQL Server 2012/2014 Standard Edition Cluster to a 3rd Server for Disaster Recovery