Replicating a 2-node SQL Server 2012/2014 Standard Edition Cluster to a 3rd Server for Disaster Recovery

Many people have found themselves settling for SQL Server Standard Edition due to the cost of SQL Server Enterprise Edition. SQL Server Standard Edition has many of the same features, but has a few limitations. One limitation is that it does not support AlwaysOn Availability Groups. Also, it only supports two nodes in a cluster. With Database Mirroring being deprecated and only supporting synchronous replication in Standard Edition, you really have limited disaster recovery options.

One of those options is SIOS DataKeeper Cluster Edition. DataKeeper will work with your existing shared storage cluster and allow you to extend it to a 3rd node using either synchronous or asynchronous replication. If you are using SQL Server Enterprise you can simply add that 3rd node as another cluster member and you have a true multisite cluster. However, since we are talking about SQL Server Standard Edition you can’t add a 3rd node directly to the cluster. The good news is that DataKeeper will allow you to replicate data to a 3rd node so your data is protected.

Recovery in the event of a disaster simply means you are going to use DataKeeper to bring that 3rd node online as the source of the mirror and then use SQL Server Management Studio to mount the databases that are on the replicated volumes. You clients will also need to be redirected to this 3rd node, but it is a very cost effective solution with an excellent RPO and reasonable RTO.

The SIOS documentation talks about how to do this, but I have summarized the steps recently for one of my clients.

Configuration

  • Stop the SQL Resource
  • Remove the Physical Disk Resource From The SQL Cluster Resource
  • Remove the Physical Disk from Available Storage
  • Online Physical Disk on SECONDARY server, add the drive letter (if not there)
  • Run emcmd . setconfiguration <drive letter> 256
    and Reboot Secondary Server. This will cause the SECONDARY server to block access to the E drive which is important because you don’t want two servers having access to the E drive at the same time if you can avoid it.
  • Online the disk on PRIMARY server
  • Add the Drive letter if needed
  • Create a DataKeeper Mirror from Primary to DR
    You may have to wait a minute for the E drive to appear available in the DataKeeper Server Overview Report on all the servers before you can create the mirror properly. If done properly you will create a mirror from PRIMARY to DR and as part of that process DataKeeper will ask you about the SECONDARY server which shares the volume you are replicating.

In the event of a disaster….

On DR Node

  • Run EMCMD . switchovervolume <drive letter>
  • The first time make sure the SQL Service account has read/write access to all data and log files. You WILL have to explicitly grant this access the very first time you try to mount the databases.
  • Use SQL Management Studio to mount the databases
  • Redirect all clients to the server in the DR site, or better yet have the applications that reside in the DR site pre-configured to point to the SQL Server instance in the DR site.

After disaster is over

  • Power the servers (PRIMAY, SECONDARY) in the main site back on
  • Wait for mirror to reach mirroring state
  • Determine which node was previous source (run PowerShell as an administrator)
    get-clusterresource -Name “<DataKeeper Volume Resource name>” | get-clusterparameter
  • Make sure no DataKeeper Volume Resources are online in the cluster
  • Start the DataKeeper GUI on one cluster node. Resolve any split brain conditions (most likely there are none) ensuring the DR node is selected as the source during any split-brain recovery procedures
  • On the node that was reported as the previous source run EMCMD . switchovervolume <drive letter>
  • Bring SQL Server online in Failover Cluster Manager

The above steps assume you have SIOS DataKeeper Cluster Edition installed on all three servers (PRIMARY, SECONDARY, DR) and that PRIMARY and SECONDARY are a two node shared storage cluster and you are replicating data to DR which is just a standalone SQL Server instance (not part of the cluster) with just local attached storage. The DR Server will have a volume(s) that is the same size and drive letter as the shared cluster volume(s). This works rather well and will even let you replicate to a target that is in the cloud if you don’t have your own DR site configured.

You can also build the same configuration using all replicated storage if you want to eliminate the SAN completely.

Here is a nice short video that illustrates the some of the possible configurations. http://videos.us.sios.com/medias/aula05u2fl

Replicating a 2-node SQL Server 2012/2014 Standard Edition Cluster to a 3rd Server for Disaster Recovery

Why would you want to build a #SQLServer failover cluster instance in the #Azure cloud?

There was an interesting discussion happening today in the Twitterverse. Basically, someone asked the question “Has anyone set up a SQL Server AlwaysOn Failover Cluster Instance in Azure?” The ensuing conversation involved some well respect SQL Server experts which led to the following question, “Why would you want to build a SQL Server AlwaysOn Failover Cluster instance in the cloud?”

That question could be interpreted in two ways: “Why do you need High Availability in the Cloud” or “Why wouldn’t you use AlwaysOn Availability Groups instead of Failover Cluster Instances?”

Let’s address each question one at a time.

Question 1 – Why do you need High Availability in the Azure Cloud?

  • You might think that just because you host your SQL Server instance in Azure, that you are covered by their 99.95% uptime SLA. If you think that, you would be wrong. In order to take advantage of the 99.95% SLA you have to have at least two instances of SQL running in an Availability Set. With a single instance of SQL running you can definitely expect that there will minimally be downtime during maintenance periods, but you are also susceptible to unplanned failures.
  • Two instances of SQL Server cannot generally be load balanced, so you have to implement some sort of mechanism to keep the servers in sync and to ensure that if there is a problem with one of the servers, the other server will be able to continue to service the requests. High Availability solutions like AlwaysOn Availability Groups, AlwaysOn Failover Cluster Instances and even the deprecated Database Mirroring can provide high availability for SQL Server in that scenario. Other solutions like log shipping and transactional replication may be able to help keep data synchronized between servers, but they are not typically considered high availability solutions and will not ensure the availability of your SQL Server.
  • Microsoft does occasionally need to perform maintenance on Azure that could bring down an entire Upgrade Domain and all the instances running in that Upgrade Domain. You don’t have any say on when this will happen, so you need to have a mechanism in place to ensure that if they do have to bring down your primary SQL Server instance, you can expect that your secondary SQL Server instance will take over the workload without missing a beat. All of the high availability solutions mentioned above can ensure that you will continue to run in the event that Microsoft is doing maintenance on the Upgrade Domain of your primary server. Microsoft will only do maintenance on a single Upgrade Domain at a time, ensuring that your secondary server will still be online assuming you put the both in the same Availability Set.
  • What do you do if YOU want to performance maintenance on your production SQL Server? Maybe you want to install a Service Pack or other hotfix? Without a secondary server to fail over to, you will have to schedule planned downtime. One of the primary benefits of any high availability solution is the ability to do rolling upgrades, minimizing the impact of planned downtime.

Question 2 – Why wouldn’t you use AlwaysOn Availability Groups instead of Failover Cluster Instances?

  • Save Money! SQL Server AlwaysOn Availability Groups requires Enterprise Edition of SQL Server. Why not save money and deploy SQL Server Standard Edition and build a simple 2-node Failover Cluster Instance? Unless you need Enterprise Edition for some other reason, this is a no brainer.
  • Protect the ENTIRE SQL Server instance. AlwaysOn Availability Groups only protects user defined databases; you cannot protect the System and MSDB databases. If you build a Failover Cluster Instance instead, you are protecting the ENTIRE instance, including the System and MSDB databases.
  • Ease Administration. In Azure, you are limited to just on client listener. This limits you to just one Availability Group. In contrast, with a Failover Cluster Instance one client listener is all you need, so there is no limitation.
  • Worker Thread Exhaustion. With AlwaysOn AG you have to keep an eye on the available worker threads. The available worker threads limit the number of databases you can protect with AlwaysOn AG. In contrast, AlwaysOn Failover Clustering with DataKeeper block level replication does not consume more resources for each database you add, meaning you can scale to protect hundreds of databases without the additional overhead associated with AlwaysOn AG.
  • Distribute Transaction Support. AlwaysOn AG does not support distributed transactions (DTC), so if your application requires DTC support you are going to have to look at an AlwaysOn Failover Cluster Instance instead.
  • Support of Other Replication Technologies. If you plan on setting up Peer to Peer replication between two databases protected by AlwaysOn AG you can forget about it. In fact, there are many restrictions you have to be aware of once you deploy AlwaysOn Availability Groups. AlwaysOn FCI’s do not have any of those restrictions.

Knowing what you know above, shouldn’t the question really be “Why would I want to implement AlwaysOn AG in the Cloud when I can have a much more robust and inexpensive solution building an AlwaysOn Failover Cluster instance?”

If you are interested in building an AlwaysOn Failover Cluster Instance in Azure, check out my blog post Step-by-Step: How to configure a SQL Server Failover Cluster Instance (FCI) in Microsoft Azure IaaS #SQLServer #Azure #SANLess

You can also check out the only Azure Certified HA solution in the Azure Marketplace at http://azure.microsoft.com/en-us/marketplace/partners/sios-datakeeper/sios-datakeeper-8-bring-your-own-license/

Why would you want to build a #SQLServer failover cluster instance in the #Azure cloud?

Clustering 101: Configuring a Windows Cluster Quorum – What You Need To Know

In case you missed it, I held this in depth webinar on cluster quorums. In 30 minutes I go over everything you need to know about quorums, from node majority through Cloud Witness and everything in between. If you have additional questions about quorums post them as a comment on this article and I will be glad to help.

Clustering 101: Configuring a Windows Cluster Quorum – What You Need To Know

Learn Windows Server 2012 R2 Failover Clustering – Microsoft Virtual Academy

If you are new to clustering or just new to clustering in Windows Server 2012 R2 this is class for you. Symon Perriman (@SymonPerriman), 5nine Software Vice President of Business Development and Elden Christensen, Microsoft Principal Program Manager Lead, live and breathe failover clustering. You can’t ask for any better instructors. Stop what you are doing and watch this RIGHT NOW!

http://www.microsoftvirtualacademy.com/training-courses/failover-clustering-in-windows-server-2012-r2

Learn Windows Server 2012 R2 Failover Clustering – Microsoft Virtual Academy

What every SQL Server DBA needs to know about Windows Server 10 #sqlpass

The guys over at the High Availability & Disaster Recovery Virtual Chapter of @SQLPASS invited me to present on Windows Server 10. I discussed Cloud Witness, Storage Replica and Rolling Cluster OS Upgrades. In case you missed it you can view the recording here.

https://attendee.gotowebinar.com/recording/8228215421492642561

What every SQL Server DBA needs to know about Windows Server 10 #sqlpass

#Azure Storage Service Interruption…Time for “Plan B”

Yesterday evening Pacific Standard Time, Azure storage services experienced a service interruption across the United States, Europe and parts of Asia, which impacted multiple cloud services in these regions.

As part of a performance update to Azure Storage, an issue was discovered that resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services.

Read the whole report on the Azure blog. http://azure.microsoft.com/blog/2014/11/19/update-on-azure-storage-service-interruption/

So what does this outage mean to those thinking about a cloud deployment? Global “interruptions” of this magnitude certainly cannot occur on any regular basis for any cloud provider that intends to remain in the cloud business, whether they are Microsoft, Amazon, Google or other. However, as a cloud architect or person responsible for a cloud deployment, you have a responsibility to your customer to have a “Plan B” in your back pocket in case the worst case scenario actually happens.

What exactly is a “Plan B”? Plan B involves having a documented procedure for recovering data and services in an alternate location in the event of a wide spread outage that impacts a cloud provider’s ability to deliver their service, despite deploying what you thought was a highly resilient cloud deployment designed to keep running even in the event of localized outages within a region, availability zone or fault domain.

At a high level you should be concerned about three things: Data Recovery, Application Recovery, and Client Access. There are many ways to address these concerns, some more automated than others and some with a better Recovery Time Objective (RTO) and Recovery Point Objective (RPO) than others.

It was just last week that I blogged about how to create a multisite cluster that stretched between the AWS cloud and the Azure cloud. This type of configuration is just what is needed in the event of an outage of the magnitude that we just experienced yesterday in the Azure cloud. https://clusteringformeremortals.com/2014/11/18/cloud-resiliency-for-sqlserver-failover-clusters-aws-to-azure-multisite-cluster/

Figure 1 – Example of a Cloud-to-Cloud Multisite Cluster Configuration

Another alternative to the “cloud-to-cloud” replication model is of course utilizing your own datacenter as a disaster recovery site for your cloud deployment. The advantages of this is that you have physical ownership of your data, but of course now you are back in the business of managing a datacenter, which can negate some of the benefit of a pure cloud deployment.

Figure 2 – Hybrid Cloud Deployment Model

If you are not ready to go full on cloud, you can still make use of the cloud as a disaster recovery site. This is probably the easiest and most cost effective way to implement an offsite datacenter for disaster recovery and to start taking advantage of what the cloud has to offer without fully committing to moving all your workloads into the cloud.

Figure 3 – Using the Cloud as a Disaster Recovery Site

The illustrations shown above make use of the host based replication solution called DataKeeper Cluster Edition to build multisite SQL Server clusters. However, DataKeeper can be used to keep any data in sync, either between different cloud providers or in the hybrid cloud model.

Microsoft is not alone in dealing with cloud outages as outages have impacted Google, Microsoft, Amazon, DropBox and many others just this year alone. Having a “Plan B” in place is a must have anytime you are relying on any cloud service.

#Azure Storage Service Interruption…Time for “Plan B”