#Azure Storage Service Interruption…Time for “Plan B”

Yesterday evening Pacific Standard Time, Azure storage services experienced a service interruption across the United States, Europe and parts of Asia, which impacted multiple cloud services in these regions.

As part of a performance update to Azure Storage, an issue was discovered that resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services.

Read the full report on the Azure blog: http://azure.microsoft.com/blog/2014/11/19/update-on-azure-storage-service-interruption/

So what does this outage mean for those thinking about a cloud deployment? Global “interruptions” of this magnitude certainly cannot occur on any regular basis for a cloud provider that intends to remain in the cloud business, whether that provider is Microsoft, Amazon, Google or another. However, as a cloud architect or the person responsible for a cloud deployment, you have a responsibility to your customers to have a “Plan B” in your back pocket in case the worst-case scenario actually happens.

What exactly is a “Plan B”? Plan B is a documented procedure for recovering data and services in an alternate location in the event of a widespread outage that impacts a cloud provider’s ability to deliver its service. You need one even if you deployed what you thought was a highly resilient configuration, designed to keep running through localized outages within a region, availability zone or fault domain.

At a high level you should be concerned about three things: Data Recovery, Application Recovery, and Client Access. There are many ways to address these concerns, some more automated than others and some with a better Recovery Time Objective (RTO) and Recovery Point Objective (RPO) than others.

It was just last week that I blogged about how to create a multisite cluster that stretches between the AWS cloud and the Azure cloud. This type of configuration is exactly what is needed in the event of an outage of the magnitude we experienced yesterday in the Azure cloud. http://clusteringformeremortals.com/2014/11/18/cloud-resiliency-for-sqlserver-failover-clusters-aws-to-azure-multisite-cluster/

Figure 1 – Example of a Cloud-to-Cloud Multisite Cluster Configuration

Another alternative to the “cloud-to-cloud” replication model is, of course, to use your own datacenter as a disaster recovery site for your cloud deployment. The advantage of this approach is that you have physical ownership of your data, but of course you are now back in the business of managing a datacenter, which can negate some of the benefits of a pure cloud deployment.

Figure 2 – Hybrid Cloud Deployment Model

If you are not ready to go fully into the cloud, you can still use the cloud as a disaster recovery site. This is probably the easiest and most cost-effective way to implement an offsite datacenter for disaster recovery and to start taking advantage of what the cloud has to offer without committing to moving all your workloads into the cloud.

Figure 3 – Using the Cloud as a Disaster Recovery Site

The illustrations shown above use the host-based replication solution DataKeeper Cluster Edition to build multisite SQL Server clusters. However, DataKeeper can be used to keep any data in sync, either between different cloud providers or in the hybrid cloud model.

Microsoft is not alone in dealing with cloud outages: just this year, outages have impacted Google, Amazon, Dropbox and many others. Having a “Plan B” in place is a must any time you rely on a cloud service.

New IP Address Options for Azure IaaS VMs #Azure

The Windows Azure team has been very busy recently adding a bunch of new features to Azure IaaS. Here are just some of the features you should check out.

Static IP for Azure instances
Until recently Azure VMs got their IP addresses from DHCP exclusively. There were some tricks to make your VMs get the same IP address most of the time, but in reality you really couldn’t guarantee the VMs would always get the same IP address as DHCP reservations are not supported. With this new feature you can not only assign a static private IP address to each VM, you can also assign a static public IP address to each VM. Previously public IP addresses were only used to address Cloud Services.

Multiple NIC Cards – Multiple NIC support is currently only available in the North Europe region, but according to Microsoft it will be rolled out worldwide soon. Multiple NICs let you manage network traffic better. Personally, I will be using multiple NICs in my failover cluster configuration for network redundancy and to keep my DataKeeper replication traffic separate from my client access traffic.

Internal Load Balancer – As of Oct 8th you can provision a single Internal Load Balancer (ILB) per Cloud Service. This is a HUGE improvement: you can now configure multi-tier applications that reside within the same Cloud Service, and you no longer have to rely on External Load Balancers, which send your traffic across the public network. The best new use case, though, is that the ILB is now the recommended best practice for client access points when building failover clusters in Windows Azure. Check out this great new post on the Azure Blog that talks about building failover cluster instances on Azure with SIOS DataKeeper Cluster Edition.

Check back soon for a Step-by-Step article on configuring a SQL Server 2014 Failover Cluster instance in Azure IaaS using DataKeeper Cluster Edition and all of these great new features.

Windows Server 10 New Cloud Witness

My favorite new cluster feature in Windows Server 10 is the Cloud Witness. The Cloud Witness is another option in addition to the traditional disk witness and file share witness, which are used when configuring the quorum in a Windows Server Failover Cluster. For a complete history of cluster quorums and their options, please read my article on the Microsoft Press blog: http://blogs.msdn.com/b/microsoft_press/archive/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2.aspx

So what exactly is a Cloud Witness? A Cloud Witness uses a Windows Azure Storage Account to act as a vote in your cluster quorum. It can be used instead of a disk witness or a file share witness. The cluster nodes simply need public internet access to reach an Azure storage account that you have provisioned as part of your Azure subscription.

So why would I use a Cloud Witness? In most shared storage clusters you will still use a Node and Disk Majority quorum. However, when you build #SANLess clusters or multisite clusters, you now have another option to consider instead of a file share witness. Let’s look at some scenarios where a Cloud Witness makes more sense than a File Share Witness.

Scenario 1 – Multisite Cluster

If you have done your research on multisite clusters, you will have discovered that if you want automatic failover in the event of a complete site loss, the only safe way to do this is to have the same number of cluster votes in each site and to configure a File Share Witness in a 3rd site. In addition, the network connection between your primary site and your DR site must be completely independent of the network connections between this 3rd site and your primary and DR sites.

Maintaining a completely independent network and having access to a 3rd data center to host a file share witness is not always possible or affordable. This is where a Cloud Witness in Windows Azure comes in handy. Assuming you have an equal number of cluster votes in each data center and each data center has access to the internet, you can define a Windows Azure Storage account as a Cloud Witness instead of a File Share Witness. Using a Cloud Witness eliminates the cost of maintaining a 3rd data center. There will be a small monthly fee for the Azure storage account, but it will be minimal compared with the cost of maintaining a File Share Witness in a 3rd site.
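The vote arithmetic behind this advice can be sketched in PowerShell. This is a hypothetical helper, not a cluster cmdlet, and it assumes static vote counting: it does not model the dynamic quorum and dynamic witness behavior in Windows Server 2012 R2, which can adjust votes as nodes leave. Site names and vote counts are illustrative.

```powershell
# Hypothetical helper: does the cluster keep quorum if one site (or the witness) is lost?
# Assumes static votes; dynamic quorum adjustments are not modeled.
function Test-QuorumSurvivesSiteLoss {
    param(
        # Votes per location, e.g. @{ Primary = 2; DR = 2; Witness = 1 }
        [Parameter(Mandatory)][hashtable]$VotesPerSite,
        # The location that fails completely
        [Parameter(Mandatory)][string]$FailedSite
    )
    $total = ($VotesPerSite.Values | Measure-Object -Sum).Sum
    $lost = if ($VotesPerSite.ContainsKey($FailedSite)) { $VotesPerSite[$FailedSite] } else { 0 }
    $surviving = $total - $lost
    # Quorum requires a strict majority of all configured votes
    return $surviving -gt ($total / 2)
}

# Two votes per site plus a witness in a third location: 3 of 5 votes survive
Test-QuorumSurvivesSiteLoss -VotesPerSite @{ Primary = 2; DR = 2; Witness = 1 } -FailedSite 'Primary'   # True
# No witness: 2 of 4 votes survive, which is not a majority
Test-QuorumSurvivesSiteLoss -VotesPerSite @{ Primary = 2; DR = 2 } -FailedSite 'Primary'                # False
```

The same arithmetic shows why a two-node cluster needs a witness: with `@{ Node1 = 1; Node2 = 1 }` the loss of either node leaves 1 of 2 votes, while adding a one-vote witness leaves 2 of 3.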

Scenario 2 – #SANLess Hyper-V Cluster at Remote Office/Branch Office (ROBO)

Here is the scenario. You run a fast food chain, department store chain, drug store chain, etc. You need to run a handful of servers to support the local operations at each of your storefronts. You decide that running these servers as virtual machines in Hyper-V is the way to go. Keeping these servers highly available is very important, so you decide it would be best to implement a two-node cluster at each location. To minimize costs and to make management easy, you decide to purchase an identical pair of servers for each location and use the locally attached storage to build a #SANLess cluster with DataKeeper Cluster Edition. You then realize that because you went #SANLess, you don’t have access to a disk witness. And because you didn’t plan on purchasing a 3rd server for each location, a file share witness is also out of the question. You are in a real conundrum…a 2-node cluster NEEDS A WITNESS!

Here is where the Cloud Witness in Windows Azure saves the day. Assuming your servers have access to the internet, you can configure a simple Cloud Witness and support a 2-node #SANLess Hyper-V cluster at each location. I would configure a non-clustered DC VM on each physical server and then create as many highly available VMs as I need in the cluster, using just the locally attached storage.

Cloud Witness is a great new option in Windows Server 10. The only thing that would make it better is if Microsoft backported it to Windows Server 2012 R2 so I could use it today!

UPDATE 11/5/2014 – When you create your Storage Account in Azure, make sure you choose “Locally Redundant” as Geo-Redundant Storage is not supported for the Cloud Witness.
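For reference, the setup can also be scripted. This is a sketch that assumes the `Set-ClusterQuorum -CloudWitness -AccountName -AccessKey` parameter set as it shipped in Windows Server 2016 (the Windows Server 10 preview discussed here may differ) and the current Az PowerShell module rather than the 2014-era Azure module. The function name, resource group, account name and region are hypothetical; the account is created as Standard_LRS to match the locally redundant requirement in the update.

```powershell
# Hypothetical helper: create an LRS storage account and use it as the cluster's Cloud Witness.
# Assumes Az.Accounts/Az.Storage (already signed in with Connect-AzAccount) and the
# FailoverClusters module are available; run it on one of the cluster nodes.
function Set-ClusterCloudWitnessFromAzure {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$StorageAccountName,
        [string]$Location = 'westus'
    )
    # Standard_LRS: Geo-Redundant Storage is not supported for the Cloud Witness
    New-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName `
        -Location $Location -SkuName Standard_LRS -Kind StorageV2 | Out-Null

    # Use the first access key of the new account
    $key = (Get-AzStorageAccountKey -ResourceGroupName $ResourceGroupName -Name $StorageAccountName)[0].Value

    # Replace the current witness with the Cloud Witness
    Set-ClusterQuorum -CloudWitness -AccountName $StorageAccountName -AccessKey $key
}

# Usage (placeholders):
# Set-ClusterCloudWitnessFromAzure -ResourceGroupName 'rg-witness' -StorageAccountName 'contosowitness01'
# Get-ClusterQuorum
```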

Configuring a #SANLess Hyper-V Failover Cluster with DataKeeper Cluster Edition

Q. What is a SANLess cluster?
A. It is a cluster that uses local storage instead of a SAN.

Q. Why would I want a SANLess cluster?
A. There are a few reasons:

  • Eliminate the cost of a SAN
  • Eliminate the SAN as a single point of failure
  • Take advantage of high-speed storage options such as Fusion-io ioDrives and other high-speed storage devices that plug in locally
  • Stretch the cluster across geographic locations for disaster recovery
  • Simplify management
  • Eliminate the need for a SAN administrator

Building a SANLess cluster with DataKeeper Cluster Edition is easy. If you know anything about Windows Server Failover Clustering, then you already know 99% of the solution. Even if you have never built a Windows Server Failover Cluster before, don’t worry; Microsoft has made it easy and painless. For beginners, I have written a step-by-step article that tells you how to build a Windows Server 2012 #SANLess cluster in my blog post here: http://clusteringformeremortals.com/2012/12/31/windows-server-2012-clustering-step-by-step/

If you have followed the steps in my post, you will be at the point where you are ready to create your first highly available virtual machine. There are two options for making a highly available virtual machine. The first option assumes that you have an existing virtual machine that you want to make highly available, and the second option assumes you are building a highly available virtual machine from scratch.

Configuring the DataKeeper Volume Cluster Resource

Because a SANLess Hyper-V cluster requires a separate volume for each VM, you will want to make sure your storage is partitioned so that you have enough volumes for all of your VMs. The storage on each cluster node should be configured identically in terms of drive letters and partition sizes. Once the partitions are configured properly and your VM resides on the partition you want to replicate, open the DataKeeper interface and walk through the three-step wizard to create the DataKeeper Volume Resources, as shown below.
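Before running the wizard, you can sanity-check the volume sizes from PowerShell. The sketch below is hypothetical and only covers size: it reads each node's volume with `Get-Volume` over CIM and checks that every mirror target is at least as large as the source (identical sizes, as recommended here, naturally pass). Node names and the drive letter are placeholders.

```powershell
# Hypothetical helper: every mirror target must be at least as large as the source volume.
function Test-MirrorVolumeSize {
    param(
        [Parameter(Mandatory)][long]$SourceBytes,
        [Parameter(Mandatory)][long[]]$TargetBytes
    )
    foreach ($t in $TargetBytes) {
        if ($t -lt $SourceBytes) { return $false }
    }
    return $true
}

# Hypothetical helper: read a volume's size on a remote node via a CIM session.
function Get-NodeVolumeSize {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [Parameter(Mandatory)][char]$DriveLetter
    )
    (Get-Volume -CimSession $ComputerName -DriveLetter $DriveLetter).Size
}

# Usage (placeholder node names):
# $source = Get-NodeVolumeSize -ComputerName 'HV1' -DriveLetter 'E'
# $target = Get-NodeVolumeSize -ComputerName 'HV2' -DriveLetter 'E'
# Test-MirrorVolumeSize -SourceBytes $source -TargetBytes $target
```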

First, open the DataKeeper interface and click on Connect to Server. Do this twice to connect to both servers.

Once you are connected, click on Create Job to create a mirror of the volume that contains the virtual machine you want to make highly available as shown below. In this example we will mirror the E drive.

Whenever possible, keep replication traffic on a private network. In this case, we are using the 10.0.0.0/8 network for replication traffic. This can be a simple patch cable that connects the two servers across two unused NICs.

The final screen shows the options available for mirroring. For local area networks, Synchronous mirroring is preferred. When replicating across wide area networks, you will want to use Asynchronous mirroring and possibly enable compression. I would not limit the Maximum bandwidth, as that could cause your mirror to go out of sync if your rate of change (Disk Write Bytes/sec) exceeds the Maximum bandwidth specified. However, you may want to enable Maximum bandwidth temporarily during the initial mirror creation; otherwise DataKeeper may flood the network with the initial replication traffic as it tries to get in sync as quickly as possible. Both the Maximum bandwidth and Compression settings can be adjusted after the mirror is created. However, you cannot switch between Synchronous and Asynchronous mirroring once the mirror has been created without deleting the mirror and recreating it.
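To judge whether a bandwidth limit is safe, you need your actual rate of change. A minimal sketch: sample the `Disk Write Bytes/sec` performance counter with `Get-Counter`, then compare the peak against the limit you are considering. Both function names are hypothetical, the comparison ignores compression, and you would need to convert DataKeeper's Maximum bandwidth value to bytes per second first.

```powershell
# Hypothetical helper: peak write rate on a volume, sampled for about a minute by default.
function Get-PeakDiskWriteRate {
    param(
        [string]$Drive = 'E:',
        [int]$IntervalSeconds = 5,
        [int]$Samples = 12
    )
    $sets = Get-Counter -Counter "\LogicalDisk($Drive)\Disk Write Bytes/sec" `
        -SampleInterval $IntervalSeconds -MaxSamples $Samples
    ($sets.CounterSamples | Measure-Object -Property CookedValue -Maximum).Maximum
}

# Hypothetical helper: bytes/sec by which the mirror target falls further behind.
# 0 means the link keeps up with the write rate.
function Get-MirrorBacklogGrowth {
    param(
        [Parameter(Mandatory)][double]$WriteBytesPerSec,
        [Parameter(Mandatory)][double]$BandwidthBytesPerSec
    )
    [math]::Max(0.0, $WriteBytesPerSec - $BandwidthBytesPerSec)
}

# Usage on the source node:
# $peak = Get-PeakDiskWriteRate -Drive 'E:'
# Get-MirrorBacklogGrowth -WriteBytesPerSec $peak -BandwidthBytesPerSec 40MB
```

Sampling only a minute will miss nightly maintenance jobs and backups, so measure over the busiest periods before relying on a limit.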

At the end of the mirror creation process you will see a popup asking if you want to auto-register this volume as a cluster volume. Select Yes; this creates a DataKeeper Volume resource in Available Storage in Failover Clustering.

You are now ready to create your highly available VMs.

Option 1 – Clustering an Existing VM

Once again, this procedure assumes you have an existing VM that you want to make highly available. If you do not have an existing VM, follow the procedure in Option 2 – Creating a Highly Available VM from Scratch. Otherwise, you should see the VM in Hyper-V Manager, as shown below.

All the VM files should already be located on the replicated volume, as shown below. If not, you will have to relocate the files before attempting to cluster the VM.
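One way to confirm this is to list every file location Hyper-V records for the VM and flag anything that is not on the replicated volume. The helper name is hypothetical; the VM properties (`ConfigurationLocation`, `SnapshotFileLocation`, `SmartPagingFilePath`) and `Get-VMHardDiskDrive` come from the Hyper-V PowerShell module. VM1 and E:\ are placeholders.

```powershell
# Hypothetical helper: return the paths that are NOT on the replicated volume.
function Get-PathsOffVolume {
    param(
        [string[]]$Path,
        [string]$Volume = 'E:\'
    )
    # Skip empty entries, then keep anything that does not start with the volume root
    $Path | Where-Object {
        $_ -and -not $_.StartsWith($Volume, [System.StringComparison]::OrdinalIgnoreCase)
    }
}

# Usage on the Hyper-V host (VM1 is a placeholder):
# $vm = Get-VM -Name 'VM1'
# $paths = @($vm.ConfigurationLocation, $vm.SnapshotFileLocation, $vm.SmartPagingFilePath) +
#          @((Get-VMHardDiskDrive -VMName 'VM1').Path)
# Get-PathsOffVolume -Path $paths -Volume 'E:\'   # no output means everything is on E:\
```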

To begin the clustering process, open Failover Cluster Manager. Right-click Roles, choose Configure Role, and select Virtual Machine as the role you want to create.

This will launch the High Availability Wizard. At this point you should select the VM that you want to cluster and step through the wizard as shown below.

You will see that the VM resource will be created, but there will be some warnings. The warnings indicate that the E drive is not currently part of the VM Cluster Resource Group.

To make the DataKeeper Volume E part of the VM Cluster Resource Group, right click on the role and choose Add Storage. Add the DataKeeper Volume that you will see listed in Available Disks.

The last part is to choose the Properties of the Virtual Machine Configuration (not the Virtual Machine) resource and make it dependent upon the storage you just added to the resource group.
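The same three steps (create the role, move the DataKeeper volume into it, add the dependency) can be sketched with FailoverClusters cmdlets. This assumes default resource naming, where the role takes the VM's name and the configuration resource is called "Virtual Machine Configuration <VM name>"; check the real names with `Get-ClusterResource` first. The function name and the DataKeeper resource name are hypothetical.

```powershell
# Hypothetical helper: cluster an existing VM whose files already live on a DataKeeper volume.
function Add-ExistingVMToCluster {
    param(
        [Parameter(Mandatory)][string]$VMName,
        # Name of the DataKeeper volume resource in Available Storage (placeholder)
        [Parameter(Mandatory)][string]$DataKeeperResource
    )
    # 1. Create the clustered VM role (the High Availability Wizard step)
    $role = Add-ClusterVirtualMachineRole -VMName $VMName

    # 2. Move the DataKeeper volume from Available Storage into the VM's role
    Move-ClusterResource -Name $DataKeeperResource -Group $role.Name | Out-Null

    # 3. Make the Virtual Machine Configuration resource depend on that volume
    Set-ClusterResourceDependency -Resource "Virtual Machine Configuration $VMName" `
        -Dependency "[$DataKeeperResource]"
}

# Usage (placeholders):
# Add-ExistingVMToCluster -VMName 'VM1' -DataKeeperResource 'DataKeeper Volume E'
# Start-ClusterGroup -Name 'VM1'
```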

You should now be able to start the VM.

Option 2 – Creating a Highly Available VM from Scratch

Assuming you want to create a highly available VM from scratch, you can complete this entire process from Failover Cluster Manager, as shown below. This step assumes that you have already created a mirror of the E drive using DataKeeper, as described in the Configuring the DataKeeper Volume Cluster Resource section.

To get started, open the Failover Cluster Manager and right click on Roles and choose Virtual Machine – New Virtual Machine.

Follow through with the steps of the wizard and select the options that you want to use for the VM. When choosing where to place the VM, select the cluster node that currently is the owner of Available Storage, which will also be the source of the mirror.

Make sure when specifying the Name and Location of the VM, you select the location of the replicated volume.

The rest of the options are up to you. Just make sure the VHD file is located on the replicated volume.
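If you prefer PowerShell, a rough equivalent is to create the VM with `New-VM`, placing its configuration and new VHDX on the replicated volume, and then make it highly available with `Add-ClusterVirtualMachineRole`. Run it on the node that currently owns Available Storage. The function name, memory and disk sizes, and paths are hypothetical, and the DataKeeper volume still has to be added to the role and made a dependency afterwards.

```powershell
# Hypothetical helper: create a VM on the mirrored volume and make it a clustered role.
function New-ReplicatedClusterVM {
    param(
        [Parameter(Mandatory)][string]$Name,
        # Root of the DataKeeper-replicated volume (placeholder)
        [string]$ReplicatedVolume = 'E:\',
        [long]$MemoryStartupBytes = 2GB,
        [long]$VhdSizeBytes = 60GB
    )
    # New VHDX at E:\<Name>\<Name>.vhdx; New-VM also stores the VM's files under -Path
    $vhdPath = '{0}\{1}\{1}.vhdx' -f $ReplicatedVolume.TrimEnd('\'), $Name
    New-VM -Name $Name -Path $ReplicatedVolume -MemoryStartupBytes $MemoryStartupBytes `
        -NewVHDPath $vhdPath -NewVHDSizeBytes $VhdSizeBytes | Out-Null

    # Create the clustered VM role; a storage warning is expected until the
    # DataKeeper volume is added to the role
    Add-ClusterVirtualMachineRole -VMName $Name
}

# Usage (placeholder): New-ReplicatedClusterVM -Name 'VM2'
```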

You will see the highly available VM is created, but there is a warning about the storage. You will need to add the DataKeeper Volume Resource to the VM Cluster Resource Group as shown below.

After the DataKeeper Volume is added to the VM Cluster Resource Group, you will need to add the DataKeeper Volume as a dependency of the Virtual Machine Configuration resource.

You now have a highly available virtual machine.

Summary

In this blog post we discussed what constitutes a #SANLess cluster. We discussed how DataKeeper Cluster Edition can be used to build a highly available Hyper-V cluster without the use of a SAN. Once built, the cluster behaves exactly like a SAN based cluster, including having the ability to do Live Migration, Quick Migration and automated failover in the event of unexpected failures.

A #SANLess cluster eliminates the expense of a SAN as well as the SAN as a single point of failure. DataKeeper Cluster Edition supports multiple nodes in a #SANLess cluster, so configurations that span both LAN and WAN are possible solutions for Hyper-V high availability and disaster recovery. DataKeeper supports any local storage, opening up the possibility of using high-speed, locally attached SSD or NAND flash storage for high performance without giving up high availability.

Creating a multi-site cluster in Windows Azure for Disaster Recovery #Azure #Cloud

This is the 4th post in my series on High Availability and Disaster Recovery for Windows Azure. This is a step-by-step post, or a “how to” post that will build upon the Azure configuration that we built during my first three articles…

  1. How to Create a Site-to-Site VPN Tunnel to the Windows Azure Cloud Using a Windows Server 2012 R2 Routing and Remote Access (RRAS) Server
  2. Extending Your Datacenter to the Azure Cloud #Azure
  3. Creating a SQL Server 2014 AlwaysOn Failover Cluster (FCI) Instance in Windows Azure IaaS #Azure #Cloud

We are now going to extend the existing cluster (SQL1 and SQL2) to your local data center, SQL3. This configuration will give you both high availability for your application within the Azure Cloud and a disaster recovery solution should Azure suffer a major outage. You could also configure this in reverse, with your on-premises datacenter as your primary site and Windows Azure as your disaster recovery site. And of course, this solution illustrates SQL Server as the application, but any cluster-aware application can be protected in the same fashion.

At this point, if you have been following along, your network should look like the illustration below.

Add SQL3 to the cluster

To add SQL3 to the cluster, the first thing we need to do is make sure SQL3 is up and running, fully patched and joined to the domain. We also need to make sure that it has an F:\ drive attached that is the same size as the F:\ drives in use in Azure. And finally, if you relocated tempdb on the SQL cluster, make sure the directory structure where tempdb is located is pre-created on SQL3 as well.

Next we will add the Failover Cluster feature to SQL3.

With failover clustering installed on SQL3, we will open Failover Cluster Manager on SQL1 and click Add Node.

Select SQL3 and click Next

Run all the validation tests on SQL3

Let’s take a look at some of the warnings in the validation report. The RegisterAllProvidersIP property is set to 1, which can be good in a multisite cluster. You can read more about this setting here: http://technet.microsoft.com/en-us/library/ca35febe-9f0b-48a0-aa9b-a83725feb4ae

This next warning talks about only having a single network between the cluster nodes. At this time Azure only supports a single network interface between VMs, so there is nothing you can do about this warning. However, this network interface is fully redundant behind the scenes, so you can safely ignore this message.

Of course you are going to see a lot of warnings around storage. That’s because this cluster has no shared storage. Instead it relies on replicated storage by SIOS DataKeeper Cluster Edition. As stated below, this is perfectly fine as the database will be kept in sync with the replication software.

We are now ready to add SQL3 to the cluster.

Once you click Finish, SQL3 will be added to the cluster as shown below.
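These steps have PowerShell equivalents. A sketch, run from SQL1 with the FailoverClusters and ServerManager modules available: install the feature on SQL3 remotely, run validation across all three nodes, review the report, and only then join the node. The function name is hypothetical; SQL1, SQL2 and SQL3 are the node names used in this series.

```powershell
# Hypothetical helper: prepare and validate a new node before joining it to the cluster.
function Invoke-NewNodeValidation {
    param(
        [string]$NewNode = 'SQL3',
        [string[]]$ExistingNodes = @('SQL1', 'SQL2')
    )
    # Install Failover Clustering (plus management tools) on the new node
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools -ComputerName $NewNode | Out-Null

    # Run the full validation suite across all nodes; the cmdlet returns the report file
    Test-Cluster -Node ($ExistingNodes + $NewNode)
}

# After reviewing the report (expect the storage and single-network warnings described in this post):
# Add-ClusterNode -Name 'SQL3'
```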

However, there are a few things we need to do to complete this installation. Next we will work through the following steps:

  • Add an additional IP address to the Cluster Name Object
  • Tune the heartbeat settings
  • Extend the DataKeeper mirror to SQL3
  • Install SQL 2014 on SQL3

Add an additional IP address to the Cluster Name Object

When we added SQL3 to the cluster it went from a single site cluster to a multi-subnet cluster. If the cluster was originally created as a single site cluster and you later add a node that resides in a different subnet, you have to manually add a second IP address to the Cluster Name Object and create an OR dependency. For more information on this topic, view the following article. http://blogs.msdn.com/b/clustering/archive/2011/08/31/10204142.aspx

To add a second IP address to the Cluster Name Object (CNO), we must use the PowerShell commands described in the article mentioned above.

Now, if you are following along with the MSDN article I referenced, you would expect to see this “NewIP” resource somewhere in the GUI. However, at least with Windows Server 2012 R2, I am not currently seeing this resource in the GUI.

But if I right-click the SQLCLUSTER name, choose Properties and try to add NewIP as a dependency, I see that it is listed as a possible resource.

Choose “NewIP” and also make the dependency type “OR” as shown below.

Once you click OK, it now appears in the GUI as an IP Address that needs to be configured.

We can now choose the properties of this IP Address and configure the address to use an IP address that is not currently in use in the 10.10.10.0/24 subnet, which is the same subnet where SQL3 resides.
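As an alternative to the GUI steps, the whole change can be scripted along the lines of the clustering team article linked above: add the IP Address resource, set its parameters, and set the OR dependency on the cluster name. Resource names ('NewIP', 'Cluster IP Address', 'Cluster Name'), the cluster network name and the address are assumptions; check yours with `Get-ClusterResource` and `Get-ClusterNetwork`. Both function names are hypothetical.

```powershell
# Hypothetical helper: build an OR dependency expression, e.g. "[Cluster IP Address] or [NewIP]".
function New-ClusterOrDependency {
    param([Parameter(Mandatory)][string[]]$ResourceName)
    ($ResourceName | ForEach-Object { "[$_]" }) -join ' or '
}

# Hypothetical helper: add a second IP address to the Cluster Name Object.
function Add-ClusterNameIPAddress {
    param(
        [Parameter(Mandatory)][string]$Address,          # unused address in SQL3's subnet
        [Parameter(Mandatory)][string]$Network,          # cluster network for 10.10.10.0/24
        [string]$SubnetMask = '255.255.255.0',
        [string]$ResourceName = 'NewIP',
        [string]$ExistingIPResource = 'Cluster IP Address',
        [string]$NameResource = 'Cluster Name'
    )
    Add-ClusterResource -Name $ResourceName -ResourceType 'IP Address' -Group 'Cluster Group' | Out-Null
    Get-ClusterResource -Name $ResourceName |
        Set-ClusterParameter -Multiple @{ Address = $Address; SubnetMask = $SubnetMask; Network = $Network; EnableDhcp = 0 }
    # Either address can satisfy the name, depending on which subnet owns the group
    Set-ClusterResourceDependency -Resource $NameResource `
        -Dependency (New-ClusterOrDependency -ResourceName $ExistingIPResource, $ResourceName)
}

# Usage (placeholders): Add-ClusterNameIPAddress -Address '10.10.10.60' -Network 'Cluster Network 2'
```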

Tune the Heartbeat Settings

We are now ready to tune the heartbeat settings. Essentially, we are going to be a little more tolerant of network communication problems, since SQL3 is located across a VPN connection with some latency on the line and we only have a single network interface on the cluster nodes. I highly recommend you read this article by Elden Christensen to help you decide on the right settings for your requirements: http://blogs.msdn.com/b/clustering/archive/2012/11/21/10370765.aspx

For our environment, we are going to use what he calls the “Relaxed” settings, setting SameSubnetThreshold to 10 heartbeats and CrossSubnetThreshold to 20 heartbeats.

The commands are:

(get-cluster).SameSubnetThreshold = 10

(get-cluster).CrossSubnetThreshold = 20

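What this means is that heartbeats will still be sent every second, but SQL1 and SQL2 will only consider each other dead after 10 missed heartbeats, and communication with SQL3 across subnets will tolerate 20 missed heartbeats. Compared with the Windows Server 2012 R2 default threshold of 5 missed heartbeats, this increases your Recovery Time Objective slightly (roughly 5 seconds for failures within a subnet and 15 seconds across subnets), but it also helps eliminate false failovers.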
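The arithmetic behind these numbers: the time to declare a node down is roughly the heartbeat interval (SameSubnetDelay or CrossSubnetDelay, in milliseconds) multiplied by the matching threshold. A hypothetical helper might look like this; the 5-heartbeat comparison assumes the Windows Server 2012 R2 default thresholds.

```powershell
# Hypothetical helper: approximate seconds before a silent node is considered down.
function Get-HeartbeatTimeoutSeconds {
    param(
        [int]$DelayMs = 1000,                    # interval between heartbeats
        [Parameter(Mandatory)][int]$Threshold    # missed heartbeats tolerated
    )
    return $DelayMs * $Threshold / 1000
}

Get-HeartbeatTimeoutSeconds -Threshold 5     # 5  (default threshold)
Get-HeartbeatTimeoutSeconds -Threshold 10    # 10 (SameSubnetThreshold used here)
Get-HeartbeatTimeoutSeconds -Threshold 20    # 20 (CrossSubnetThreshold used here)

# On a cluster node, use the live settings instead:
# $c = Get-Cluster
# Get-HeartbeatTimeoutSeconds -DelayMs $c.SameSubnetDelay  -Threshold $c.SameSubnetThreshold
# Get-HeartbeatTimeoutSeconds -DelayMs $c.CrossSubnetDelay -Threshold $c.CrossSubnetThreshold
```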

Extend the DataKeeper mirror to SQL3

Before we can install SQL 2014 on SQL3, we must extend the DataKeeper mirror so that it includes SQL3 as a replication target. Of course, you must install DataKeeper Cluster Edition on SQL3 first and make sure that it has an F:\ drive at least as big as the source of the mirror. Once DataKeeper is installed

Install SQL 2014 on SQL3

Now it is time to install SQL 2014 onto the 3rd node. The process is exactly the same as it was to install it on SQL2. Start by launching SQL Setup on SQL3.

Run through all the steps…

At this point in the installation you have to pick an IP address that is valid for SQL3’s subnet. The cluster will add this IP address with an “OR” dependency to the client access point.

Enter the passwords for your service accounts
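For unattended installs, SQL Server setup exposes the same choices on the command line through the AddNode action. A hedged sketch that only builds the argument string: `/CONFIRMIPDEPENDENCYCHANGE` and `/FAILOVERCLUSTERIPADDRESSES` (format `IPv4;address;cluster network;subnet mask`) are the setup parameters for adding the OR-dependent IP address, but verify them against the setup documentation for your SQL Server version. The function name, instance name, IP address, network name and media path are placeholders, and the passwords end up in plain text on the command line.

```powershell
# Hypothetical helper: build setup.exe arguments for adding SQL3 as a new FCI node.
function New-SqlAddNodeArgumentList {
    param(
        [Parameter(Mandatory)][string]$IPAddress,        # unused address in SQL3's subnet
        [Parameter(Mandatory)][string]$ClusterNetwork,   # cluster network for that subnet
        [string]$SubnetMask = '255.255.255.0',
        [string]$InstanceName = 'MSSQLSERVER',
        [Parameter(Mandatory)][string]$SqlServicePassword,
        [Parameter(Mandatory)][string]$AgentServicePassword
    )
    $ipSpec = "IPv4;$IPAddress;$ClusterNetwork;$SubnetMask"
    $argList = @(
        '/Q', '/ACTION=AddNode', '/IACCEPTSQLSERVERLICENSETERMS',
        "/INSTANCENAME=`"$InstanceName`"",
        "/SQLSVCPASSWORD=`"$SqlServicePassword`"",
        "/AGTSVCPASSWORD=`"$AgentServicePassword`"",
        '/CONFIRMIPDEPENDENCYCHANGE=1',
        "/FAILOVERCLUSTERIPADDRESSES=`"$ipSpec`""
    )
    # Return one string so Start-Process hands it to setup.exe unchanged
    $argList -join ' '
}

# Usage on SQL3 (placeholders):
# $a = New-SqlAddNodeArgumentList -IPAddress '10.10.10.50' -ClusterNetwork 'Cluster Network 2' `
#        -SqlServicePassword $svcPwd -AgentServicePassword $agtPwd
# Start-Process -FilePath 'D:\setup.exe' -ArgumentList $a -Wait -NoNewWindow
```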

After you complete the installation let the fun begin. You now have a multisite SQL Server cluster that should look something like this.