Protecting against downtime associated with cloud outages is something that anyone deploying on ANY cloud service needs to address. While it could be easy to simply deploy your app in “the cloud” and assume that it is someone else’s problem to manage now, the reality is that while cloud providers probably have more resources and expertise to ensure your servers stay up, the ultimate responsibility to ensure that your critical application is available rests squarely on your shoulders.
Believe it or not, simply deploying your SQL Server in Windows Azure does not make it “highly available”. To make it highly available you must use the same traditional tools and techniques that you might use in your own datacenter. While opinions vary on this topic, I believe that the SQL Server 2012/2014 high availability options are as follows:
- AlwaysOn Failover Cluster Instance
- AlwaysOn Availability Groups
- Multisite Cluster (high availability AND disaster recovery)
Regardless of which option you choose, you are going to want to become familiar with the Windows Azure Fault Domain as described below:
“Nonetheless, in Windows Azure a rack of computers is indeed identified as a fault domain. And the allocation of a fault domain is determined by Windows Azure at deployment time. A service owner cannot control the allocation of a fault domain, however can programmatically find out which fault domain a service is running within. Windows Azure Compute service SLA guarantees the level of connectivity uptime for a deployed service only if two or more instances of each role of a service are deployed”
So before you get started, when you deploy your Windows Azure VMs, you must make sure that each SQL Server and any “witness” servers reside in different Fault Domains. You do this by putting all of the VMs in the same “Availability Set”. This ensures that each server in the same Availability Set resides in a different Fault Domain, hopefully eliminating all single points of failure.
Figure 1 – VMs in the same Availability Set will be provisioned in different Fault Domains
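As a quick sanity check on the placement described above, here is a minimal Python sketch that flags any VMs sharing a fault domain. The VM names and fault domain numbers are made up for illustration, and the function name `shared_fault_domains` is my own; how you actually read each VM's fault domain from Azure is not shown here, so treat the input dictionary as something you would fill in yourself.

```python
from collections import defaultdict


def shared_fault_domains(placement):
    """Return {fault_domain: [vm names]} for domains hosting more than one VM.

    `placement` maps a VM name to the fault domain it landed in.
    An empty result means no two VMs share a fault domain.
    """
    by_domain = defaultdict(list)
    for vm, domain in placement.items():
        by_domain[domain].append(vm)
    return {d: sorted(vms) for d, vms in by_domain.items() if len(vms) > 1}


# Hypothetical placement for a 2-node cluster plus a witness server.
placement = {"sql-node-1": 0, "sql-node-2": 1, "witness": 2}

conflicts = shared_fault_domains(placement)
if conflicts:
    for domain, vms in conflicts.items():
        print(f"Fault domain {domain} is shared by: {', '.join(vms)}")
else:
    print("Every VM is in its own fault domain")
```

If the check reports a shared fault domain, that rack is a single point of failure for the VMs listed, which is exactly what the Availability Set is meant to avoid.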
By putting all of your VMs in different Fault Domains and configuring a SQL Server Failover Cluster or Availability Group, you are protecting against the usual types of outages that might be localized to a single rack of servers, AKA, Fault Domain. I’ve written a step-by-step article entitled Creating a SQL Server 2014 AlwaysOn Failover Cluster (FCI) Instance in Windows Azure IaaS which should help in your endeavor to build resiliency within the Azure cloud for your SQL Server.
But what happens if Windows Azure has a major outage that takes out a whole region? Natural disaster or human error would likely be the cause of such an outage. Unfortunately, at this point there is no way to stretch an Azure Virtual Private Network between two different Azure Regions, which include West US, West Europe, Southeast Asia, South Central US, North Europe, North Central US, East Asia, and East US. However, the Azure Virtual Private Network can support a site-to-site VPN connection with a limited number of VPN devices from Cisco, Juniper and even Microsoft RRAS.
That leads us to think about alternate locations outside of Azure, even our own private data center. I recently wrote a step-by-step article that explains how to extend your on-premises datacenter to the Azure Cloud. Once you have your datacenter connected to Windows Azure, you can configure either AlwaysOn Availability Groups or AlwaysOn Failover Clustering (multisite) for protection from a catastrophic Azure failure. I’ve written previously about the Advantages of Multisite Clustering vs. Availability Groups, so in my lab I decided to create a 2-node SQL Failover Cluster Instance up in Azure and then add a 3rd node in my primary data center. I’ve written the detailed configuration steps in my blog post entitled Creating a Multisite Cluster in Windows Azure for Disaster Recovery.
If you would rather use AlwaysOn Availability Groups, you probably want to visit the tutorials called AlwaysOn Availability Groups in Windows Azure (GUI) and Listener Configuration for AlwaysOn Availability Groups in Windows Azure. If you are using SQL 2008 R2 or earlier I’m sure you could configure database mirroring, but at this point if you are moving to Azure I’m assuming you are probably deploying SQL Server 2012 or 2014. Other technologies like Log Shipping and Replication are options for moving data, but I don’t consider them high availability solutions.
If you are deploying highly available SQL Server in Windows Azure IaaS please leave me a comment; I’d love to know what you are doing. If you have any questions please leave a comment as well and I will be sure to get back to you.
5 thoughts on “Windows Azure High Availability Options for SQL Server #Azure #Cloud #IaaS”
Great posts! Have you tried to create an AlwaysOn Availability Group spanning different regions? I deployed a SQL 2014 AlwaysOn environment from the new Azure portal, but to my big surprise it created the listener with a public IP and not a private IP. Any experience?
You can span regions, but you will have to configure the networking. Hopefully in a future blog I can tackle that topic. If you follow the guidance, your client access point should have the same IP address as your load balancer. If you use an Internal Load Balancer, that would be a private address. However, if you use an external load balancer then yes, it would be a public IP address.
I see from some of your articles (very informative) that there are a few benefits to AOFCI over AOAG (alone). However, it appears as though going the AOFCI route requires something like DataKeeper (an expensive proposition). Has Microsoft come up with anything in the way of supported shared storage that can be used for clustered nodes as an alternative to DataKeeper? We are in the midst of building out an infrastructure for our SaaS solutions and I want to be sure we construct properly for our requirements.
Thanks for your feedback
Some workloads support simple SMB3 file shares using Microsoft’s Scale Out File Server (SOFS), but in order to build a redundant SOFS you are talking about standing up two or more Windows Servers and purchasing additional shared JBOD. At the end of the day that will cost more than DataKeeper.
Thanks for the feedback. So far, our infrastructure consists of 10+ VMs, including a set for DFS replication of multiple terabytes used for simple file storage (seeing as Azure Files is still in preview). So a few extra VMs and disks may not make such an impact. My other concern is that we may be asked to replicate this infrastructure for other clients (not wanting to run on SaaS multi-tenant), so we expect to have additional cloned infrastructures for testing and dev, and potentially many other clients. I’m just a bit concerned about eventually becoming a reseller for DataKeeper every time we clone out the deployment.
In either case, thanks for the quick feedback.