SQL Server High Availability in AWS #Cloud
Have you been thinking about moving to the cloud? The potential cost savings makes it nearly impossible not to consider. The cost justification is usually easy to figure out and the cloud almost always comes out looking like a good investment. However, after you stop counting the money you are going to save you start thinking about things like security and availability and wonder whether the cloud is for you.
In a traditional data center you have the control and can deploy whatever security and high availability solution you like. However, once you decide to move your servers to the cloud your choices can become much more limited. It doesn’t matter whether you’re with Amazon, Google or Microsoft, outages in the cloud can and do occur and you need to do whatever you can to mitigate such risks.
Let’s take a closer look at Amazon Web Services (AWS) for instance. What are the options you have to ensure that your SQL Server database can survive an unexpected outage? While some applications can be deployed in a load balanced configuration across multiple availability zones, SQL Server is generally not deployed in a load balanced configuration. What this means is that SQL Server itself resides in a single availability zone and if that zone should become unavailable, your whole application stack can come to a grinding halt.
If you read this article by Miles Ward, you will see that with SQL Server 2008 R2 your availability options are pretty limited. In that article on page 11 there is a nice chart that lays out your HA options. As you will see, the options are severely limited and mostly fall outside of the category which would be described as HA. Log shipping, mirroring and transactional replication are pretty much the only options you have, and they are more of a data protection options rather than HA options. If you want Microsoft failover clustering you will find yourself out of luck due to some network limitations (clients can’t connect to a clustered IP address) in AWS and the lack of a shared disk resource required for traditional SQL clusters.
If you are looking to deploy SQL Server 2012, your options get a little better. As described by Jeremy Peschka, with a little manual intervention you can deploy AlwaysOn Availability Groups in AWS to do asynchronous replication from your data center to AWS, or even between AWS availability groups. Of course this assumes you have the SQL 2012 Enterprise license required for AlwaysOn Availability Groups. The only “issue” is that AWS really doesn’t support moving cluster IP address from one server to another, so client redirection has to be done manually using the ec2-unassign-private-ip-addresses and ec2-assign-private-ip-addresses commands after switchover that Peschka describes in his article. All-in-all this is a very manual process, which again does not really fit the description of a highly available system.
If you can live without automated recovery and with the limitations of AlwaysOn Availability Groups that I described in a previous blog post, then you might just want to go ahead and try the AlwaysOn Availability Group deployment in AWS. However, if you are looking for an easier, more affordable, more robust HA solution, I have some really good news. SIOS Technology Corp has been looking at this problem and has developed a solution that overcomes all of the limitations previously described and will be available as an AMI for easy deployment. This solution is currently in private beta, but will be widely available later this year.
The SIOS solution is based on SQL server in a Microsoft Failover Clustering using DataKeeper Cluster Edition host based replication. By using hosted based replication they have overcome the first obstacle of clustering in EC2 – lack of shared storage. The second obstacle that SIOS had to overcome was the issue of client redirection described by Peschka; the client access point needs to be manipulated from within EC2, not failover clustering. SIOS has built intelligence into their AMI solution such that the reassigning of the IP address is automated as part of the cluster failover process, effectively simulating the behavior you would normally expect from a cluster.
And because all of this is built on top of failover clustering, this can be deployed using SQL 2008/2008 R2 or 2012. Even the Standard Edition of SQL Server will support a 2-node cluster so the cost savings vs. deploying SQL 2012 AlwaysOn Availability groups could be substantial.
Let me know what you think. Does this solution sound interesting? What are you doing today to ensure the availability of your SQL Server EC2 instances?