#Azure Storage Service Interruption…Time for “Plan B”

Yesterday evening Pacific Standard Time, Azure storage services experienced a service interruption across the United States, Europe and parts of Asia, which impacted multiple cloud services in these regions.

As part of a performance update to Azure Storage, an issue was discovered that resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services.

Read the whole report on the Azure blog. http://azure.microsoft.com/blog/2014/11/19/update-on-azure-storage-service-interruption/

So what does this outage mean to those thinking about a cloud deployment? Global “interruptions” of this magnitude certainly cannot occur on any regular basis for any cloud provider that intends to remain in the cloud business, whether they are Microsoft, Amazon, Google or other. However, as a cloud architect or person responsible for a cloud deployment, you have a responsibility to your customer to have a “Plan B” in your back pocket in case the worst case scenario actually happens.

What exactly is a “Plan B”? Plan B involves having a documented procedure for recovering data and services in an alternate location in the event of a wide spread outage that impacts a cloud provider’s ability to deliver their service, despite deploying what you thought was a highly resilient cloud deployment designed to keep running even in the event of localized outages within a region, availability zone or fault domain.

At a high level you should be concerned about three things: Data Recovery, Application Recovery, and Client Access. There are many ways to address these concerns, some more automated than others and some with a better Recovery Time Objective (RTO) and Recovery Point Objective (RPO) than others.

It was just last week that I blogged about how to create a multisite cluster that stretched between the AWS cloud and the Azure cloud. This type of configuration is just what is needed in the event of an outage of the magnitude that we just experienced yesterday in the Azure cloud. https://clusteringformeremortals.com/2014/11/18/cloud-resiliency-for-sqlserver-failover-clusters-aws-to-azure-multisite-cluster/

Figure 1 – Example of a Cloud-to-Cloud Multisite Cluster Configuration

Another alternative to the “cloud-to-cloud” replication model is of course utilizing your own datacenter as a disaster recovery site for your cloud deployment. The advantages of this is that you have physical ownership of your data, but of course now you are back in the business of managing a datacenter, which can negate some of the benefit of a pure cloud deployment.

Figure 2 – Hybrid Cloud Deployment Model

If you are not ready to go full on cloud, you can still make use of the cloud as a disaster recovery site. This is probably the easiest and most cost effective way to implement an offsite datacenter for disaster recovery and to start taking advantage of what the cloud has to offer without fully committing to moving all your workloads into the cloud.

Figure 3 – Using the Cloud as a Disaster Recovery Site

The illustrations shown above make use of the host based replication solution called DataKeeper Cluster Edition to build multisite SQL Server clusters. However, DataKeeper can be used to keep any data in sync, either between different cloud providers or in the hybrid cloud model.

Microsoft is not alone in dealing with cloud outages as outages have impacted Google, Microsoft, Amazon, DropBox and many others just this year alone. Having a “Plan B” in place is a must have anytime you are relying on any cloud service.

#Azure Storage Service Interruption…Time for “Plan B”

Amazon EC2 Storage and Instance Size Considerations

When you launch a new instance you only have two options for the OS storage: Standard or Provisioned IOPS. Both are EBS volumes persistent across reboots. Many instances come with a bunch of extra ephemeral drives attached, which are NOT persistent. I usually delete these ephemeral drives so I am not tempted to store data on them. You will have to add additional EBS volumes for additional persistent storage.

This article seems to indicate that you can launch AMI’s based on the “EC2 Instance Store”, which is NOT persistent, but I’ve never seen that option. All of my instances have always had root devices that are EBS based; I have not seen one that is not EBS based. I’m assuming they mean some of the instances in the Amazon Market Place may use non-persistent volumes. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/RootDeviceStorage.html

You’ll see the root device when you launch the instance, like I highlighted below. As long as EBS is the root device you are good to go and can be sure your changes will persist across reboots.

 

As far as instance size, it will depend on the needs of the application. The good thing about EC2 is that if you provision an AMI that is under powered, you can go back and increase the instance size, though it does require a reboot. If IOPS are important, you will want to make sure you choose an instance that is EBS optimized. See this page for the instance details. http://aws.amazon.com/ec2/instance-types/#instance-details . You’ll see the first instance type which is EBS optimized is M1.large.

Read this guide for additional tips for optimal storage configuration. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html . One of the best tips for increased IOPS is to use multiple smaller EBS volumes and put them together in a RAID 0 on the Windows server. Because the EBS volumes are RAID1 on the backend, you are essentially deploying RAID 1+0 in your VM for optimal performance and availability.

Amazon EC2 Storage and Instance Size Considerations

It is now cheaper to get provisioned IOPS on Amazon EC2 EBS

In the old days if you wanted a guaranteed 4000 IOPS on EBS, you had to provision a minimum of a 400 GB vo0lume. Considering you pay per the GB, and provisioned IOPS are not cheap, if you only needed 100 GB of fast storage you were stuck paying for 300 GB of unused storage.

With this recent announcement Amazon has made it easier to get fast storage in smaller increments. Now if you want 4000 IOPS you can get that in EBS volumes as small as 133 GB up to 1 TB in size. Read the following press release for more information.

http://aws.amazon.com/about-aws/whats-new/2013/10/09/ebs-provisioned-iops-maximum-iops-gb-ratio-increased-to-30-1/?sc_ichannel=EM&sc_icountry=EN&sc_icampaign=13Oct_Newsletter&Campaign_id=35060660&ref_=pe_467350_35060660_14

It is now cheaper to get provisioned IOPS on Amazon EC2 EBS

Webinar Invite: How to Deploy SQL Server AlwaysOn Failover Clusters in Amazon EC2 with @awscloud #amazonaws

Deploying Your Business Critical SQL Server Apps on Amazon EC2

 

Amazon Web Services (AWS) and SIOS Technology Corp, an AWS Partner Network (APN) Technology Partner, invite you to attend this live webinar to learn how to optimize mission critical SQL Server deployments on Amazon EC2.

Learn how to take advantage of the cost benefits and flexibility of Amazon EC2 while maintaining protection with native Microsoft Windows Server Failover Clustering – all without shared storage.

Who should attend:

Solution Architects, Developer, Development Leads and other SQL Professionals

Presenters:

Miles Ward, Solutions Architect, Amazon Web Services

Tony Tomarchio, Director of Field Engineering, SIOS Technology Corp

Date / Time:

Wednesday, June 5, 2013 – 10AM PT / 1PM ET

Click here to register

http://bit.ly/10VLtDu

Webinar Invite: How to Deploy SQL Server AlwaysOn Failover Clusters in Amazon EC2 with @awscloud #amazonaws

Is AlwaysOn Availability Groups a viable alternative to AlwaysOn Failover Clusters in the public cloud?

I recently read an article entitled SQL Server 2012 AlwaysOn: High Availability database for cloud data centers where the author John Joyner makes a case for using AlwaysOn Availability Groups for SQL Server high availability in the cloud. I have been investigating AlwaysOn Availability Groups since it was available in pre-release versions of SQL Server 2012 and while it certainly has some valid uses (mostly in disaster recovery configurations), saying that it is a “new way to achieve HA SQL” glosses over many of the issues which make deploying AlwaysOn Availability Groups as a replacement for failover clusters simply not a viable option in many cases and not a good idea in the rest of the cases. In a response I wrote to the article I proposed that an AlwaysOn Multisite Clustering using the host based replication solution DataKeeper Cluster Edition is a much better alternative and I went ahead and explained why..

My original response to the article seems to have been deleted, so I decided to repost my response to the original article below:

There are a few things to consider with AlwaysOn Availability Groups. As you mention, “Microsoft announced support for some System Center 2012 SP1 applications to work with SQL AlwaysOn”, meaning that there are still applications that do not support AlwaysOn. In fact, there are a LOT of applications that do not support AlwaysOn Availability Groups, including any applications that use distributed transactions. And what about the other limitations, like not being able to keep MSDB, Master and other databases in sync? I blog about these limitations here.

https://clusteringformeremortals.com/2012/11/09/how-to-overcome-the-limitations-of-sql-server-alwayson-availability-groups-sqlpass/

I agree that SQL HA is important, however, the only way to get “High Availability” (meaning automatic recovery in the event of a failure) with AlwaysOn Availability Groups is by using synchronous mirroring. At PASS Summitt in Seattle earlier this month I sat in many different presentations on AlwaysOn and almost without fail the presenters talked about AlwaysOn in an asynchronous configuration. The reason being is that AlwaysOn synchronous replication has a SIGNIFICANT impact on the performance of your application. I have personally measured up to a 68% performance penalty with AlwaysOn Synchronous mirroring, and that was across a dedicated 10 Gbps LAN! I blog about this result here.

https://clusteringformeremortals.com/2012/11/09/how-to-overcome-the-performance-problems-with-sql-server-alwayson-availability-groups-sqlpass/

Unfortunately, in an asynchronous configuration you give up automatic failover, so you really are not getting HA, you are getting data protection, but certainly not the same RTO as you can expect from a traditional SQL failover cluster.

And then finally there is the cost to consider. SQL Server 2012 Enterprise is nothing to sneeze at. If you want to build a 2-node cluster and take advantage of readable secondaries and you are using a 2-socket, 16-core servers you are looking at shelling out close to $220k for SQL Server 2012 Enterprise licenses. I broke down the associated cost in my blog article here.

https://clusteringformeremortals.com/2012/11/09/want-sql-server-alwayson-features-but-cant-afford-sql-2012-enterprise-edition-sqlpass/

Don’t get me wrong, SQL Server 2012 AlwaysOn Availability Groups can solve many problems, but I would not categorize the asynchronous configuration required in most cloud deployments as an HA alternative. Many people are overlooking the other “AlwaysOn”, AlwaysOn Failover Clusters. New features of SQL Server AlwaysOn Failover Clusters, including enhanced support for cross subnet multisite clusters, will give you a true HA solution and overcomes all of the limitations I describe above. Of course in a pure cloud solution you may not be able to integrate with array based replication to support multisite clusters, but you can always use host based replication solutions such as SteelEye DataKeeper Cluster Edition to build multisite clusters in public or private clouds and in your own physical data center and you can do this with SQL Server 2008 through 2012 AND it works on SQL Server Standard edition as well as Enterprise.

Have you done any testing with AlwaysOn Availability Groups in a HA configuration? If so I’d be curious to know if you measured the overhead associated with synchronous replication in your environment.

Is AlwaysOn Availability Groups a viable alternative to AlwaysOn Failover Clusters in the public cloud?