Clustering For Mere Mortals

#Azure Storage Service Interruption…Time for “Plan B”

Posted in Amazon, AWS, Azure, Cloud, DataKeeper, EC2, High Availability, SQL, WSFC by daveberm on November 20, 2014

Yesterday evening Pacific Standard Time, Azure storage services experienced a service interruption across the United States, Europe and parts of Asia, which impacted multiple cloud services in these regions.

As part of a performance update to Azure Storage, an issue was discovered that resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services.

Read the whole report on the Azure blog. http://azure.microsoft.com/blog/2014/11/19/update-on-azure-storage-service-interruption/

So what does this outage mean to those thinking about a cloud deployment? Global “interruptions” of this magnitude certainly cannot occur on any regular basis for any cloud provider that intends to remain in the cloud business, whether it is Microsoft, Amazon, Google or someone else. However, as a cloud architect or person responsible for a cloud deployment, you have a responsibility to your customer to have a “Plan B” in your back pocket in case the worst-case scenario actually happens.

What exactly is a “Plan B”? Plan B is a documented procedure for recovering data and services in an alternate location in the event of a widespread outage that impacts a cloud provider’s ability to deliver its service. You need one even if you deployed what you thought was a highly resilient configuration, designed to keep running through localized outages within a region, availability zone or fault domain.

At a high level you should be concerned about three things: Data Recovery, Application Recovery, and Client Access. There are many ways to address these concerns, some more automated than others and some with a better Recovery Time Objective (RTO) and Recovery Point Objective (RPO) than others.

It was just last week that I blogged about how to create a multisite cluster that stretches between the AWS cloud and the Azure cloud. This type of configuration is exactly what is needed in the event of an outage of the magnitude we experienced yesterday in the Azure cloud. http://clusteringformeremortals.com/2014/11/18/cloud-resiliency-for-sqlserver-failover-clusters-aws-to-azure-multisite-cluster/

Figure 1 – Example of a Cloud-to-Cloud Multisite Cluster Configuration

Another alternative to the “cloud-to-cloud” replication model is, of course, using your own datacenter as a disaster recovery site for your cloud deployment. The advantage of this approach is that you retain physical ownership of your data, but you are now back in the business of managing a datacenter, which can negate some of the benefits of a pure cloud deployment.

Figure 2 – Hybrid Cloud Deployment Model

If you are not ready to go all-in on the cloud, you can still use the cloud as a disaster recovery site. This is probably the easiest and most cost-effective way to implement an offsite datacenter for disaster recovery and to start taking advantage of what the cloud has to offer without committing to moving all your workloads into the cloud.

Figure 3 – Using the Cloud as a Disaster Recovery Site

The illustrations shown above use the host-based replication solution DataKeeper Cluster Edition to build multisite SQL Server clusters. However, DataKeeper can be used to keep any data in sync, either between different cloud providers or in the hybrid cloud model.

Microsoft is not alone in dealing with cloud outages; just this year alone, outages have impacted Google, Amazon, Dropbox and many others. Having a “Plan B” in place is a must anytime you are relying on any cloud service.

Cloud Resiliency for #SQLServer Failover Clusters: #AWS to #Azure multisite cluster

Posted in Amazon, AWS, Azure, Cloud, DataKeeper, EC2, SQL by daveberm on November 18, 2014

Ever want to deploy a SQL Server cluster with some nodes in AWS and other nodes in Azure? Well, you are in luck! This article describes the process in great detail.

http://dbinsight.com.au/dbinsight/sql-server-failover-between-aws-and-azure-part-1

 

New IP Address Options for Azure IaaS VMs #Azure

Posted in Azure, Cloud, DataKeeper, High Availability, WSFC by daveberm on November 12, 2014

The Windows Azure team has been very busy recently adding a bunch of new features to Azure IaaS. Here are just some of the features you should check out.

Static IP for Azure instances – Until recently, Azure VMs got their IP addresses exclusively from DHCP. There were some tricks to make your VMs get the same IP address most of the time, but you couldn’t guarantee that a VM would always get the same IP address, because DHCP reservations are not supported. With this new feature you can assign not only a static private IP address to each VM, but also a static public IP address. Previously, public IP addresses were only used to address Cloud Services.

Multiple NICs – Multiple NIC support is currently only available in the Northern Europe region, but according to Microsoft it will be rolled out worldwide soon. Multiple NICs let you manage network traffic better. Personally, I will be using multiple NICs in my failover cluster configurations for network redundancy and to keep my DataKeeper replication traffic separate from my client access traffic.

Internal Load Balancer – As of Oct 8th you can provision a single Internal Load Balancer (ILB) per Cloud Service. This is a HUGE improvement: you can now configure multi-tier applications that reside within the same Cloud Service without relying on External Load Balancers, which send your traffic across the public network. Best of all, the ILB is now the recommended best practice for client access points when building failover clusters in Windows Azure. Check out this great new post on the Azure Blog that talks about building failover cluster instances on Azure with SIOS DataKeeper Cluster Edition.

Check back soon for a Step-by-Step article on configuring a SQL Server 2014 Failover Cluster instance in Azure IaaS using DataKeeper Cluster Edition and all of these great new features.

High Performance SQL Server in Azure IaaS #SQLServer #Azure

Posted in Azure, Cloud, SQL by daveberm on November 12, 2014

If you want your SQL Server instances to really hum in Azure, you need to read this article.

http://blogs.technet.com/b/dataplatforminsider/archive/2014/09/25/using-ssds-in-azure-vms-to-store-sql-server-tempdb-and-buffer-pool-extensions.aspx

Just remember: if you relocate tempdb or the buffer pool extensions in a SQL Server failover cluster in Azure IaaS, the SSD (the D: drive) is not persistent, and any folders you create on it are deleted each time you reboot. You will therefore have to either relax the permissions on the root of the D: drive and store the files there, or create a Generic Script cluster resource that recreates the folder structure upon failover. The article talks about creating a script that runs at startup, but in a clustered environment I’m afraid the cluster would try to start SQL Server before the directory structure was created. It would be better to create a Generic Script cluster resource and make the SQL Server cluster resource dependent on it, so that the folder is guaranteed to exist before SQL Server tries to start.
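The folder-recreation logic can be sketched as below. This is a minimal illustration only, assuming a hypothetical layout of D:\SQLTemp\tempdb and D:\SQLTemp\bpe; substitute whatever paths your tempdb and buffer pool extension files are actually configured for. A real Generic Script cluster resource is written in VBScript or JScript and exposes entry points such as Online and IsAlive, so treat these two Python functions as stand-ins for those entry points, not as something the cluster can run directly.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical folder layout on the non-persistent local SSD (D:).
DEFAULT_ROOT = Path(r"D:\SQLTemp")
DEFAULT_SUBDIRS = ("tempdb", "bpe")


def ensure_folders(root: Path = DEFAULT_ROOT,
                   subdirs: tuple[str, ...] = DEFAULT_SUBDIRS) -> list[Path]:
    """Recreate the folders SQL Server expects (stand-in for 'Online').

    Existing folders are left untouched, so this is safe to run on every
    failover or reboot. Returns only the folders that had to be created.
    """
    created: list[Path] = []
    for name in subdirs:
        folder = Path(root) / name
        if not folder.is_dir():
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


def is_alive(root: Path = DEFAULT_ROOT,
             subdirs: tuple[str, ...] = DEFAULT_SUBDIRS) -> bool:
    """Health check (stand-in for 'IsAlive'): every expected folder exists."""
    return all((Path(root) / name).is_dir() for name in subdirs)
```

Because ensure_folders only creates what is missing, it is idempotent, which is what you want from a step that runs on every failover; the is_alive check lets the resource report a failure if the folders disappear while SQL Server depends on them.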

.NET Framework 3.5 Refuses to Install on Windows Server 2012 or 2012 R2

Posted in Azure, Cloud by daveberm on November 11, 2014

If you are anything like me, you probably just started running into this issue where you can’t get .NET Framework 3.5 to install on your server anymore. It turns out that some recent security updates broke the .NET Framework 3.5 installation, and Microsoft has released a hotfix to address this issue. Go ahead and install this update on your system and you should have no problems installing .NET Framework 3.5.

https://support.microsoft.com/kb/3005628

Windows Server 10 New Cloud Witness

Posted in Azure, Cloud, DataKeeper, High Availability, SQL, Windows 10, WSFC by daveberm on November 6, 2014

My favorite new cluster feature in Windows Server 10 is the Cloud Witness. The Cloud Witness is another option in addition to the traditional disk witness and file share witness which are used when configuring the quorum in a Windows Server Failover Cluster. For a complete history of cluster quorums and their options please read my article on the Microsoft Press blog…….

So what exactly is a Cloud Witness? A Cloud Witness uses a Windows Azure IaaS Storage Account to act as a vote in your cluster quorum. It can be used instead of a disk witness or a file share witness. The cluster nodes simply need public internet access to reach an Azure storage account that you have provisioned as part of your Azure subscription.

So when would I use a Cloud Witness? In most shared storage clusters you will still use a node and disk witness majority quorum. However, when you are doing #SANLess clusters or multisite clusters, you now have another option to consider instead of a file share witness. Let’s look at some scenarios where a Cloud Witness would make more sense than a File Share Witness.

Scenario 1 – Multisite Cluster

If you have done your research on multisite clusters, you will have discovered that if you want automatic failover in the event of a complete site loss, the only safe way to do this is to have an equal number of cluster votes in each site and to configure a File Share Witness in a 3rd site. In addition, the network connection between your primary site and your DR site must be completely independent of the network connections between this 3rd site and your primary and DR sites.

Maintaining a completely independent network and a 3rd data center just to host a file share witness is not always affordable, or even possible. This is where a Cloud Witness in Windows Azure comes in handy. Assuming you have an equal number of cluster votes in each data center and each data center has access to the internet, you can define a Windows Azure Storage account as a Cloud Witness instead of a File Share Witness. Using a Cloud Witness eliminates the cost of maintaining a 3rd data center. There will be a small monthly fee for the Azure storage account, but it will be minimal compared to the cost of maintaining a File Share Witness in a 3rd site.
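A quick way to see why the witness has to live in a 3rd site is to count votes. The sketch below uses a simplified static-majority model (it ignores Dynamic Quorum and Dynamic Witness) and hypothetical site names; the “witness” entry stands for either a File Share Witness in a 3rd data center or a Cloud Witness in Azure.

```python
from __future__ import annotations


def has_quorum(surviving_votes: int, total_votes: int) -> bool:
    """Static majority rule: strictly more than half of all configured votes."""
    return surviving_votes > total_votes // 2


def survives_site_loss(site_votes: dict[str, int], lost_site: str) -> bool:
    """True if the cluster keeps quorum after every vote in lost_site is gone."""
    total = sum(site_votes.values())
    surviving = total - site_votes[lost_site]
    return has_quorum(surviving, total)


if __name__ == "__main__":
    layouts = {
        # Equal node votes per site plus a witness in an independent 3rd site.
        "witness in 3rd site": {"primary": 2, "dr": 2, "witness": 1},
        # Same nodes, but the witness lives in the primary data center.
        "witness in primary": {"primary": 3, "dr": 2},
        # No witness at all: the two sites tie.
        "no witness": {"primary": 2, "dr": 2},
    }
    for label, sites in layouts.items():
        for site in ("primary", "dr"):
            result = "stays online" if survives_site_loss(sites, site) else "goes offline"
            print(f"{label}: lose {site} -> {result}")
```

Only the 3rd-site layout survives the loss of either site. With the witness in the primary site, losing that site takes the witness vote with it, and with no witness at all, either site loss leaves exactly half the votes, which is not a majority.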

Scenario 2 – #SANLess Hyper-V Cluster at Remote Office/Branch Office (ROBO)

Here is the scenario. You run a fast food chain, department store chain, drug store chain, etc. You need to run a handful of servers to support local operations at each of your storefronts. You decide that running these servers as virtual machines in Hyper-V is the way you want to go. Having these servers highly available is very important, so you decide it would be best to implement a two-node cluster at each location. To minimize costs and make management easy, you purchase an identical pair of servers for each location and use the locally attached storage to build a #SANLess cluster with DataKeeper Cluster Edition. Then you realize that because you went #SANLess, you don’t have access to a disk witness. And because you didn’t plan on purchasing a 3rd server for each location, a file share witness is also out of the question. You are in a real conundrum…a 2-node cluster NEEDS A WITNESS!

Here is where the Cloud Witness in Windows Azure saves the day. Assuming your servers have access to the internet, a simple Cloud Witness can be configured and you can now support a 2-node #SANLess Hyper-V cluster at each location. I would configure a non-clustered DC VM on each physical server and then create as many highly available VMs as I need in the cluster, using just the locally attached storage.

Cloud Witness is a great new option in Windows Server 10. The only thing that would make it better is if Microsoft backported it to Windows Server 2012 R2 so I could use it today!

 

UPDATE 11/5/2014 – When you create your Storage Account in Azure, make sure you choose “Locally Redundant” as Geo-Redundant Storage is not supported for the Cloud Witness.

Windows Server 10 “Cloud Witness” in a failover cluster

Posted in Azure, Cloud, DataKeeper, Windows 10, WSFC by daveberm on November 2, 2014

My favorite new cluster feature in Windows Server 10 is the Cloud Witness. The Cloud Witness is another option in addition to the traditional disk witness and file share witness which are used when configuring the quorum in a Windows Server Failover Cluster. For a complete history of cluster quorums and their options please read my article on the Microsoft Press blog http://blogs.msdn.com/b/microsoft_press/archive/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2.aspx

So what exactly is a Cloud Witness? A Cloud Witness uses a Windows Azure IaaS Storage Account to act as a vote in your cluster quorum. It can be used instead of a disk witness or a file share witness. The cluster nodes simply need public internet access to reach an Azure storage account that you have provisioned as part of your Azure subscription.

So when would I use a Cloud Witness? In most shared storage clusters you will still use a node and disk witness majority quorum. However, when you are doing #SANLess clusters or multisite clusters, you now have another option to consider instead of a file share witness. Let’s look at some scenarios where a Cloud Witness would make more sense than a File Share Witness.

Scenario 1 – Multisite Cluster

If you have done your research on multisite clusters, you will have discovered that if you want automatic failover in the event of a complete site loss, the only safe way to do this is to have an equal number of cluster votes in each site and to configure a File Share Witness in a 3rd site. In addition, the network connection between your primary site and your DR site must be completely independent of the network connections between this 3rd site and your primary and DR sites.

Maintaining a completely independent network and a 3rd data center just to host a file share witness is not always affordable, or even possible. This is where a Cloud Witness in Windows Azure comes in handy. Assuming you have an equal number of cluster votes in each data center and each data center has access to the internet, you can define a Windows Azure Storage account as a Cloud Witness instead of a File Share Witness. Using a Cloud Witness eliminates the cost of maintaining a 3rd data center. There will be a small monthly fee for the Azure storage account, but it will be minimal compared to the cost of maintaining a File Share Witness in a 3rd site.

Scenario 2 – #SANLess Hyper-V Cluster at Remote Office/Branch Office (ROBO)

Here is the scenario. You run a fast food chain, department store chain, drug store chain, etc. You need to run a handful of servers to support local operations at each of your storefronts. You decide that running these servers as virtual machines in Hyper-V is the way you want to go. Having these servers highly available is very important, so you decide it would be best to implement a two-node cluster at each location. To minimize costs and make management easy, you purchase an identical pair of servers for each location and use the locally attached storage to build a #SANLess cluster with DataKeeper Cluster Edition. Then you realize that because you went #SANLess, you don’t have access to a disk witness. And because you didn’t plan on purchasing a 3rd server for each location, a file share witness is also out of the question. You are in a real conundrum…a 2-node cluster NEEDS A WITNESS!

Here is where the Cloud Witness in Windows Azure saves the day. Assuming your servers have access to the internet, a simple Cloud Witness can be configured and you can now support a 2-node #SANLess Hyper-V cluster at each location. I would configure a non-clustered DC VM on each physical server and then create as many highly available VMs as I need in the cluster, using just the locally attached storage.

Cloud Witness is a great new option in Windows Server 10. The only thing that would make it better is if Microsoft backported it to Windows Server 2012 R2 so I could use it today!

Windows Server 10 Storage Replica Configuration and First Impressions #Windows10

Posted in High Availability, Windows 10, WSFC by daveberm on October 4, 2014

One of the most exciting new features Microsoft has announced for Windows Server 10 is Storage Replica. Microsoft describes it here: http://technet.microsoft.com/en-us/library/dn765475.aspx#BKMK_SR

“Storage Replica (SR) is a new feature that enables storage-agnostic, block-level, synchronous replication between servers for disaster recovery, as well as stretching of a failover cluster for high availability. Synchronous replication enables mirroring of data in physical sites with crash-consistent volumes ensuring zero data loss at the file system level. Asynchronous replication allows site extension beyond metropolitan ranges with the possibility of data loss.

What value does this change add?

Storage Replication enables you to do the following:

  • Provide an all-Microsoft disaster recovery solution for planned and unplanned outages of mission-critical workloads.
  • Use SMB3 transport with proven reliability, scalability, and performance.
  • Stretch clusters to metropolitan distances.
  • Use Microsoft software end to end for storage and clustering, such as Hyper-V, Storage Replica, Storage Spaces, Cluster, Scale-Out File Server, SMB3, Deduplication, and ReFS/NTFS.
  • Help reduce cost and complexity as follows:
      ◦ Hardware agnostic, with no requirement to immediately abandon legacy storage such as SANs.
      ◦ Allows commodity storage and networking technologies.
      ◦ Features ease of graphical management for individual nodes and clusters through Failover Cluster Manager and Microsoft Azure Site Recovery.
      ◦ Includes comprehensive, large-scale scripting options through Windows PowerShell.
      ◦ Helps reduce downtime, and increase reliability and productivity intrinsic to Windows.
  • Provide supportability, performance metrics, and diagnostic capabilities.”

They mention a lot of use cases: “… Hyper-V, Storage Replica, Storage Spaces, Cluster, Scale-Out File Server, SMB3, Deduplication, and ReFS/NTFS”. I’m not even sure what they mean by listing technologies such as ReFS/NTFS, Deduplication, SMB3, Storage Replica and Storage Spaces. These seem more like features than use cases, so I’m going to assume that is what they are.

But let’s look at some of the other use cases they mentioned: Hyper-V, Cluster and Scale-Out File Server. I can easily imagine how Storage Replica will enhance these use cases by enabling shared-nothing Scale-Out File Servers and multisite clusters, including Hyper-V, SQL Server, File Servers, etc. In some cases it can also enable SANLess local area network clusters, allowing clusters to be built without requiring a shared Physical Disk resource.

In my first look at this solution I decided to focus on what I know and love: failover clusters. To keep things simple, I set out to build a basic two-node traditional file server (not a Scale-Out File Server), starting with three fresh VMs in a pure Windows Server 10 domain. It was easy enough to download the ISOs, and the installation on my 3 VMs went surprisingly fast. Promoting a DC was a pretty similar experience to 2012 R2, though I think it was made a little more obvious that you still have to run the promotion (the old DCPromo step) after the AD feature is installed.

I got my domain installed and my basic two-node cluster (with no resources) built without a problem. I used VMware Fusion as my hypervisor since it supports nested hypervisors (a feature sorely lacking in Hyper-V for testing and demos, by the way). I added a few additional VMDK files to each VM in my cluster and formatted them as E: and F: on each VM, figuring these would be my replica volumes. I had not defined any resources yet and the cluster had no shared storage. Perfect, ready to start configuring Storage Replica!

So I fired up Failover Cluster Manager and started poking around to see how to start the replication process. There was absolutely nothing in the UI that I could find that said “Replica,” “Replication” or anything even close to that. Because the documentation hadn’t shipped and the bits had only become available a few hours earlier, I was on my own, despite my desperate Twitter searches for a how-to blog. No problem, I said; I’m a cluster MVP and my specialty is replication and multisite clusters, so I’ll figure this out.

After a little searching, I found that there is a new feature called Windows Volume Replication.


Great, so I enabled that on both nodes, thinking I was all set, but still nothing in the Failover Cluster UI jumped out at me that said “Configure Replica”. Even after scratching my head some more and reaching out to a few smart people, I still had no clue. Then it dawned on me…“maybe it only supports Cluster Disks?” The feature announcement says it “allows commodity storage”. To me that means any old hard drive in my PC or, in this case, the virtual disks attached to my VMs. As it turns out, my hunch was correct: the disk has to appear in the cluster as a Physical Disk Resource in Available Storage.

OK, not the greatest requirement, but I kept plugging away. To get disks that could be added as Physical Disk Resources, I enabled the iSCSI Target role on my DC and created two iSCSI Virtual Disks for each of my VMs. Now remember, this is not like a regular cluster, so each of these virtual disks was assigned to only one VM; they were not shared.

On each VM I used the iSCSI Initiator to connect to these disks, then initialized, onlined and formatted them. I then used Failover Cluster Manager to add them to the cluster.

Finally, I saw some new options for replication.

Even then, I struggled for a while to get the button that enables replication to become selectable.

Here are the IMPORTANT things you need to know to get this show on the road:

  • The disks must be Physical Disk Resources in the cluster. This means they must support SCSI-3 persistent reservations and must pass cluster validation.
  • The disks must be GPT, not MBR.
  • Each disk you want to replicate must have an associated disk to be used for the “Log File”. I assume this is where data is queued while replication is interrupted, or in asynchronous mirrors where the data can be slightly behind.
  • You must add the disk (just the data disk, not the log disk) to a cluster role BEFORE you can enable replication. You cannot enable replication on a disk that is sitting in Available Storage.
  • Your source and target servers must have same-size disks and the same drive letters.

Once you do that you will finally be able to enable Replication.
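If you are setting this up more than once, the checklist above can be turned into a small preflight check. This is only a sketch: the DiskInfo record and its fields are hypothetical, you would have to populate them yourself (from Failover Cluster Manager, Disk Management or your own tooling), and the rules simply mirror the list above as observed in the Technical Preview, not an official Microsoft requirements list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DiskInfo:
    """Hypothetical inventory record for one disk; populate it yourself."""
    name: str
    is_cluster_physical_disk: bool  # appears as a Physical Disk resource
    partition_style: str            # "GPT" or "MBR"
    size_bytes: int
    drive_letter: str
    in_available_storage: bool      # still sitting in "Available Storage"


def preflight(source_data: DiskInfo, target_data: DiskInfo,
              source_log: DiskInfo | None,
              target_log: DiskInfo | None) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    for disk in (source_data, target_data, source_log, target_log):
        if disk is None:
            continue
        if not disk.is_cluster_physical_disk:
            problems.append(f"{disk.name}: not a Physical Disk resource in the cluster")
        if disk.partition_style.upper() != "GPT":
            problems.append(f"{disk.name}: partition style is {disk.partition_style}, GPT required")
    if source_log is None:
        problems.append("source data disk has no log disk")
    if target_log is None:
        problems.append("target data disk has no log disk")
    if source_data.in_available_storage:
        problems.append(f"{source_data.name}: add it to a cluster role before enabling replication")
    if source_data.size_bytes != target_data.size_bytes:
        problems.append("source and target data disks differ in size")
    if source_data.drive_letter.upper() != target_data.drive_letter.upper():
        problems.append("source and target data disks use different drive letters")
    return problems


if __name__ == "__main__":
    GB = 1024 ** 3
    source = DiskInfo("Cluster Disk 1", True, "GPT", 10 * GB, "E", False)
    target = DiskInfo("Cluster Disk 3", True, "GPT", 10 * GB, "E", True)
    source_log = DiskInfo("Cluster Disk 2", True, "GPT", 2 * GB, "F", True)
    target_log = DiskInfo("Cluster Disk 4", True, "GPT", 2 * GB, "F", True)
    print(preflight(source, target, source_log, target_log))  # []
```

Returning a list of messages rather than stopping at the first failure means one run reports everything you need to fix before the enable-replication option will cooperate.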

As I said, you will need to choose a source log disk, which must be in Available Storage. Microsoft recommends SSDs for the log. I don’t know how big it should be; I assume the bigger it is, the longer replication can be interrupted before you consume all the space and break your mirror.

The next step is to choose the disk on your target server. If you get a message like “No Storage Available”, you probably need to move “Available Storage” so that the target disk is online on the secondary server.

Note that in the Technical Preview, moving Available Storage seems to be broken if you choose “Select Node”. However, if you choose “Best Possible Node”, things seem to work and Available Storage will come online on the SECONDARY server.

Now all Available Storage should be online on the SECONDARY server.

 

Then choose a disk for the target’s log file.

This looks like a nice feature, especially for WAN replication: apparently you can seed the destination disk, avoiding a full sync over the WAN.

The next screen just confirms everything…

When all is said and done, your cluster should look like this. You’ll probably notice that the Replication status displays “Unknown”. I’m assuming that is a bug that will be addressed later.

The other bug I noticed is that the File Share creation wizard available in Failover Cluster Manager doesn’t seem to work; it just closes unexpectedly after you launch it. However, you can create shares on the active node using File Manager, and they will automatically be added to the cluster.

Some basic testing seems to indicate that it works fine. Just be careful to keep track of which volumes are the replicated data volumes and which are the log volumes. Data written directly to a log volume is not replicated, so if you make a mistake (like I did) you may think replication is not working.

And finally, after all this trial and error, I discovered that Microsoft has started to post at least a few pointers on how to make this work. Check out the requirements in this post from Ned Pyle, the Storage Replica PM: http://social.technet.microsoft.com/Forums/windowsserver/en-US/f843291f-6dd8-4a78-be17-ef92262c158d/getting-started-with-windows-volume-replication?forum=WinServerPreview&prof=required

My thoughts…

I’ll reserve my thoughts until I have had some more time to play with this feature…

 

 

 

Static IP in Azure now available

Posted in Azure, Cloud by daveberm on June 19, 2014

You can now reserve a static public IP for your cloud service in Azure. Previously, if you stopped all the VMs in your cloud service, its public IP address would be released and you would be issued a new one the next time you started your VMs. That meant if you wanted your demos to work properly without a bunch of rework each time, you had to keep at least one VM running in each cloud service all the time.

Even better, the first five static IP addresses you reserve are FREE. Now I can turn off all of my VMs and sleep easy at night knowing that my addresses won’t change and break my SQL Server Failover Cluster demo. I can also be pretty sure that I won’t exceed my $200 MSDN Azure credit, which is always a good thing.

http://msdn.microsoft.com/en-us/library/azure/dn690120.aspx

Understanding the Windows Server Failover Cluster Quorum in Windows Server 2012 R2

Posted in High Availability, SQL, WSFC by daveberm on April 29, 2014

Before we get started with all the great new cluster quorum features in Windows Server 2012 R2, we should take a moment to understand what the quorum does and how we got to where we are today. Rob Hindman describes quorum best in his blog post:

“The quorum configuration in a failover cluster determines the number of failures that the cluster can sustain while still remaining online.”

Prior to Windows Server 2003, there was only one quorum type, Disk Only. This quorum type is still available today, but it is not recommended because the quorum disk is a single point of failure. In Windows Server 2003, Microsoft introduced the Majority Node Set (MNS) quorum. This was an improvement, as it eliminated the disk-only quorum as a single point of failure in the cluster. However, it had its limitations. As its name implies, a Majority Node Set cluster must have a majority of nodes to form a quorum and stay online, so this quorum model is not ideal for a two-node cluster, where the failure of one node leaves only one node remaining. One out of two is not a majority, so the remaining node would go offline.
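The majority arithmetic behind this is simple enough to sketch. The model below is static (votes are fixed when the cluster is configured, as with MNS) and assumes each node has one vote and that any witness stays reachable; it illustrates the rule, not how the cluster service is implemented.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def survives_node_loss(nodes: int, failed_nodes: int, witness: bool = False) -> bool:
    """Static quorum check: one vote per node, plus one for a witness if configured.

    Assumes the witness itself stays reachable by the surviving nodes.
    """
    total_votes = nodes + (1 if witness else 0)
    surviving_votes = total_votes - failed_nodes
    return surviving_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    # Two-node MNS cluster: 1 of 2 votes is not a majority.
    print(survives_node_loss(2, 1))                # False
    # Same cluster with a File Share Witness: 2 of 3 votes is a majority.
    print(survives_node_loss(2, 1, witness=True))  # True
```

The second call is exactly the two-node case the File Share Witness was introduced to solve: adding one more vote turns a 1-of-2 tie into a 2-of-3 majority.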

Microsoft introduced a hotfix that allowed the creation of a File Share Witness (FSW) on Windows Server 2003 SP1 and 2003 R2 clusters. Essentially, the FSW is a simple file share on another server that is given a vote in an MNS cluster. The driving force behind this innovation was Exchange Server 2007 Cluster Continuous Replication (CCR), which allowed for clustering without shared storage. Without shared storage, a Disk Only quorum was not an option, and an effective MNS cluster would have required three or more nodes; hence the introduction of the FSW to support two-node Exchange CCR clusters.

Windows Server 2008 introduced a new witness type, the Disk Witness. Unlike the old Disk Only quorum type, the Disk Witness allows users to configure a small partition on a shared disk that acts as a vote in the cluster, similar to the FSW. However, the Disk Witness is preferable to the FSW because it keeps a copy of the cluster database and eliminates the possibility of “partition in time”. If you’d like to read more about partition in time, I suggest you read the post “File Share Witness vs. Disk Witness for local clusters”.

Windows Server 2012 continued to improve upon quorum options. It is my belief that many of these new features were driven by two forces: Hyper-V and SQL Server AlwaysOn Availability Groups. With Hyper-V we began to see clusters that contained many more nodes than we had typically seen in the past. In a majority node set, as soon as you lose a majority of your votes, the remaining nodes go offline. For example, if you have a seven-node Hyper-V cluster and lose four of those nodes, the remaining nodes go offline, even though three nodes are still running. This might not be exactly what you want to happen. So in Windows Server 2012, Microsoft introduced Dynamic Quorum.

Dynamic Quorum does what its name implies: it adjusts the quorum dynamically. So in the scenario described above, assuming I didn’t lose all four servers at the same time, the number of votes in the quorum would adjust dynamically as servers in the cluster went offline. When node one went offline, I would in effect have a six-node cluster. When node two went offline, I would have a five-node cluster, and so on. In reality, if I continued to lose cluster nodes one by one, I could go all the way down to a two-node cluster and still remain online. And if I had configured a witness (Disk or File Share), I could actually go all the way down to a single node and still remain online.
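The difference between static and dynamic behavior can be sketched with a small simulation. This is a deliberately simplified model of the description above, assuming nodes fail one at a time and that the witness vote is fixed and always reachable. It ignores refinements such as Windows Server 2012 R2’s Dynamic Witness and its tie-breaker for 50% splits, so a real cluster can behave somewhat differently at the two-node stage.

```python
def static_survives(nodes: int, failures: int) -> bool:
    """Without Dynamic Quorum, votes stay fixed at the original node count."""
    return nodes - failures > nodes // 2


def remaining_online(nodes: int, failures: int, witness: bool = False) -> int:
    """Simplified Dynamic Quorum model for nodes failing one at a time.

    Before each failure is absorbed, the cluster needs a strict majority of
    the *current* votes. If it keeps quorum, the failed node's vote is
    removed from the count. Returns how many nodes are still online
    (0 means the cluster lost quorum and went offline).
    """
    if not 0 <= failures <= nodes:
        raise ValueError("failures must be between 0 and nodes")
    node_votes = nodes
    witness_vote = 1 if witness else 0  # assumed fixed and always reachable
    for _ in range(failures):
        total = node_votes + witness_vote
        if total - 1 <= total // 2:
            return 0  # losing this node drops below a majority
        node_votes -= 1  # quorum adjusts: the failed node no longer votes
    return node_votes


if __name__ == "__main__":
    print(static_survives(7, 4))                 # False: 3 of 7 votes
    print(remaining_online(7, 4))                # 3
    print(remaining_online(7, 5))                # 2
    print(remaining_online(7, 6))                # 0
    print(remaining_online(7, 6, witness=True))  # 1
```

The seven-node cluster that goes offline under static majority keeps three nodes online with dynamic vote adjustment, bottoms out at two nodes without a witness, and reaches a single node when a witness is configured, matching the walk-through above.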

Read more at….

http://blogs.msdn.com/b/microsoft_press/archive/2014/04/28/from-the-mvps-understanding-the-windows-server-failover-cluster-quorum-in-windows-server-2012-r2.aspx
