I answer this question often enough that I thought I should probably put a link to it on my blog.
This article tells you everything you need to know. However, what you may not realize is that by following the instructions in the article you are minimizing the amount of planned downtime while also giving yourself the opportunity to “test” the update on one node before you upgrade both nodes. If the upgrade does not go well on the first node, at least the application is still running on the second node until you can figure out what went wrong.
This is just one of the side benefits that you get when you cluster at the application layer vs. clustering at the hypervisor layer. If this were simply a VM in an availability group, you would have to schedule downtime to complete the application upgrade and hope that it all went well, as the only fallback is to restore the VM from backup. As I discussed in earlier articles, there is a benefit to clustering at the hypervisor level, but you have to understand what you are giving up as well.
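To make the rolling upgrade idea concrete, here is a minimal sketch in Python. It is only a toy model of a two-node cluster, not the procedure from the linked article: the `Node` class, the `rolling_upgrade` function and the `try_upgrade` callback are all hypothetical names I made up for illustration. The point it shows is the ordering: upgrade the passive node first, and only move the application once that upgrade has been verified.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for one node of a two-node cluster."""
    name: str
    version: str
    active: bool = False  # True if this node currently hosts the application


def rolling_upgrade(nodes: List[Node],
                    new_version: str,
                    try_upgrade: Callable[[Node], bool]) -> bool:
    """Upgrade one node at a time, passive node first.

    try_upgrade(node) is an assumed hook that performs the upgrade and
    returns True only if the node passes whatever checks you care about.
    Returns True if every node was upgraded, False if we stopped early.
    """
    # sorted() puts passive nodes (active=False) ahead of the active one.
    for node in sorted(nodes, key=lambda n: n.active):
        if node.active:
            # Move the application to a node that is already upgraded
            # before touching the node that currently hosts it.
            target = next(n for n in nodes
                          if n is not node and n.version == new_version)
            node.active, target.active = False, True
        if not try_upgrade(node):
            # Stop here: the application is still running on another node.
            return False
        node.version = new_version
    return True


cluster = [Node("node1", "v1", active=True), Node("node2", "v1")]
ok = rolling_upgrade(cluster, "v2", try_upgrade=lambda n: True)
```

If `try_upgrade` fails on the passive node, the function stops and the application never leaves the node it was already running on, which is exactly the safety net described above.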
I recently returned from a 10-day trip to Germany where I attended CeBIT and also presented at TechDays in Hannover and Essen with Microsoft Technical Evangelists Michael Korp and Ralf Schnell. The trip was very productive and the sessions were very well attended. My portion of the session focused on Advanced Availability for Hyper-V, specifically multi-site clusters, data replication and automated disaster recovery. Have a look at the video here.
Every time I read a blog post or open a magazine article about virtualization and disaster recovery, I see the same thing: VMware has a more robust DR solution than Microsoft. Well, I’d like to challenge that assumption. From where I sit, this is actually one of the areas where Microsoft has a major competitive advantage at the moment. Here is how I see it.
VMware Site Recovery Manager
This is an optional add-on that rides on the back of array-based replication solutions. While the recovery point objective (RPO) is good due to the array-based replication, the recovery time objective (RTO) is measured in hours, not minutes. Add in the fact that moving back to the primary data center is a very manual procedure, which basically requires that you re-create your jobs in the opposite direction, and the complete end-to-end recovery operation of failover and failback could take the better part of a day or longer.
Microsoft Multi-Site Cluster
Virtual machine HA clustering is included with the free version of Hyper-V Server 2008 R2, as well as with Windows Server 2008 Enterprise and Datacenter editions. A multi-site cluster requires either an array-based or a host-based replication solution that integrates with Windows Server Failover Clustering. With a multi-site cluster, failover is measured in minutes (just about the time it takes to start a VM). It can be used with array-based replication solutions such as EMC SRDF CE or HP MSA CLX, or with the much less expensive host-based replication solutions such as SteelEye DataKeeper Cluster Edition.
Not only is failover quick with Hyper-V multi-site clusters, measured in just a few minutes, but failback is quick and seamless as well. Add in support for Live Migration or Quick Migration across data centers, and I think this is one area where Microsoft actually has a much more robust solution than VMware. Maybe it does not include automated DR tests, but when you consider that you can fail over and fail back all in under 10 minutes, maybe an actual DR test performed monthly would give you a much better indication of what to expect in an actual disaster?
If you want a Hyper-V solution more like SRM, there is an option for that as well: Citrix Essentials for Hyper-V. But much like SRM, it is an optional add-on, and it really doesn’t even match the RPO and RTO that you can achieve with basic multi-site clusters for Hyper-V.
What do you think? Am I wrong, or is there something I just don’t get? From my view, Hyper-V is head and shoulders above vSphere in terms of disaster recovery features.
I was recently asked whether MSCS/WSFC will become obsolete due to 3rd party HA solutions. I think there will always be a market for 3rd party HA solutions, but many of the enhancements delivered with Windows Server 2008 have reduced the need to explore alternate HA solutions. I think the greater threat to MSCS/WSFC is the HA solutions provided by the virtualization vendors, such as Microsoft’s Hyper-V failover clusters (which actually use WSFC) and VMware HA. These solutions, provided by the virtualization platform, protect against host failure, although they currently do not have visibility into the application that is running within the VM.
The real question is what kind of failure you want to protect against. If physical server failure is your primary concern, then in some cases where MSCS may have previously been deployed, you will see Hyper-V clusters or VMware HA being deployed instead. In other cases where MSCS/WSFC may have seemed like overkill or was incompatible with the OS or application, you will instead see clustered VMs being deployed because they are easy to set up and support all applications and operating systems. The mere fact that more workloads will be running per physical server will make it imperative to have some kind of clustering solution so that the failure of a single server does not bring down your entire infrastructure. In many cases, this clustering solution will be provided by the virtualization vendor.
Hyper-V clusters and VMware HA are easy to implement and have a broad range of support, as the protected VM can be running any OS or application. The tradeoff is that you lose the application-level monitoring included with MSCS/WSFC. There will always be a class of applications that need application awareness, so MSCS/WSFC or other HA solutions that manage application availability will always be needed to ensure that the application is available, not just the server itself. With that being said, MSCS/WSFC will not become obsolete, but you will see it deployed alongside other cluster solutions provided by the hypervisor vendors.
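The difference between the two kinds of monitoring can be sketched in a few lines of Python. This is a deliberately simplified model and not how WSFC, Hyper-V or VMware HA are actually implemented; the `ProtectedVM` class and both function names are hypothetical. It only illustrates the tradeoff: a host-level check cannot see a hung application inside a healthy VM, while an application-aware check can.

```python
from dataclasses import dataclass


@dataclass
class ProtectedVM:
    """Hypothetical, simplified view of a protected workload."""
    host_up: bool          # is the physical host / hypervisor alive?
    app_responding: bool   # is the application inside the guest healthy?


def host_level_ha_triggers(vm: ProtectedVM) -> bool:
    # Hypervisor-level HA (as described above) reacts to host failure only.
    return not vm.host_up


def app_level_ha_triggers(vm: ProtectedVM) -> bool:
    # Application-aware clustering also reacts when the application
    # itself stops responding, even if the host is fine.
    return (not vm.host_up) or (not vm.app_responding)


# The host is healthy but the application has hung.
hung_app = ProtectedVM(host_up=True, app_responding=False)
host_reacts = host_level_ha_triggers(hung_app)
app_reacts = app_level_ha_triggers(hung_app)
```

In the hung-application case only the application-aware check asks for recovery, which is why the two approaches end up deployed side by side.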