If you are at Tech-Ed in New Orleans this week, make sure you stop by the Windows Server Failover Cluster booth in the Technology Learning Center and have a look at the multi-site Hyper-V cluster demo using SteelEye DataKeeper Cluster Edition as the replication engine. I’ll also be in the booth to answer any questions you may have. SteelEye also has a booth at the show if you would like to discuss becoming a partner or customer!
I answer this question often enough that I thought I should probably put a link to it in my blog.
This article tells you everything you need to know. However, what you may not realize is that by following the instructions in the article you are minimizing the amount of planned downtime while also giving yourself the opportunity to “test” the update on one node before you upgrade both nodes. If the upgrade does not go well on the first node, at least the application is still running on the second node until you can figure out what went wrong.
This is just one of the side benefits that you get when you cluster at the application layer vs. clustering at the hypervisor layer. If this were simply a VM in an availability group, you would have to schedule downtime to complete the application upgrade and hope that it all went well, as the only fallback is to restore the VM from backup. As I discussed in earlier articles, there is a benefit to clustering at the hypervisor level, but you have to understand what you are giving up as well.
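The rolling upgrade pattern described in this post can be sketched roughly as follows. This is only an illustrative simulation, not the procedure from the linked article: the `Node` class, the `rolling_upgrade` function and the `upgrade_ok` callback are hypothetical names made up for the example, and a real cluster would use its own management tools to move the application between nodes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    version: str
    active: bool = False  # True if this node currently hosts the application


def rolling_upgrade(nodes: List[Node], new_version: str,
                    upgrade_ok: Callable[[Node], bool]) -> bool:
    """Upgrade passive nodes first, then fail over and upgrade the last node.

    upgrade_ok is a hypothetical callback standing in for "install the update
    on this node and test it". Returns True if every node was upgraded,
    False if the run stopped early because an upgrade failed.
    """
    old_active = next(n for n in nodes if n.active)
    passives = [n for n in nodes if not n.active]

    for node in passives:
        if not upgrade_ok(node):
            # The application is still running on the untouched active node,
            # so stop here and investigate without an outage.
            return False
        node.version = new_version

    # Planned failover to an upgraded node: the only brief interruption.
    target = passives[0]
    old_active.active, target.active = False, True

    if not upgrade_ok(old_active):
        # The application keeps running on the already upgraded node.
        return False
    old_active.version = new_version
    return True
```

The point of the sketch is the ordering: at every step at least one node that is known to work is hosting the application, which is what you give up when the only copy of the application lives in a single VM.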
I recently returned from a 10-day trip to Germany where I attended CeBIT and also presented at TechDays in Hannover and Essen with Microsoft Technical Evangelists Michael Korp and Ralf Schnell. The trip was very productive and the sessions were very well attended. My portion of the session focused on Advanced Availability for Hyper-V, specifically multi-site clusters, data replication and automated disaster recovery. Have a look at the video here.
Every time I read a blog post or open a magazine article about virtualization and disaster recovery, I see the same thing: VMware has a more robust DR solution than Microsoft. Well, I’d like to challenge that assumption. From where I sit, this is actually one of the areas where Microsoft has a major competitive advantage at the moment. Here is how I see it.
VMware Site Recovery Manager
This is an optional add-on that rides on the back of array-based replication solutions. While the recovery point objective (RPO) is good thanks to the array-based replication, the recovery time objective (RTO) is measured in hours, not minutes. Add in the fact that moving back to the primary data center is a very manual procedure, which basically requires that you re-create your jobs in the opposite direction, and the complete end-to-end recovery operation of failover and failback could take the better part of a day or longer.
Microsoft Multi-Site Cluster
Virtual machine HA clustering is included with the free version of Hyper-V Server 2008 R2, as well as with Windows Server 2008 Enterprise and Datacenter editions. Multi-site clusters require either array-based replication or host-based replication solutions that integrate with Windows Server Failover Clustering. With a multi-site cluster, failover is measured in minutes (just about the time it takes to start a VM). It can be used with array-based replication solutions such as EMC SRDF CE or HP MSA CLX, or with the much less expensive host-based replication solutions such as SteelEye DataKeeper Cluster Edition.
Not only is failover with Hyper-V multi-site clusters quick, measured in just a few minutes, but failback is quick and seamless as well. Add in support for Live Migration or Quick Migration across data centers, and I think this is one area where Microsoft actually has a much more robust solution than VMware. Maybe it does not include automated DR tests, but when you consider that you can fail over and fail back all in under 10 minutes, maybe an actual DR test performed monthly would give you a much better indication of what to expect in an actual disaster?
If you want a Hyper-V solution more like SRM, there is an option for that as well: Citrix Essentials for Hyper-V. But much like SRM, it is an optional add-on and really doesn’t even match the RPO and RTO that you can achieve with basic multi-site clusters for Hyper-V.
What do you think? Am I wrong or is there something I just don’t get? From my view, Hyper-V is head and shoulders above vSphere in terms of disaster recovery features.
I was recently asked whether MSCS/WSFC will become obsolete due to 3rd party HA solutions. I think there will always be a market for 3rd party HA solutions, but many of the enhancements delivered with Windows Server 2008 have reduced the need to explore alternate HA solutions. I think the greater threat to MSCS/WSFC is HA solutions provided by the virtualization vendors, such as Microsoft’s Hyper-V failover clusters (which actually use WSFC) and VMware HA. These solutions provided by the virtualization platform provide protection in case of host failure, although they currently do not have visibility into the application that is running within the VM.
The real question is what kind of failure do you want to protect against? If physical server failure is your primary concern, then in some cases where MSCS may have previously been deployed, you will see Hyper-V Clusters or VMware HA being deployed instead. In other cases where MSCS/WSFC may have seemed like overkill or was incompatible with the OS or application, you will instead see clustered VMs being deployed because they are easy to install and support all applications and operating systems. The mere fact that more workloads will be running per physical server will make it imperative to have some kind of clustering solution so that the failure of a single server does not bring down your entire infrastructure. In many cases, this clustering solution will be provided by the virtualization vendor.
Hyper-V Clusters and VMware HA are easy to implement and have a broad range of support as the protected VM can be running any OS or application. The tradeoff is that you lose the application level monitoring included with MSCS/WSFC. There will always be a class of applications that need application awareness, so MSCS/WSFC or other HA solutions that manage application availability will always be needed to ensure that the application is available, not just the server itself. With that being said, MSCS/WSFC will not become obsolete, but you will see it deployed alongside other cluster solutions provided by the hypervisor vendors.
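To make the tradeoff concrete, here is a minimal sketch of the difference between host-level and application-level monitoring. It is a toy decision function, not how WSFC or VMware HA are actually implemented; the `Monitoring` enum and `should_fail_over` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Monitoring(Enum):
    HOST_LEVEL = "host"        # hypervisor-level HA: watches the host/VM only
    APPLICATION_LEVEL = "app"  # application-aware clustering: also watches the service


def should_fail_over(mode: Monitoring, host_up: bool, app_healthy: bool) -> bool:
    """Decide whether a failover is triggered under each monitoring approach."""
    if not host_up:
        # Both approaches react to a failed host.
        return True
    if mode is Monitoring.APPLICATION_LEVEL:
        # Application-aware clustering also reacts to a failed application.
        return not app_healthy
    # Host-level HA has no visibility into a failed application in a running VM.
    return False
```

With a healthy host and a failed application, `should_fail_over(Monitoring.HOST_LEVEL, True, False)` is `False` while `should_fail_over(Monitoring.APPLICATION_LEVEL, True, False)` is `True`, which is exactly the case where application awareness earns its keep.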
If you have a print server failover cluster on Windows Server 2008 R2, Microsoft recommends you install this update immediately.
Read this great blog post from Symon Perriman, Program Manager for Microsoft’s Clustering and High Availability Team, for more details.
It is official: I passed exam 70-652 today and I am now an MCTS: Windows Server Virtualization, Configuration. It was 11 years ago that I sat for my first NT 4 exam, and now, about a dozen exams later, I am embarking on updating my credentials to the latest and greatest once again. I think certifications are a good thing, but they certainly don’t replace real-world experience and good Google skills when it comes to diagnosing a problem or planning a new project. I’ll keep you posted on my progress; hopefully I’ll be able to complete MCITP: Enterprise Administrator before my kids get out of school in June so I can enjoy the summer.
Microsoft has recently updated their Virtualization Continuity page with some good information…
Cross-Site Disaster Recovery Solutions
Implementing a reliable, rapid-recovery strategy can be time-consuming to implement and expensive to manage. Because of the complexity and cost, many companies simply don’t have comprehensive business continuity plans to protect their data and ensure application availability.
Virtualization has been a game changer for many companies. With virtualization based Site Recovery solutions, you can ensure higher availability and business continuity options. Windows Server provides support for a wide range of industry leading, shared storage solutions to deliver Quick and Live Migration. Combined with partner cross-site data management and replication technologies, Microsoft is offering complete Site Recovery solutions.
In summary, Microsoft Site Recovery solutions provide these key benefits:
- Bullet proof application and data availability across a range of applications
- Site-wide disaster recovery that can help you gain immediate and long-term operational and capital benefits
- Automated fail-over and fail back based on clustering and data resynchronization delivering superior application and data availability, for planned and unplanned downtime
Also, they have recently published a white paper entitled “Microsoft End-to-End Cross-Site Disaster Recovery Solutions”. This is a must-read for anyone deploying SteelEye DataKeeper in a Cross-Site Disaster Recovery configuration.
Look for a step-by-step article in the very near future on how to configure a DHCP cluster across data centers and/or without shared storage using Windows Server Failover Clustering and SteelEye DataKeeper Cluster Edition. In the meantime, check out this video that demonstrates a DHCP cluster that uses a replicated DHCP database instead of a shared disk in the cluster.
SteelEye DataKeeper Cluster Edition wins Windows IT Pro Best High Availability/Disaster Recovery awards
I am pleased to announce that Windows IT Pro has awarded SteelEye DataKeeper Cluster Edition the Best High Availability and Disaster Recovery Product in two categories: Community Choice Gold Award and Editors’ Best Silver Award.
I am really proud to be a part of the SteelEye DataKeeper team and I appreciate everyone in the Windows IT Pro community who voted for us in the Community Choice award!