Cluster Quorum File Share Witness on a USB stick?

I’m very excited to hear that Windows Server 2019 will include a few new features for the File Share Witness in the Failover Cluster quorum. The feature that many of my customers have been asking about for years is finally arriving…File Share Witness on a USB stick!

Okay, they didn’t really ask for that specifically, but many of my customers wanted to deploy a simple 2-node cluster in each store location, branch office, etc. They didn’t want the added expense of a SAN just to leverage a Disk Witness, and they either weren’t too keen on relying on a Cloud Witness in Azure or simply didn’t have the connectivity for it. Many of these customers just decided to forgo clustering, or they used an alternative clustering solution like the SIOS Protection Suite.

Now a viable alternative is coming in Windows Server 2019. By leveraging a supported router, a USB disk inserted into the router can host a file share that serves as the witness. This eliminates the need for a third server or for internet connectivity.

https://blogs.msdn.microsoft.com/clustering/2018/04/16/new-file-share-witness-feature-in-windows-server-2019/
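
Once the share on the router is created, pointing the cluster at it should work like any other File Share Witness. Here is a minimal PowerShell sketch, assuming a hypothetical share path and a local account defined on the router; the -Credential parameter is the Windows Server 2019 addition described in the post above:

$cred = Get-Credential   # local account defined on the router, not a domain account
Set-ClusterQuorum -FileShareWitness "\\192.168.1.1\witness" -Credential $cred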

There are a few scenarios I can imagine, from HCI for Hyper-V to a simple file server cluster using DataKeeper. Regardless of the scenario, keep in mind that unless you plan on building a workgroup cluster, you will probably want to run a VM on each server to act as redundant Domain Controllers, unless you have a reliable WAN connection back to a Domain Controller hosted in your main datacenter.

 

 


Can I put my File Share Witness on a DFS share?

I get asked this question all the time. People are concerned about losing their File Share Witness, so, as with many of their other shares, they want to leverage DFS for some additional availability. This is a very bad idea and is not supported.

Microsoft recently published a great blog article that describes exactly why this is not supported.

https://blogs.msdn.microsoft.com/clustering/2018/04/13/failover-cluster-file-share-witness-and-dfs/

Much of this article also applies to people who ask whether they can use a DataKeeper replicated volume resource as a Disk Witness. It makes sense: you can use a DataKeeper Volume resource in place of a Physical Disk resource for any other workload, so why not a Disk Witness?

The issue is the same as with DFS: in the event of a loss of communication between the two servers, there is nothing to guarantee that the volume won’t come online on both servers, causing a potential split-brain condition. The Physical Disk resource overcomes this by using SCSI reservations, ensuring the disk is only accessible by one cluster node at a time.

The good news is that Microsoft already blocks you from trying to use a replicated DataKeeper Volume resource as a witness, and it looks like Windows Server 2019 will also block you from using a DFS share as a File Share Witness.

For the details, see the Failover Clustering and Network Load Balancing Team blog post, “Failover Cluster File Share Witness and DFS,” linked above.

 


8th MVP Award

Really glad to hear today that I’ve been re-awarded the Microsoft Cloud and Datacenter Management MVP award for 2018. It’s a great honor to be counted among some of the smartest people I know. Looking forward to the launch of Windows Server 2019 and whatever else Microsoft has up its sleeve for Azure in 2019.


What is the network speed between Azure regions connected with Virtual Network Peering?

***Updated July 5th***

This is the question I asked myself today, and of course I couldn’t find it documented anywhere. I’m assuming there is no guarantee and that it probably depends on current utilization, etc. If I’m wrong, someone please point me to the documentation that states the available speed. I primarily looked here and here.

So I set up two Windows Server 2016 D4s v3 instances, one in Central US and one in East US 2, which are paired regions.

If you don’t know what peering is, it essentially lets you easily connect two different Azure virtual networks. Peering is very easy to set up; just make sure you configure it from both virtual networks (I made that mistake at first). Once it is configured properly, it will look something like this.

[Image: A properly functioning peered network in Azure]
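
For reference, creating the peering with Azure PowerShell looks something like the sketch below; the virtual network and resource group names are hypothetical, and this assumes the current Az.Network module. Note that the peering is created twice, once from each virtual network:

# Hypothetical names; the peering must be configured from BOTH virtual networks
$vnet1 = Get-AzVirtualNetwork -Name "vnet-centralus" -ResourceGroupName "rg-peering"
$vnet2 = Get-AzVirtualNetwork -Name "vnet-eastus2" -ResourceGroupName "rg-peering"
Add-AzVirtualNetworkPeering -Name "central-to-east" -VirtualNetwork $vnet1 -RemoteVirtualNetworkId $vnet2.Id
Add-AzVirtualNetworkPeering -Name "east-to-central" -VirtualNetwork $vnet2 -RemoteVirtualNetworkId $vnet1.Id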

I then downloaded iPerf3 on each of the servers and began my testing. At first I had some pretty disappointing results.

But then upon doing some research, I found that running multiple threads and increasing the window size gives a more accurate measurement of the available bandwidth. I tried a few different settings and seemed to max out at just about 1.9 Gbps on average, much better than 45 Mbps!
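
If you want to repeat this, iperf3 runs in server mode on one instance (it just sits and listens) and as a client on the other; in the client commands below, -w sets the TCP window size, -P the number of parallel streams, and -t the test duration in seconds.

iperf3.exe -s   # run this on the target instance first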

The client parameters I used to produce the best results are as follows:

iperf3.exe -c 10.0.3.4 -w32M -P 4 -t 30

A sample of that output looks something like this.

- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 2.00-3.00 sec 34.1 MBytes 286 Mbits/sec
[ 6] 2.00-3.00 sec 39.2 MBytes 329 Mbits/sec
[ 8] 2.00-3.00 sec 56.1 MBytes 471 Mbits/sec
[ 10] 2.00-3.00 sec 73.2 MBytes 615 Mbits/sec
[SUM] 2.00-3.00 sec 203 MBytes 1.70 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 3.00-4.00 sec 37.5 MBytes 315 Mbits/sec
[ 6] 3.00-4.00 sec 19.9 MBytes 167 Mbits/sec
[ 8] 3.00-4.00 sec 97.0 MBytes 814 Mbits/sec
[ 10] 3.00-4.00 sec 96.8 MBytes 812 Mbits/sec
[SUM] 3.00-4.00 sec 251 MBytes 2.11 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 4.00-5.00 sec 34.6 MBytes 290 Mbits/sec
[ 6] 4.00-5.00 sec 24.6 MBytes 207 Mbits/sec
[ 8] 4.00-5.00 sec 70.1 MBytes 588 Mbits/sec
[ 10] 4.00-5.00 sec 97.8 MBytes 820 Mbits/sec
[SUM] 4.00-5.00 sec 227 MBytes 1.91 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 5.00-6.00 sec 34.5 MBytes 289 Mbits/sec
[ 6] 5.00-6.00 sec 31.9 MBytes 267 Mbits/sec
[ 8] 5.00-6.00 sec 73.9 MBytes 620 Mbits/sec
[ 10] 5.00-6.00 sec 86.4 MBytes 724 Mbits/sec
[SUM] 5.00-6.00 sec 227 MBytes 1.90 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 6.00-7.00 sec 35.4 MBytes 297 Mbits/sec
[ 6] 6.00-7.00 sec 32.1 MBytes 269 Mbits/sec
[ 8] 6.00-7.00 sec 80.9 MBytes 678 Mbits/sec
[ 10] 6.00-7.00 sec 78.5 MBytes 658 Mbits/sec
[SUM] 6.00-7.00 sec 227 MBytes 1.90 Gbits/sec

I saw spikes as high as 2.5 Gbps and lows as low as 1.3 Gbps.

[Update1]

So I received some feedback from @jvallery that I had to try out.


The first thing I did was bump my existing instances up to D64s v3 and use -P 64. I saw a significant increase:

iperf3.exe -c 10.0.3.4 -w32M -P 64 -t 30

[SUM] 0.00-1.00 sec 2.55 GBytes 21.8 Gbits/sec

I then spun up some F72s v2 instances as suggested, and I saw even better results.

iperf3.exe -c 10.0.2.5 -w32M -P 72 -t 30

[SUM] 0.00-1.00 sec 2.86 GBytes 24.5 Gbits/sec

 

I’m not well versed enough in Linux to warrant spending a bunch of extra money fumbling my way through the same configuration there, but suffice it to say there seems to be a reasonable amount of bandwidth available between Azure regions when using peered networks.

If someone wants to repeat this test using Linux as @jvallery suggested, I’ll be glad to post your results here!

[/endUpdate1]

Using these two peered networks, I am helping a client address SQL Server disaster recovery, with SIOS DataKeeper asynchronously replicating SQL data between the regions.

[Image: SIOS DataKeeper replicating data from Azure East US 2 to Central US]

In this particular scenario, we were targeting an RPO measured in milliseconds. As you’ll see in the video below, during a DISKSPD test meant to simulate a typical SQL Server workload, the RPO was <1 second.
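
For context, a DISKSPD invocation along the lines of the sketch below approximates an OLTP-style SQL Server workload; the parameters here are illustrative, not the exact ones used in the video:

diskspd.exe -c10G -d300 -r -w30 -b8K -t8 -o8 -h -L D:\sqltest.dat   # 10 GB file, 5 minutes, random 8K I/O, 30% writes, caching disabled, latency stats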

 

 

I’d love to hear from you about any network speeds you have measured in Azure and how you are using peered networks.

 

 

 

 


High Availability Options for Microsoft SQL Server in the Google Cloud

I was recently interviewed by VMblog about high availability options for SQL Server. You can check out the interview here: http://vmblog.com/

For the step-by-step guide I previously published, check it out here: https://clusteringformeremortals.com/2018/01/10/how-to-build-a-sanless-sql-server-failover-cluster-instance-in-google-cloud-platform/


STORAGE SPACES DIRECT (S2D) FOR SQL SERVER FAILOVER CLUSTER INSTANCES (FCI)?

With the introduction of Windows Server 2016 Datacenter Edition, a new feature called Storage Spaces Direct (S2D) was introduced. At a very high level, this solution allows you to pool together locally attached storage and present it to the cluster as a CSV for use in a Scale-Out File Server, which can then be accessed over SMB 3 and used to hold cluster data such as Hyper-V virtual disk (VHDX) files. This can also be configured in a hyper-converged (HCI) fashion such that the application and data all run on the same set of servers. This is a grossly over-simplified description, but for details, you will want to look here.

 

[Image: the Storage Spaces Direct stack, taken from https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-direct-overview]

The main use case targeted is hyper-converged infrastructure for Hyper-V deployments. However, there are other use cases, including leveraging this SMB storage to store SQL Server data for use in a SQL Server Failover Cluster Instance; see the sketch below.
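
For a sense of what that looks like, here is a minimal PowerShell sketch of standing up the storage side, assuming a validated 2-node cluster with eligible local disks; the volume name and size are hypothetical:

Enable-ClusterStorageSpacesDirect   # claims the local disks and creates the storage pool
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "SQLData" -FileSystem CSVFS_ReFS -Size 1TB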

Why would anyone want to do that? Well, for starters, you can now build a highly available 2-node SQL Server Failover Cluster Instance (FCI) with SQL Server Standard Edition, without the need for shared storage. Previously, if you wanted HA without a SAN, you were pretty much driven to buy SQL Server Enterprise Edition and make use of Always On Availability Groups, or to purchase SIOS DataKeeper and leverage that 3rd-party solution, which lets you build SANless clusters with any version of Windows or SQL Server. SQL Server Enterprise Edition can really drive up the cost of your project, especially if you are only buying it for the Availability Groups feature.

In addition to the cost associated with Availability Groups, there are a number of other technical reasons why you might prefer a Failover Cluster Instance over an AG. Application compatibility, instance-level vs. database-level protection, a large number of databases, DTC support, trained staff, etc., are just some of the technical reasons why you may want to stick with a Failover Cluster Instance.

Microsoft lists both the SIOS DataKeeper solution and the S2D solution as two of the supported solutions for SQL Server FCI in their documentation here.


https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-high-availability-dr

When comparing the two solutions, you have to take into account that SIOS has been enabling SANless clusters since 1999, while the S2D solution is still in its infancy. Having said that, there are bound to be some areas where S2D has catching up to do, or features it may never support due to limitations of the technology.

Have a look at the following table for an overview of some of the things you should consider before you choose your SANless cluster solution.

[Image: comparison chart of S2D vs. SIOS DataKeeper Cluster Edition]

If we go through this chart, we see that SIOS DataKeeper clearly has some significant advantages. For one, DataKeeper supports a much wider range of platforms, going all the way back to Windows Server 2008 R2 and SQL Server 2008 R2, whereas the S2D solution only supports the latest releases of Windows and SQL Server 2016/2017. S2D also requires the Datacenter Edition of Windows, which can add significantly to the cost of your deployment. In addition, SIOS delivers the ONLY HA/DR solution for SQL Server on Linux that works both on-prem and in the cloud.

But beyond the cost and platform limitations, I think the most glaring gap comes when we start to consider disaster recovery options for your SANless cluster. Allan Hirt, SQL Server cluster guru and fellow Microsoft Cloud and Datacenter Management MVP, recently posted about this S2D limitation. In his article Revisiting Storage Spaces Direct and SQL Server FCIs, Allan points out that due to the lack of support for stretching S2D clusters across sites or including an S2D-based cluster as a leg in an Always On Availability Group, the best option for DR in the S2D scenario is log shipping!

Don’t get me wrong, log shipping has been around forever and will probably be around long after I’m gone, but that is taking a HUGE step backwards when we think about all the disaster recovery solutions we have become accustomed to, like multi-site clusters, Availability Groups, etc.

In contrast, the SIOS DataKeeper solution fully supports Always On Availability Groups and, better yet, can allow you to stretch your FCI across sites to give you the best HA/DR solution you could hope to achieve in terms of RTO/RPO. In an Azure environment, DataKeeper also supports Azure Site Recovery (ASR), giving you even more options for disaster recovery.

The rest of the chart is pretty self-explanatory. It basically consists of a list of hardware, storage, and networking requirements that must be met before you can deploy an S2D cluster. An exhaustive list of S2D requirements is maintained here: https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-direct-hardware-requirements

The SIOS DataKeeper solution is much more lenient. It supports any locally attached storage, and as long as the hardware passes cluster validation, it is a supported cluster configuration. The block-level replication solution has been working great since the days when 1 Gbps was considered a fast LAN and a T1 WAN connection was a luxury.
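
Cluster validation itself is a one-liner; a quick sketch, with hypothetical node names:

Test-Cluster -Node NODE1,NODE2   # produces a validation report to review before you create the cluster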

SANless clustering is particularly interesting for cloud deployments, since the cloud does not offer traditional shared storage options for clusters. So users in the middle of a “lift and shift” to the cloud who want to take their clusters with them must look at alternate storage solutions. For cloud deployments, SIOS is certified for Azure, AWS, and Google, and is available in the relevant cloud marketplaces. While there doesn’t appear to be anything blocking deployment of S2D-based clusters in Azure or Google, there is a conspicuous lack of documentation or supportability statements from Microsoft for those platforms.

SIOS DataKeeper has been doing this since 1999. SIOS has heard all the feature requests, uncovered all the bugs, and has a rock-solid solution for SANless clusters that is time tested and proven. While Microsoft S2D is a promising technology, as a 1st-generation product I would wait until the dust settles and some of the feature gap closes before considering it for my business-critical applications.


How to Build a SANless SQL Server Failover Cluster Instance in Google Cloud Platform

If you are going to host SQL Server on the Google Cloud Platform (GCP), you will want to make sure it is highly available. One of the best and most economical ways to do that is to build a SQL Server Failover Cluster Instance (FCI). Since SQL Server Standard Edition supports failover clustering, we can avoid the cost associated with SQL Server Enterprise Edition, which is required for Always On Availability Groups. In addition, a SQL Server FCI is a much more robust solution, as it protects the entire instance of SQL Server, has no limitations in terms of DTC (Distributed Transaction Coordinator) support, and is easier to manage. Plus, it supports earlier versions of SQL Server that you may still have, from SQL 2012 through the latest SQL 2017. Unfortunately, SQL 2008 R2 is not supported, due to its lack of support for cross-subnet failover.

Traditionally, a SQL Server FCI requires a SAN or some other type of shared storage device. In the cloud, there is no cluster-aware shared storage. In place of a SAN, we will build a SANless cluster using SIOS DataKeeper Cluster Edition (DKCE). DKCE uses block-level replication to ensure that the locally attached storage on each instance remains in sync. It also integrates with Windows Server Failover Clustering through its own storage-class resource, called a DataKeeper Volume, which takes the place of the Physical Disk resource. As far as the cluster is concerned, the SIOS DataKeeper Volume looks like a physical disk, but instead of controlling SCSI reservations, it controls the mirror direction, ensuring that only the active server writes to the disk and that the passive server(s) receive all the changes, either synchronously or asynchronously.
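
Once configured, the replicated volume shows up as just another storage resource in the cluster. A quick PowerShell sketch to spot it, assuming the resource type name appears as "DataKeeper Volume" (check your own cluster):

Get-ClusterResource | Where-Object { $_.ResourceType -like "DataKeeper*" }   # assumed resource type name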

In this guide, we will walk through the steps to build a two-node failover cluster between two instances in the same region, but in different zones, within GCP, as shown in Figure 1.

[Figure 1: Google Cloud deployment diagram]

Download the entire white paper at https://us.sios.com/san-sanless-clusters-resources/white-paper-build-sql-server-failover-cluster-gcp/
