8th MVP Award

Really glad to hear today that I’ve been re-awarded the Microsoft Cloud and Datacenter Management MVP award for 2018. It’s a great honor to be counted among some of the smartest people I know. Looking forward to the launch of Windows 2019 and whatever else Microsoft have up their sleeves for Azure in 2019.

8th MVP Award

What is the network speed between Azure regions connected with Virtual Network Peering?

***Updated July 5th***

This is the question I asked myself today and of course I couldn’t find this documented anywhere. I’m assuming there is no guarantee and it probably depends on current utilization, etc. If I’m wrong, someone please point me to the documentation that states the available speed. I primarily looked here and here.

So I set up two Windows 2016 D4s v3 instances, one in Central US and one in East US 2, which are paired regions.

If you don’t know what peering is, it essentially lets you to easily connect two different Azure virtual networks. Peering is very easy to setup, just make sure you configure it from both Virtual Networks, I made that mistake at first. Once it is configured properly it will look something like this.

2018-06-29_17-15-14
A properly functioning peered network in Azure

I then downloaded iPerf3 on each of the servers and began my testing. At first I had some pretty disappointing results.

But then upon doing some research, I found that running multiple threads and increasing the window size reports a more accurate measurement of the available bandwidth. I tried a few different setting and seemed to max at at just about 1.9 Gbps on average, much better than 45 Mbps!

The client parameters I used to produce the best results are as follows:

iperf3.exe -c 10.0.3.4 -w32M -P 4 -t 30

A sample of that output looks something like this.

- - - - - - - - - - - - - - - - - - - - - - - - -
 [ 4] 2.00-3.00 sec 34.1 MBytes 286 Mbits/sec
 [ 6] 2.00-3.00 sec 39.2 MBytes 329 Mbits/sec
 [ 8] 2.00-3.00 sec 56.1 MBytes 471 Mbits/sec
 [ 10] 2.00-3.00 sec 73.2 MBytes 615 Mbits/sec
 [SUM] 2.00-3.00 sec 203 MBytes 1.70 Gbits/sec
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ 4] 3.00-4.00 sec 37.5 MBytes 315 Mbits/sec
 [ 6] 3.00-4.00 sec 19.9 MBytes 167 Mbits/sec
 [ 8] 3.00-4.00 sec 97.0 MBytes 814 Mbits/sec
 [ 10] 3.00-4.00 sec 96.8 MBytes 812 Mbits/sec
 [SUM] 3.00-4.00 sec 251 MBytes 2.11 Gbits/sec
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ 4] 4.00-5.00 sec 34.6 MBytes 290 Mbits/sec
 [ 6] 4.00-5.00 sec 24.6 MBytes 207 Mbits/sec
 [ 8] 4.00-5.00 sec 70.1 MBytes 588 Mbits/sec
 [ 10] 4.00-5.00 sec 97.8 MBytes 820 Mbits/sec
 [SUM] 4.00-5.00 sec 227 MBytes 1.91 Gbits/sec
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ 4] 5.00-6.00 sec 34.5 MBytes 289 Mbits/sec
 [ 6] 5.00-6.00 sec 31.9 MBytes 267 Mbits/sec
 [ 8] 5.00-6.00 sec 73.9 MBytes 620 Mbits/sec
 [ 10] 5.00-6.00 sec 86.4 MBytes 724 Mbits/sec
 [SUM] 5.00-6.00 sec 227 MBytes 1.90 Gbits/sec
 - - - - - - - - - - - - - - - - - - - - - - - - -
 [ 4] 6.00-7.00 sec 35.4 MBytes 297 Mbits/sec
 [ 6] 6.00-7.00 sec 32.1 MBytes 269 Mbits/sec
 [ 8] 6.00-7.00 sec 80.9 MBytes 678 Mbits/sec
 [ 10] 6.00-7.00 sec 78.5 MBytes 658 Mbits/sec
 [SUM] 6.00-7.00 sec 227 MBytes 1.90 Gbits/sec

I saw spikes as high as 2.5 Gbps and lows as low as 1.3 Gbps.

[Update1]

So I received some feedback from @jvallery that I must try out.

2018-07-05_13-00-14

First thing I did was bump up my existing instances to D64sv3 and used -P 64. I saw a significant increase

iperf3.exe -c 10.0.3.4 -w32M -P 64 -t 30

[SUM] 0.00-1.00 sec 2.55 GBytes 21.8 Gbits/sec

I then spun up some F72v2 instances as suggested and I saw even better results.

iperf3.exe -c 10.0.2.5 -w32M -P 72 -t 30

[SUM] 0.00-1.00 sec 2.86 GBytes 24.5 Gbits/sec

 

I’m not well versed enough in Linux to warrant me spending a bunch of extra money fumbling my way through configuring this on Linux, but suffice it to say there seems to be a reasonable amount of bandwidth available between Azure regions when using peered networks.

If someone wanted to repeat this test using Linux as @jvallery suggested I’ll be glad to post your results here!

[/endUpdate1]

Using these two peered networks I am helping a client address SQL Server disaster recovery using SIOS DataKeeper to asynchronously replicate SQL data between regions for disaster recovery.

2018-06-29_17-54-47
SIOS DataKeeper replicating data from Azure EAST US 2 to CENTRAL US

In this particular scenario we were measuring a RPO measured in milliseconds. As you’ll see in the video below, during a DISKSPD test meant to simulate a typical SQL Server workload the RPO was <1 second.

 

 

I’d love to hear from you regarding your experience regarding any network speed you measure in Azure and how you are using peered networks in Azure.

 

 

 

 

What is the network speed between Azure regions connected with Virtual Network Peering?

High Availability Options for Microsoft SQL Server in the Google Cloud

I was recently interviewed by VMblog about high availability options for SQL Server. You can check out the interview here http://vmblog.com/

For the step by step guide I previously published, check it out here https://clusteringformeremortals.com/2018/01/10/how-to-build-a-sanless-sql-server-failover-cluster-instance-in-google-cloud-platform/

High Availability Options for Microsoft SQL Server in the Google Cloud

STORAGE SPACES DIRECT (S2D) FOR SQL SERVER FAILOVER CLUSTER INSTANCES (FCI)?

With the introduction of Windows Server 2016 Datacenter Edition a new feature called Storage Spaces Direct (S2D) was introduced. At a very high level, this solution allows you to pool together locally attached storage and present it to the cluster as a CSV for use in a Scale Out File Server, which can then be accessed over SMB 3 and used to hold cluster data such as Hyper-V VMDK files. This can also be configured in a hyper-converged (HCI) fashion such that the application and data can all run on the same set of servers.  This is a grossly over-simplified description, but for details, you will want to look here.

 

Storage Spaces Direct StackImage taken from https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-direct-overview

The main use case targeted is hyper-converged infrastructure for Hyper-V deployments. However, there are other use cases, including leveraging this SMB storage to store SQL Server Data to be used in a SQL Server Failover Cluster Instance

Why would anyone want to do that? Well, for starters you can now build a highly available 2-node SQL Server Failover Cluster Instance (FCI) with SQL Server Standard Edition, without the need for shared storage. Previously, if you wanted HA without a SAN you pretty much were driven to buy SQL Server Enterprise Edition and make use of Always On Availability Groups or purchase SIOS DataKeeper and leverage the 3rd party solution which lets you build SANless clusters with any version of Windows or SQL Server. SQL Server Enterprise Edition can really drive up the cost of your project, especially if you were only buying it for the Availability Groups feature.

In addition to the cost associated with Availability Groups, there are a number of other technical reasons why you might prefer a Failover Cluster over an AG. Application compatibility, instance vs. database level protection, large number of databases, DTC support, trained staff, etc., are just some of the technical reasons why you may want to stick with a Failover Cluster Instance.

Microsoft lists both the SIOS DataKeeper solution and the S2D solution as two of the supported solutions for SQL Server FCI in their documentation here.

s2d

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-high-availability-dr

When comparing the two solutions, you have to take into account that SIOS has been allowing you to build SANless Clusters since 1999, while the S2D solution is still in its infancy.  Having said that, there are bound to be some areas where S2D has some catching up to do, or simply features that they will never support simply due to the limitations with the technology.

Have a look at the following table for an overview of some of the things you should consider before you choose your SANless cluster solution.

S2D_vs_DKCE

If we go through this chart, we see that SIOS DataKeeper clearly has some significant advantages. For one, DataKeeper supports a much wider range of platforms, going all the way back to Windows Server 2008 R2 and SQL Server 2008 R2. The S2D solution only supports the latest releases of Windows and SQL Server 2016/2017. S2D also requires the  Datacenter Edition of Windows, which can add significantly to the cost of your deployment. In addition, SIOS delivers the ONLY HA/DR solution for SQL Server on Linux that works both on-prem and in the cloud.

But beyond the cost and platform limitations, I think the most glaring gap comes when we start to consider disaster recovery options for your SANless cluster. Allan Hirt, SQL Server Cluster guru and fellow Microsoft Cloud and Datacenter Management MVP, recently posted about this S2D limitation. In his article Revisiting Storage Spaces Direct and SQL Server FCIs  Allan points out that due to the lack of support for stretching S2D clusters across sites or including an S2D based cluster as a leg in an Always On Availability Group, the best option for DR in the S2D scenario is log shipping!

Don’t get me wrong, log shipping has been around forever and will probably be around long after I’m gone, but that is taking a HUGE step backwards when we think about all the disaster recovery solutions we have become accustomed to, like multi-site clusters, Availability Groups, etc.

In contrast, the SIOS DataKeeper solution fully supports Always On Availability Groups, and better yet – it can allow you to stretch your FCI across sites to give you the best HA/DR solution you could hope to achieve in terms of RTO/RPO. In an Azure environment, DataKeeper also support Azure Site Recovery (ASR), giving you even more options for disaster recovery.

The rest of this chart is pretty self explanatory. It basically consist of a list hardware, storage and networking requirements that must be met before you can deploy an S2D cluster. An exhaustive list of S2D requirements is maintained here.  https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-direct-hardware-requirements

The SIOS DataKeeper solution is much more lenient. It supports any locally attached storage and as long as the hardware passes cluster validation, it is a supported cluster configuration. The block level replication solution has been working great ever since 1 Gbps was considered a fast LAN and a T1 WAN connection was considered a luxury.

SANless clustering is particularly interesting for cloud deployments. The cloud does not offer traditional shared storage options for clusters. So for users in the middle of a “lift and shift” to the cloud that want to take their clusters with them they must look at alternate storage solutions. For cloud deployments, SIOS is certified for AzureAWS and Google and available in the relevant cloud marketplace. While there doesn’t appear to be anything blocking deployment of S2D based clusters in Azure or Google, there is a conspicuous lack of documentation or supportability statements from Microsoft for those platforms.

SIOS DataKeeper has been doing this since 1999. SIOS has heard all the feature requests, uncovered all the bugs, and has a rock solid solution for SANless clusters that is time tested and proven. While Microsoft S2D is a promising technology, as a 1st generation product I would wait until the dust settles and some of the feature gap closes before I would consider it for my business critical applications.

STORAGE SPACES DIRECT (S2D) FOR SQL SERVER FAILOVER CLUSTER INSTANCES (FCI)?

How to Build a SANless SQL Server Failover Cluster Instance in Google Cloud Platform

If you are going to host SQL Server on the Google Cloud Platform (GCP) you will want to make sure it is highly available. One of the best and most economical ways to do that is to build a SQL Server Failover Cluster Instance (FCI). Since SQL Server Standard Edition supports Failover Clustering, we can avoid the cost associated with SQL Server Enterprise Edition which is required for Always On Availability Groups. In addition, SQL Server Failover Clustering is a much more robust solution as it protects the entire instance of SQL Server, has no limitations in terms of DTC (Distributed Transaction Coordinator) support and is easier to manage. Plus, it supports earlier versions of SQL Server that you may still have, such as SQL 2012 through the latest SQL 2017. Unfortunately, SQL 2008 R2 is not supported due to the lack of support for cross-subnet failover.

Traditionally, SQL Server FCI requires that you have a SAN or some type of shared storage device. In the cloud, there is no cluster-aware shared storage. In place of a SAN, we will build a SANless cluster using SIOS DataKeeper Cluster Edition (DKCE). DKCE uses block-level replication to ensure that the locally attached storage on each instance remains in sync with one other. It also integrates with Windows Server Failover Clustering through its own storage class resource called a DataKeeper Volume which takes the place of the physical disk resource. As far as the cluster is concerned the SIOS DataKeeper volume looks like a physical disk, but instead of controlling SCSI reservations, it controls the mirror direction, ensuring that only the active server writes to the disk and that the passive server(s) receive all the changes either synchronously or asynchronously.

In this guide, we will walk through the steps to build a two-node failover cluster between two instances in the same region, but in different Zones, within the GCP as shown in Figure 1.

Google Cloud Diagram

Download the entire white paper at https://us.sios.com/san-sanless-clusters-resources/white-paper-build-sql-server-failover-cluster-gcp/

How to Build a SANless SQL Server Failover Cluster Instance in Google Cloud Platform

Why I should convert my #Azure clusters to Managed Disks TODAY!

You may have heard about the recent storage outage that impacted some instances in the US East region back on March 16th. A root cause analysis of the outage is posted here.

March 16th US East Storage Outage

Customer impact: A subset of customers using Storage in the East US region may have experienced errors and timeouts while accessing their storage account in a single Storage scale unit

You might be asking, “What is a single Storage scale unit”. Well, you can think of it as a single storage cluster, or single SAN, or however you want to think about it. I don’t think Azure publishes their exact infrastructure, but you can probably assume that behind the scenes they are using Scale Out File Servers for backend storage.

So the question is, how could I have survived this outage with minimal downtime? If you read further down that root cause analysis you come across this little nugget.

Virtual Machines using Managed Disks in an Availability Set would have maintained availability during this incident.

What’s Managed Disks you ask? Well, just on February 8th Corey Sanders announced the GA of Managed Disks. You can read all about Managed Disks here. https://azure.microsoft.com/en-us/services/managed-disks/

The reason why Managed Disks would have helped in this outage is that by leveraging an Availability Set combined with Managed Disks you ensure that each of the instances in your Availability Set are connected to a different “Storage scale unit”. So in this particular case, only one of your cluster nodes would have failed, leaving the remaining nodes to take over the workload.

Prior to Managed Disks being available (anything deployed before 2/8/2016), there was no way to ensure that the storage attached to your servers resided on different Storage scale units. Sure, you could use different storage accounts for each instances, but in reality that did not guarantee that those Storage Accounts provisioned storage on different Storage scale units.

So while an Availability Set ensured that your instances reside in different Fault Domains and Update Domains to ensure the availability of the instance itself, the additional storage attached to each instance really represented a single point of failure. Although the storage itself is highly resilient, with three copies of your data and geo-redundant options available, in this case with a power failure the entire Storage scale unit went down along with all the servers attached to it.

So long story short…migrate to Managed Disk as soon as possible in order to help minimize downtime

https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-migrate-to-managed-disks

And if you really want to minimize downtime you should consider Hybrid Cloud Deployments that span cloud providers or on-prem to cloud!

Why I should convert my #Azure clusters to Managed Disks TODAY!

New Azure ILB feature allows you to build a multi-instance SQL Server Failover Cluster in #Azure

At Microsoft Ignite this past September, Microsoft made some announcements around Azure. One of these announcements was the general availability of multiple VIPs on internal load balancers. Why is this so important to a SQL Server DBA? Well, up until now if you want to deploy highly available SQL Server in Azure you were limited to a single SQL Server FCI per cluster or a single Availability Group listener.

This limitation forced you to deploy a new cluster for each instance of SQL Server you wanted to protect in a Failover Cluster. It also forced you to group all of your databases into a single Availability Group if you wanted automatic failover and client redirection in your AlwaysOn AG configuration.

Those restrictions have now been lifted with these new ILB features. In this post I am going to walk you through the process of deploying a SQL Server FCI in Azure that contains two SQL Server instances. In a future post I will walk you through the same process for SQL Server AlwaysOn AG.

Let’s start with a basic, single instance SQL Server FCI in Azure as I describe in my post Deploying Microsoft SQL Server 2014 Failover Clusters in Azure Resource Manager .

That post describes the process of creating the cluster, using DataKeeper to create the replicated volume resources used in the cluster, creating the Internal Load Balancer (ILB) and then fixing the SQL Server Cluster IP Resource to work with the ILB. If you want to skip that process and jumpstart your configuration you can always use the Azure Deployment Template that creates a 2-Node SQL Server FCI using SIOS DataKeeper

Assuming you now have a basic two node SQL Server FCI, the steps to add a 2nd named instance are as follows:

  1. Create another DataKeeper Volume Resource on another volume that is not currently being used. You may need to add additional disks to your Azure instance if you have no available volumes. As part of this volume creation process the new DataKeeper Volume resource will be registered in Available Storage in the cluster. Refer to the article referenced earlier for the details.
  2. Install a named instance of SQL Server on the first node, specifying the DataKeeper Volume that we just created as the storage location.
  3. “Add a node” to the cluster on the second node.
  4. Lock down the port number of this new named instance to a port that is not in use. In my example I use port 1440.

Next we have to adjust the ILB to redirect traffic to this second instance. Here are the steps you need to follow:

Add a frontend IP address that is identical to the SQL cluster IP address you used for the second instance of SQL Server as shown below.

1_-_add_frontend_ip_address

Next, we will need to add another probe since the instances could be running on different servers. As shown below, I added a probe that probes port 59998 (instead of the usual 59999). We will need to make sure the new rules reference this proble. We will also need to remember that port number since we will need to update IP address associated with this instance during the last step of this process.

2_-_add_probe

Now we need to add two new rules to the ILB to direct traffic destined for this 2nd instance of SQL. Of course we need to add a rule to redirect TCP port 1440 (the port I used for the named instance of SQL), but because we are now using named instances we will also need to have a port to support the SQL Server Browser Service, UDP Port 1434.

In the picture below depicting the rule for the SQL Server Browser Service, take note that the Front End IP Address is referencing the new FrontendIP address (10.0.0.201), UDP port 1434 for both the Port and Backend Port. In the pool you will need to specify the two servers in the cluster, and finally make sure you choose the new Health Probe we just created.

3_-_add_probe

We will now add a rule for TCP/1440. As show in the picture below, add a new rule for port TCP 1440, or whatever port locked down for the named instance of SQL Server. Again, be sure to choose the new FrontEnd IP Address and the new Health Probe (59998). Also, make sure the Floating IP (direct server return) is enabled.

4_-_Add_Probe.png

Now that the load balancer is configured, the final step is to run the PowerShell script to update the new Cluster IP address associated with this 2nd instance of SQL Server. This PowerShell script only needs to be run on one of the cluster nodes.

# Define variables

$ClusterNetworkName = “”

# the cluster network name (Use Get-ClusterNetwork on Windows Server 2012 of higher to find the name)

$IPResourceName = “”

# the IP Address resource name of the second instance of SQL Server

$ILBIP = “”

# the IP Address of the second instance of SQL, which should be the same as the new Frontend IP address as well

Import-Module FailoverClusters

# If you are using Windows Server 2012 or higher:

Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59998;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}

# If you are using Windows Server 2008 R2 use this:

#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59998  subnetmask=255.255.255.255

You now have a fully functional multi-instance SQL Server FCI in Azure. Let me know if you have any questions.

New Azure ILB feature allows you to build a multi-instance SQL Server Failover Cluster in #Azure