Major Cloud Outage impacts Google Compute Engine – Were you prepared?

June 3, 2019 | daveberm | 2 Comments

Google first reported an “Issue” on Jun 2, 2019 at 12:25 PDT. As is now common in any type of disaster, reports of this outage first appeared on social media. It seems now the most reliable place to get any type of information early in a disaster is social media.

Twitter is quickly becoming the first source of information on everything from revolutions and natural disasters to cloud outages.

Many services that rely on Google Compute Engine were impacted. With three teenage kids at home, I first knew something was up when all three emerged from their caves, aka bedrooms, at the same time with worried looks on their faces. Snapchat, YouTube and Discord were all offline!

They must have thought that surely this was the first sign of the apocalypse. I reassured them that this was not the beginning of the new dark ages and that maybe they should go outside and do some yard work instead. That scared them back to reality and they quickly scurried away to find something else to occupy their time.

All kidding aside, many services were reported as down, or available only in certain areas. The dust is still settling on the cause, breadth and scope of the outage, but it was clearly significant, impacting many customers and services including Gmail and other G Suite services, Vimeo and more.

Many services were impacted by this outage: Gmail, YouTube and Snapchat, just to name a few.

While we wait for the official root cause analysis on this latest Google Compute Engine outage, Google reported “high levels of network congestion in the eastern USA” caused the downtime. We will have to wait to see what they determine caused the network issues. Was it human error, cyber-attack, hardware failure, or something else?

Were you prepared?

As I wrote during the last major cloud outage, if you are running business critical workloads in the cloud, regardless of the cloud service provider, it is incumbent upon you to plan for the inevitable outage. The multi-day Azure outage of Sept 4th, 2018 was related to a failure of the secondary HVAC system to kick in during a power surge related to an electrical storm. While the failure was just within a single datacenter, the outage exposed multiple services that had dependencies on this single datacenter, making that datacenter itself a single point of failure.

Leveraging the cloud’s infrastructure, you can minimize your risks by continuously replicating critical data between Availability Zones, Regions or even cloud service providers. In addition to data protection, having a procedure in place to rapidly recover business critical applications is an essential part of any disaster recovery plan. There are various replication and recovery options available, from services provided by the cloud vendor themselves like Azure Site Recovery, to application specific solutions like SQL Server Always On Availability Groups, to third party solutions like SIOS DataKeeper that protect a wide range of applications running on both Windows and Linux.

Having a disaster recovery strategy that is wholly dependent on a single cloud provider leaves you susceptible to a scenario that might impact multiple regions within a single cloud. Multi-datacenter or multi-region disasters are not likely. However, as we saw with this recent outage and the Azure outage last fall, even if a failure is local to a single datacenter, the impact can be wide reaching across multiple datacenters or even regions within a cloud. To minimize your risks, you may want to consider a multi-cloud or hybrid cloud scenario where your disaster recovery site resides outside of your primary cloud platform.

The cloud is just as susceptible to outages as your own datacenter. You must take steps to prepare for disasters. I suggest you start by looking at your most business critical apps first. What would you do if they were offline and the cloud portal to manage them was not even available? Could you recover? Would you meet your RTO and RPO? If not, maybe it is time to re-evaluate your DR strategy.

“By failing to prepare, you are preparing to fail.”
― Benjamin Franklin

New Azure “SQL Server settings” blade in the Azure Portal

May 24, 2019 | daveberm | 1 Comment

I just noticed today that there is a new blade in the Azure portal when creating a new SQL Server virtual machine. I’ve been looking for an announcement regarding this new Azure portal experience, but I haven’t found one yet. This feature wasn’t available when I took the screenshots for my last post on creating a SQL Server 2008 R2 FCI in Azure on April 19th, so it must be relatively new.

New Azure “SQL Server Settings” blade on the Azure portal

Most of the settings are pretty self-explanatory. Under Security and Networking you can specify the port you want SQL Server to listen on. It also appears as if the Azure Security Group will be updated to allow different levels of access to the SQL Server instance: Local, Private or Public. Authentication options are also exposed in this new SQL Server settings blade.

Security, Networking and Authentication options are part of your SQL Server deployment

The rest of the features include licensing, patching and backup options. In addition, if you are deploying the Enterprise Edition of SQL Server 2016 or later you also have the option to enable SQL Server R Services for advanced analytics.

Licensing, Patching, Backup and R Services options can be automatically configured

All of those options are welcome additions to the Azure portal experience when provisioning a new SQL Server instance. I’m sure the seasoned DBA probably has a list of a few dozen other options they would like to tweak before a SQL Server deployment, but this is certainly a step in the right direction.

Storage Configuration Options

The most interesting new feature I have found on this blade is the Storage Configuration option.

When you click on Change Configuration, you get the following blade.

As you slide the IOPS slider to the right you will see the number of data disks, the Storage Size, and the Throughput all increase. You will be limited to the max number of IOPS and disks supported by that instance size. As the screenshot below shows, I am able to go as high as 80,000 IOPS when provisioning storage for a Standard E64-16s_v3 instance.

The Standard E64-16s_v3 instance size supports up to 80,000 IOPS

There is also a “Storage optimization” option. I haven’t tried all the different combinations to know exactly what the Storage optimization setting does. If you know how the different options change the storage configuration, leave me a comment, or we will just wait for the official documentation to be released.

For my test, I provisioned a Standard DS13 v2 instance and maxed out the IOPS at 25,600, the max IOPS for that instance size. I also optimized the storage for Transactional processing.

I found that when this instance was provisioned, six P30 premium disks were attached to the instance. This makes sense, since each P30 delivers 5,000 IOPS, so it would take at least six of them to deliver the 25,600 IOPS requested. This also increased the Storage Size to 6 TB, since each P30 gives you 1 TB of storage space. Read-only host caching was also enabled on these disks.
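The disk count above is a simple round-up division. Here is a minimal Python sketch of that arithmetic, using only the P30 figures quoted in this post; the function name is made up for illustration, and this is not how the portal itself computes the layout.

```python
import math

# Per-disk figures quoted in this post for a P30 premium disk.
P30_IOPS = 5000
P30_SIZE_TB = 1


def disks_for_iops(target_iops: int, iops_per_disk: int = P30_IOPS) -> int:
    """Smallest number of striped disks whose combined IOPS meets the target."""
    return math.ceil(target_iops / iops_per_disk)


disks = disks_for_iops(25_600)
capacity_tb = disks * P30_SIZE_TB
print(f"{disks} x P30 -> {disks * P30_IOPS} IOPS, {capacity_tb} TB")
```

Five disks would top out at 25,000 IOPS, just short of the request, which is why the wizard lands on six.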

The six disks were automatically provisioned and attached to the instance

When I logged in to the instance to see what Azure had done with those disks, I found that it had done exactly what I would have done: it created a single Storage Pool with the six P30 disks, created a Simple (aka RAID 0) Storage Space, and provisioned a single 6 TB F:\ drive.

This storage configuration wizard validates some of the cloud storage assumptions I made in my previous blog post, Storage Considerations for Running SQL Server in Azure. It seems like a single, large disk should suffice in most circumstances.

A Simple Storage Space consisting of the six P30s is presented as a single F:\ drive

I have found this storage optimization is not available in every Azure Marketplace offering. For example, if you are moving SQL Server 2008 R2 to Azure for the extended security updates you will find that this storage optimization is not available in the SQL2008R2/Windows Server 2008 R2 Azure Marketplace image. Of course, Storage Spaces was not introduced until Windows Server 2012, so that makes sense. I did verify that this option is available with the SQL Server 2012 SP4 on Windows Server 2012 R2 Azure Marketplace offering.

There is a minor inconvenience, however. In addition to adding this new Storage configuration option on the SQL Server settings blade, they also removed the option to add Data Disks on the Disks blade. So if I wanted to provision additional storage without creating a Storage Space, I would have to create the instance first and then come back and add Data disks after the virtual machine is provisioned.

Final thoughts

All of the SQL Server configuration options in this new Azure blade are welcome additions. I would love to see the list of tunable settings grow. Information text should include guidance on current best practices for each tunable.

What SQL Server or Windows OS tunables would you like to see exposed as part of the provisioning process to make your life as a DBA easier? Not only would these tunables make your life easier, but they would also make the junior DBA look like a seasoned pro by guiding them through all the current SQL Server configuration best practices.

I think the new Storage configuration option is probably the most compelling new addition. Prior to the Storage configuration wizard, users had to be aware of the limits of their instance size, the limits of the storage they were adding, and have the wherewithal to stripe together multiple disks in a Simple Storage Space to get the maximum IOPS. A few years ago I put together a simple Azure Storage Calculator to help people make these decisions. My calculator is currently outdated, but this new Storage configuration option may make it obsolete anyway.

I would love to see this Storage configuration wizard included as a standard offering in the Disks blade of every Windows instance type, rather than just the SQL Server instances. I would let the user choose to use the new Storage configuration “Wizard” experience, or the “Classic” experience where you manually add and manage storage.

Microsoft Build 2019 Announcements and Sessions on Demand

May 10, 2019 | daveberm | Leave a comment

If you are like me and were unable to get away from the office to attend Microsoft Build 2019, you will be glad to know that Microsoft has published all the sessions, and they are available online at no charge.

Being a developer focused conference most of the announcements were geared towards developers. You can see a complete list of searchable announcements here. https://azure.microsoft.com/en-us/updates/?updatetype=microsoft-build&Page=1

I’m more of an infrastructure guy, so some of the more interesting announcements to me were the following.

Azure VMware Solutions is now generally available – I guess if I was heavily invested in VMware and was looking to expand into Azure this announcement would certainly open up some interesting possibilities. It looks like if I use a bare-metal instance or dedicated instance I can basically have an ESX host running in Azure. I guess if you are doing a hybrid-cloud deployment and want to easily move workloads back and forth between on-prem and Azure this makes sense. If this excites you, leave me a comment telling me why.

Azure Quickstart Center enables new customers to build cloud projects with confidence – I haven’t looked into this yet, but this looks VERY interesting to me if it is what I think it is. As cloud adoption continues to grow so does the required skill set of the IT professional. Infrastructure as Code (IaC) is one of those skills the wise IT professional will want to become comfortable with.

Just two years ago, when I talked to customers, this skill set was non-existent, or really just something only the largest IT consultants or cloud providers had any experience in. Over the past year I have seen this skill set become more common among the customers I work with, and IaC is often their preferred deployment method. The IaC technology has also matured over that period.

I predict that if you don’t already manage your cloud deployment with IaC you will be in the near future. I’m hopeful that this new offering from Microsoft could be a great intro to those IT professionals looking to gain some IaC experience and knowledge. After I have had a look at it I’ll post a follow-up article.

Storage Considerations for Running SQL Server in Azure

May 2, 2019 | daveberm | #WindowsAzure #Azure #Cloud | 1 Comment

If you are deploying SQL Server in Azure, or any cloud platform for that matter, don’t just provision storage the way you did for your on-premises deployments for many years. Storage in Azure isn’t exactly like the storage you may have had access to on-premises. Some traditional “best practices” may wind up costing you additional money and give you less than optimal performance, all while not providing you any of the intended benefits. Much of what I am about to discuss is also described in Performance guidelines for SQL Server in Azure Virtual Machines.

Disk Types

I’m not here to tell you that you must use UltraSSD, Premium Storage, or any other disk type. You just need to be aware that you have options, and what each disk type brings to the table. Of course, like anything else in the cloud, the more money you spend, the more power, speed, throughput, etc., you will achieve. The trick is finding the optimal configuration so that you spend just enough to achieve the desired results.

Size DOES Matter

Like many things in the cloud, certain specs are tied together. For servers if you want more RAM you often get more CPU, even if you didn’t NEED more CPU. For storage, IOPS, throughput and size are all tied together. If you want more IOPS, you need a bigger disk. If you need more space, you also get more IOPS. Of course you can jump between storage classes to circumvent that to some extent, but it still holds true that if you need more IOPS, you also get more space on any of the different storage types.
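To make the coupling between size and IOPS concrete, here is a minimal Python sketch that picks the smallest single disk satisfying both a capacity and an IOPS requirement. The P20 figures and the P50 IOPS figure are the ones quoted in this post; the P30 figures and the P50 size are assumptions you should check against the current Azure disk documentation, and the function name is made up for illustration.

```python
from typing import NamedTuple, Optional


class DiskTier(NamedTuple):
    name: str
    size_gib: int
    iops: int


# P20 (512 GiB, 2,300 IOPS) and the P50 IOPS figure are quoted in the post.
# The P30 row and the P50 size are assumptions; verify before relying on them.
TIERS = [
    DiskTier("P20", 512, 2300),
    DiskTier("P30", 1024, 5000),
    DiskTier("P50", 4096, 7500),
]


def smallest_tier(need_gib: int, need_iops: int) -> Optional[DiskTier]:
    """Return the first (smallest) tier that satisfies BOTH requirements."""
    for tier in TIERS:
        if tier.size_gib >= need_gib and tier.iops >= need_iops:
            return tier
    return None  # no single disk is enough; stripe several together instead


# Only 200 GiB of data, but 4,000 IOPS: the IOPS requirement alone
# pushes you to a larger (and more expensive) disk than the data needs.
print(smallest_tier(200, 4000))
```

The same lookup works in the other direction: ask for 600 GiB at a trivial IOPS rate and you still land on the larger tier, getting IOPS you did not ask for.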

The size of your virtual machine instance also matters. Regardless of what storage configuration you eventually go with, the overall throughput will be capped at whatever the instance size allows. So once again, you may need to pay for more RAM and CPU than you need, just to achieve your desired storage performance. Make sure you understand what your instance size can support in terms of max IOPS and MBps throughput. Many times the instance size will turn out to be the bottleneck in a perceived storage performance problem in Azure.

Use RAID 0

RAID 0 is traditionally the 3rd rail of storage configuration options. Although it provides the best combination of performance and storage utilization of any RAID option, it does so at the risk of a catastrophic failure. If just a single disk in a RAID 0 stripe set should fail, the entire stripe set fails. So traditionally RAID 0 is only used in scenarios where data loss is acceptable and high performance is desirable.

However, in Azure software RAID 0 is desirable and even recommended in many situations. How can we get away with RAID 0 in Azure? The answer is easy. Each disk you present to an Azure virtual machine instance already has triple redundancy on the backend, meaning you would need to have multiple failures before you would lose your stripe set. By using RAID 0, you can combine multiple disks, and the best-case performance of the stripe set scales with the number of disks: each additional disk adds roughly one more disk’s worth of IOPS and throughput, up to the limit of the instance size.

So for example, if you had a requirement of 10,000 IOPS, you might think that you need UltraSSD since Premium Storage maxes out at 7,500 IOPS with a P50. However, if you put two P50s in a RAID 0, you now have the potential to achieve up to 15,000 IOPS, assuming you are running a Standard_F16s_v2 or similarly large instance size that supports that many IOPS.
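The example above boils down to "per-disk IOPS add up, but the instance size is a hard ceiling". A minimal Python sketch of that rule follows; the `instance_cap` values are hypothetical placeholders, not published limits for any particular VM size.

```python
def striped_iops(disk_iops: int, disk_count: int, instance_cap: int) -> int:
    """Best-case IOPS of a RAID 0 set: per-disk IOPS add up, but the VM's
    own limit is a ceiling no matter how many disks are striped."""
    return min(disk_iops * disk_count, instance_cap)


P50_IOPS = 7500  # per-disk figure quoted in the post

# The instance_cap values below are hypothetical, not real VM specifications.
print(striped_iops(P50_IOPS, 2, instance_cap=20_000))  # stripe set is the limit
print(striped_iops(P50_IOPS, 2, instance_cap=6_400))   # instance size is the limit
```

In the second call, adding more disks changes nothing; only a larger instance size would help.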

In Windows Server 2012 and later, RAID 0 is achieved by creating a Simple Storage Space. In Windows Server 2008 R2 you can use Dynamic Disks to create a RAID 0 Striped Volume. Just a word of caution: if you are going to use a local Storage Space and also configure Availability Groups or a SANless Failover Cluster Instance with DataKeeper, it is best to configure your storage BEFORE you create a cluster.

Just a reminder, you only have about two more months to move your SQL Server 2008 R2 instances to Azure. Check out my post on how to deploy a SQL Server 2008 R2 FCI on Azure to ensure high availability.

Don’t bother separating log and data files

Traditionally log and data files would reside on different physical disks. Log files tend to have a lot of write activity and data files tend to have more read activity, so sometimes storage would be optimized based on those characteristics. It was also desirable to keep log and data files on different disks for recovery purposes. If you should lose one or the other, with a proper backup strategy in place you could recover your database with no data loss.

With cloud based storage, the likelihood of losing just a single volume is EXTREMELY low. If by chance you lose storage, it is likely your entire storage cluster, along with the triple redundancy, went to lunch. So while it may feel right to put logs in E:\logs and data in F:\data, you really are doing yourself a disservice. For example, if you provision a P20 for logs and a P20 for data, each volume will be 512 GiB in size and capped at 2,300 IOPS. And just think, you may not need all that size for log files, but it might not give you much room to grow for your data files, which will eventually require moving to a more expensive P30 just for the extra space.

Wouldn’t it be much nicer to simply stripe those two volumes together into a nice large 1 TB volume that supports 4,600 IOPS? By doing that both the log and data files can take advantage of the increased IOPS and you have also just optimized your storage utilization and decreased your cloud storage cost by putting off the move to a P30 disk for your data file.
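Here is the same comparison as a small Python sketch, using the P20 figures quoted above. It ignores the instance-size cap and any striping overhead, so treat the numbers as best case; the helper name is made up for illustration.

```python
P20_SIZE_GIB = 512   # figures quoted in the post
P20_IOPS = 2300


def stripe(disk_count: int, size_gib: int = P20_SIZE_GIB, iops: int = P20_IOPS):
    """Capacity and best-case IOPS of a RAID 0 set of identical disks."""
    return {"size_gib": disk_count * size_gib, "iops": disk_count * iops}


separate = stripe(1)   # what the log volume OR the data volume gets on its own
combined = stripe(2)   # what both get when they share one striped volume

print("separate volumes:", separate)
print("striped volume:  ", combined)
```

You pay for the same two disks either way; striping simply lets both the log and data files draw on the full 1,024 GiB and 4,600 IOPS.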

The same holds true for files and filegroups. Really think hard about what you are doing and whether it still makes sense once you move to the cloud. What makes sense might be counterintuitive given what you have done in the past. When in doubt, follow the KISS rule: Keep It Simple, Stupid! The beauty of the cloud is you can always add more storage, increase instance size, or do whatever it takes to optimize performance vs. cost.

What to do about TempDB

Use the local SSD, aka the D: drive. The D: drive is going to be the best location for your tempdb. Because it is a local drive the data is considered “temporary”, meaning it can be lost if a server is moved, rebooted, etc. That’s okay, tempdb is recreated each time SQL Server starts anyway. The local SSD is going to be fast and have low latency, and because it is local, reads and writes to it do not count toward the overall storage IOPS limit of the instance size. Effectively it is FREE IOPS, so why not take advantage? If you are building a SANless SQL Server FCI with SIOS DataKeeper, be sure to create a non-mirrored volume resource of the D: drive so you don’t needlessly replicate tempdb.

Mount Points Become Obsolete

Mount Points are commonly used in SQL Server FCI configurations when multiple instances of SQL Server are installed on the same Windows Cluster. This reduces the overall cost of SQL Server licenses and can help save cost by driving higher server utilization. As we discussed in the past, typically there might be five or more drives associated with each SQL Server instance. If each of those drives had to consume a drive letter you would run out of letters in just about three to four instances. So instead of giving each drive a letter, mount points were used so that each instance could just be serviced by a single drive letter, the root drive. The root drive has mount points that map to separate physical disks that don’t have drive letters.

However, as we discussed above, the concept of using a bunch of individual disks really doesn’t make a lot of sense in the cloud, hence mount points become obsolete in the cloud. Instead, create a RAID 0 stripe as we described, and each clustered instance of SQL Server will simply have its own individual volume that is optimized for space, performance and cost. This solves the problem of running out of drive letters and gives you much better storage utilization and performance while also reducing the cost of your cloud storage.
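The drive-letter arithmetic is easy to check. This minimal Python sketch assumes A and B are avoided, C is the OS disk and D is the Azure temporary disk; those reservations are an assumption for illustration, and your servers may reserve more (a DVD drive, for example).

```python
import string

# Assumption: A and B are avoided (legacy floppy letters), C is the OS disk,
# and D is the Azure temporary disk, which leaves E through Z.
RESERVED = set("ABCD")
available = [c for c in string.ascii_uppercase if c not in RESERVED]


def max_instances(drives_per_instance: int) -> int:
    """How many SQL Server instances fit if every drive needs its own letter."""
    return len(available) // drives_per_instance


print(len(available), "letters free")
print(max_instances(5), "instances at five lettered drives each")
print(max_instances(1), "instances at one striped volume each")
```

With five lettered drives per instance you hit the ceiling at four instances, in line with the "three to four" estimate above; with a single striped volume per instance, drive letters stop being the constraint.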

Conclusions

This post is meant as a jumping off point, not a definitive guide. The main point of the post is to get you thinking differently about cloud and storage as it pertains to running SQL Server. Don’t simply take what you did on-premises and recreate it in the cloud; that will almost always result in less than optimal performance and a much larger storage bill than necessary.

STEP-BY-STEP: HOW TO CONFIGURE A SQL SERVER 2008 R2 FAILOVER CLUSTER INSTANCE ON WINDOWS SERVER 2008 R2 IN AZURE OR AZURE STACK

April 19, 2019 | April 17, 2020 | daveberm | 6 Comments

Intro

On July 9, 2019, support for SQL Server 2008 and 2008 R2 will end. That means the end of regular security updates. However, if you move those SQL Server instances to Azure or Azure Stack (I will simply refer to both as Azure for the rest of the guide), Microsoft will give you three years of Extended Security Updates at no additional charge. If you are currently running SQL Server 2008/2008 R2 and you are unable to update to a later version of SQL Server before the July 9th deadline, you will want to take advantage of this offer rather than running the risk of facing a future security vulnerability. An unpatched instance of SQL Server could lead to data loss, downtime or a devastating data breach.

One of the challenges you will face when running SQL Server 2008/2008 R2 in Azure is ensuring high availability. On premises you may be running a SQL Server Failover Cluster Instance (FCI) for high availability, or possibly you are running SQL Server in a virtual machine and are relying on VMware HA or a Hyper-V cluster for availability. When moving to Azure, none of those options are available. Downtime in Azure is a very real possibility that you must take steps to mitigate.

In order to mitigate the possibility of downtime and qualify for Azure’s 99.95% or 99.99% SLA, you have to leverage SIOS DataKeeper. DataKeeper overcomes Azure’s lack of shared storage and allows you to build a SQL Server FCI in Azure that leverages the locally attached storage on each instance. SIOS DataKeeper not only supports SQL Server 2008 R2 and Windows Server 2008 R2 as documented in this guide, it supports any version of Windows Server, from 2008 R2 through Windows Server 2019, and any version of SQL Server from SQL Server 2008 through SQL Server 2019.
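To get a feel for what those two SLA percentages mean, here is a minimal Python sketch that converts an availability percentage into allowed downtime. It annualizes the figure purely for intuition; the actual SLA terms define how and over what period availability is measured.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Minutes per year a service may be down and still meet the SLA."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


for sla in (99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per year")
```

The gap between the two is roughly 4.4 hours versus under an hour per year, which is the difference your cluster design has to make up.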

This guide will walk through the process of creating a two-node SQL Server 2008 R2 Failover Cluster Instance (FCI) in Azure, running on Windows Server 2008 R2. Although SIOS DataKeeper also supports clusters that span Availability Zones or Regions, this guide assumes each node resides in the same Azure Region, but different Fault Domains. SIOS DataKeeper will be used in place of the shared storage normally required to create a SQL Server 2008 R2 FCI.

Pre-Requisites

Active Directory
This guide assumes you have an existing Active Directory Domain. You can manage your own Domain Controllers or use Azure Active Directory Domain Services. For this tutorial we will connect to a domain called contoso.local. Of course you will connect to your own domain when following this tutorial.

Open Firewall Ports
– SQL Server: 1433 for the default instance
– Load Balancer Health Probe: 59999
– DataKeeper: these firewall rules are added to the Windows host based firewall automatically during installation. For details on which ports are opened consult the SIOS documentation.
– Keep in mind, if you have any network based security in place that blocks ports between the cluster nodes you will need to account for these ports there as well.

DataKeeper Service Account
Create a Domain account. We will specify this account when we install DataKeeper. This account will need to be added to the Local Administrators group on each node of the cluster.

Create the first SQL Server Instance in Azure

This guide will leverage the SQL Server 2008R2SP3 on Windows Server 2008R2 image that is published in the Azure Marketplace.

When you provision the first instance you will have to create a new Availability Set. During this process be sure to increase the number of Fault Domains to 3. This allows the two cluster nodes and the file share witness each to reside in their own Fault Domain.

If you don’t already have a virtual network configured, allow the creation wizard to create a new one for you.

Once the instance is created, go into the IP configurations and make the Private IP address static. This is required for SIOS DataKeeper and is best practice for clustered instances.

Make sure that your virtual network is configured to set the DNS server to be a local Windows AD controller to ensure you will be able to join the domain in a later step.

After the virtual machines are provisioned, add at least two additional disks to each instance. Premium or Ultra SSD are recommended. Disable caching on the disks used for the SQL log files. Enable read-only caching on the disk used for the SQL data files. Refer to Performance guidelines for SQL Server in Azure Virtual Machines for additional information on storage best practices.

Create the 2nd SQL Server Instance in Azure

Follow the same steps as above, except be sure to place this instance in the same virtual network and Availability Set that you created with the 1st instance.

Create a File Share Witness (FSW) Instance

In order for the Windows Server Failover Cluster (WSFC) to work optimally you are required to create another Windows Server instance and place it in the same Availability Set as the SQL Server instances. By placing it in the same Availability Set you ensure that each cluster node and the FSW reside in different Fault Domains, ensuring your cluster stays online should an entire Fault Domain go offline. This instance does not require SQL Server; it can be a simple Windows Server, as all it needs to do is host a simple file share.

This instance will host the file share witness required by WSFC. This instance does not need to be the same size, nor does it require any additional disks to be attached. Its only purpose is to host a simple file share. It can in fact be used for other purposes. In my lab environment my FSW is also my domain controller.
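The reason for three Fault Domains comes down to majority voting: with two nodes and a witness there are three votes, and the cluster needs more than half of them. The Python sketch below illustrates that rule only; it is a simplified model, not WSFC's actual quorum implementation, and the Fault Domain layout is hypothetical.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """A majority-based cluster stays up while more than half the votes are online."""
    return votes_online > total_votes // 2


# Two cluster nodes plus the file share witness: three votes, one per Fault Domain.
voters = {"SQL1": 0, "SQL2": 1, "FSW": 2}   # name -> Fault Domain (hypothetical layout)

for failed_domain in sorted(set(voters.values())):
    survivors = [name for name, fd in voters.items() if fd != failed_domain]
    print(f"Fault Domain {failed_domain} down -> survivors {survivors}, "
          f"quorum: {has_quorum(len(survivors), len(voters))}")
```

If the witness shared a Fault Domain with one of the nodes, losing that domain would leave a single vote out of three, and the surviving node would not have quorum.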

Uninstall SQL Server 2008 R2

Each of the two SQL Server instances provisioned already has SQL Server 2008 R2 installed on it. However, it is installed as a standalone SQL Server instance, not a clustered instance. SQL Server must be uninstalled from each of these instances before we can install the cluster instance. The easiest way to do that is to run the SQL Setup as shown below.

When you run setup.exe /Action=RunDiscovery you will see everything that is preinstalled

setup.exe /Action=RunDiscovery

Running setup.exe /Action=Uninstall /FEATURES=SQL,AS,RS,IS,Tools /INSTANCENAME=MSSQLSERVER kicks off the uninstall process

setup.exe /Action=Uninstall /FEATURES=SQL,AS,RS,IS,Tools /INSTANCENAME=MSSQLSERVER

Running setup.exe /Action=RunDiscovery confirms the uninstallation completed

setup.exe /Action=RunDiscovery

Run this uninstallation process again on the 2nd instance.

Add instances to the Domain

All three of these instances will need to be added to a Windows Domain. As mentioned in the Prerequisites section, you must have access to join an existing Windows Active Directory. In our case, we are joining a domain called contoso.local.

Add Windows Failover Clustering Feature

The Failover Clustering Feature needs to be added to the two SQL Server instances

Import-Module ServerManager
Add-WindowsFeature Failover-Clustering

Install Convenience Rollup Update for Windows Server 2008 R2 SP1

There is a critical update (KB2854082) that is required in order to configure a Windows Server 2008 R2 instance in Azure. That update and many more are included in the Convenience Rollup Update for Windows Server 2008 R2 SP1. Install this update on each of the two SQL Server instances.

Format the Storage

The additional disks that were attached when the two SQL Server instances were provisioned need to be formatted. Do the following for each volume on each instance.

Microsoft’s best practices say the following…

“NTFS allocation unit size: When formatting the data disk, it is recommended that you use a 64-KB allocation unit size for data and log files as well as TempDB.”

Run Cluster Validation

Run cluster validation to ensure everything is ready to be clustered.

Import-Module FailoverClusters
Test-Cluster -Node "SQL1", "SQL2"

Your report will contain WARNINGS about Storage and Networking. You can ignore those warnings as we know there are no shared disks and only a single network connection exists between the servers. You may also receive a warning about network binding order which can also be ignored. If you encounter any ERRORS you must address those before you continue.

Since there are no “Potential Cluster Disks” available, the first test throws a warning and all the subsequent disk tests are skipped. This is expected since we will be using just local disks replicated with SIOS DataKeeper.

The Validate Network Communication tests warn about just a single network being available between cluster nodes. You can ignore this warning since the network redundancy is handled at the virtual layer by Azure.

Error trying to run Cluster Validation?

I have encountered this error on a few occasions and I’m still trying to sort out under what conditions this occurs. Occasionally you will find that test-cluster fails to run as described in the forum post.

Test-Cluster
Unable to Validate a Cluster Configuration. The operation has failed. The action validate a configuration did not complete
There is an error in XML document (5, 73).  

Attempt by method

Microsoft.Xml.Serialzation.GeneratedAssembly.XmlSerialzationReaderClusterPrep.Config.Read4_As...Bolean) to access method

MS.Internal.ServerClusters.Validation.TestAssemblyCollection.Add(MS.Internal.ServerClusters.V....Failed

If this happens to you, I have found the following fix recommended in the forum post works for me.

Inside C:\Windows\System32\WindowsPowerShell\v1.0, make a copy of the powershell_ise.exe.config file in that same folder and rename the copy to powershell.exe.config.

Open the copy with Notepad, delete the current configuration, and paste the following:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.xml.serialization>
    <xmlSerializer useLegacySerializerGeneration="true"/>
  </system.xml.serialization>
</configuration>
Save the file and run Test-Cluster again.

While this fix will allow you to run Test-Cluster from PowerShell, I have found that running Validate through the GUI still throws an error, even with this fix. I have a query in to Microsoft to see if they have a solution, but for now if you need to run cluster Validation you may have to use Test-Cluster in PowerShell.

Create the Cluster

Best practice for creating a cluster in Azure would be to use PowerShell to create the cluster, specifying a static IP address. PowerShell allows us to specify a static IP address, whereas the GUI method does not. Unfortunately, Azure’s implementation of DHCP does not work well with WSFC, so if you use the GUI method you will wind up with a duplicate IP address as the Cluster IP Address that will need to be fixed before the cluster is usable.

However, what I have found is that the typical New-Cluster Powershell command with the -StaticAddress parameter doesn’t work. To avoid the problem of the duplicate IP address, we have to resort to the cluster.exe utility and run the following command.

cluster /cluster:cluster1 /create /nodes:"sql1 sql2" /ipaddress:10.0.0.100/255.255.255.0

Add the File Share Witness

Next we need to add the File Share Witness. On the 3rd server we provisioned as the FSW, create a folder and share it as shown below. You will need to grant the Cluster Name Object (CNO) read/write permissions at both the Share and Security levels as shown below.

Once the share is created, run the Configure Cluster Quorum wizard on one of the cluster nodes and follow the steps illustrated below.

Install DataKeeper

Install DataKeeper on each of the two SQL Server cluster nodes as shown below.

This is where we will specify the Domain account we added to the Local Administrators group on each node.

Configure DataKeeper

Once DataKeeper is installed on each of the two cluster nodes you are ready to configure DataKeeper.

NOTE – The most common error encountered in the following steps is security related, most often by pre-existing Azure Security groups blocking required ports. Please refer to the SIOS documentation to ensure the servers can communicate over the required ports.

First you must connect to each of the two nodes.

If everything is configured properly, you should then see the following in the Server Overview report.

Next, create a New Job and follow the steps illustrated below

Choose Yes here to register the DataKeeper Volume resource in Available Storage

Complete the above steps for each of the volumes. Once you are finished, you should see the following in the WSFC UI.

You are now ready to install SQL Server into the cluster.

NOTE – At this point the replicated volume is only accessible on the node that is currently hosting Available Storage. That is expected, so don’t worry!

Install SQL Server on the first node

If you want to script the installation, I have included the example below of a scripted cluster installation of SQL Server 2008 R2 into the first node of the cluster. The script to add a node to an existing cluster is found further down in the guide.

Of course adjust for your environment.

c:\SQLServerFull\setup.exe /q /ACTION=InstallFailoverCluster /FEATURES=SQL /INSTANCENAME="MSSQLSERVER" /INSTANCEDIR="C:\Program Files\Microsoft SQL Server" /INSTALLSHAREDDIR="C:\Program Files\Microsoft SQL Server" /SQLSVCACCOUNT="contoso\admin" /SQLSVCPASSWORD="xxxxxxxxx" /AGTSVCACCOUNT="contoso\admin" /AGTSVCPASSWORD="xxxxxxxxx" /SQLDOMAINGROUP="contoso\SQLAdmins" /AGTDOMAINGROUP="contoso\SQLAdmins" /SQLCOLLATION="SQL_Latin1_General_CP1_CI_AS" /FAILOVERCLUSTERGROUP="SQL Server 2008 R2 Group" /FAILOVERCLUSTERDISKS="DataKeeper Volume E" "DataKeeper Volume F" /FAILOVERCLUSTERIPADDRESSES="IPv4;10.0.0.101;Cluster Network 1;255.255.255.0" /FAILOVERCLUSTERNETWORKNAME="SQL2008Cluster" /SQLSYSADMINACCOUNTS="contoso\admin" /SQLUSERDBLOGDIR="E:\MSSQL10.MSSQLSERVER\MSSQL\Log" /SQLTEMPDBLOGDIR="F:\MSSQL10.MSSQLSERVER\MSSQL\Log" /INSTALLSQLDATADIR="F:\MSSQL10.MSSQLSERVER\MSSQLSERVER" /IAcceptSQLServerLicenseTerms

If you prefer to use the GUI, just follow along with the screenshots below.

On the first node, run the SQL Server setup.

Choose New SQL Server Failover Cluster Installation and follow the steps as illustrated.

Choose only the options you need.

Please note, this document assumes you are using the Default instance of SQL Server. If you use a Named Instance you need to make sure you lock down the port that it listens on, and use that port later on when you configure the load balancer. You also will need to create a load balancer rule for the SQL Server Browser Service (UDP 1434) in order to connect to a Named Instance. Neither of those two requirements is covered in this guide, but if you require a Named Instance it will work if you do those two additional steps.

Here you will need to specify an unused IP address

Go to the Data Directories tab and relocate data and log files. At the end of this guide we talk about relocating tempdb to a non-mirrored DataKeeper Volume for optimal performance. For now, just keep it on one of the clustered disks.

Install SQL Server on the second node

Below is an example of the command you can run to add an additional SQL Server 2008 R2 node into an existing cluster.

c:\SQLServerFull\setup.exe /q /ACTION=AddNode /INSTANCENAME="MSSQLSERVER" /SQLSVCACCOUNT="contoso\admin" /SQLSVCPASSWORD="xxxxxxxxx" /AGTSVCACCOUNT="contoso\admin" /AGTSVCPASSWORD="xxxxxxxx" /IAcceptSQLServerLicenseTerms

If you prefer using the GUI, follow along with the following screenshots.

Run the SQL Server setup again on the second node and choose Add node to a SQL Server Failover Cluster.

Congratulations, you are almost done! However, due to Azure’s lack of support for gratuitous ARP, we will need to configure an Internal Load Balancer (ILB) to assist with client redirection as shown in the following steps.

Update the SQL Cluster IP Address

In order for the ILB to function properly, you must run the following command from one of the cluster nodes. It enables the SQL Cluster IP address to respond to the ILB health probe while also setting the subnet mask to 255.255.255.255 in order to avoid IP address conflicts with the health probe.

cluster res <IPResourceName> /priv enabledhcp=0 address=<ILBIP> probeport=59999  subnetmask=255.255.255.255

NOTE – I don’t know if it is a fluke, but on occasion I have run this command and it looks like it runs, but it doesn’t complete the job and I have to run it again. The way I can tell if it worked is by looking at the Subnet Mask of the SQL Server IP Resource. If it is not 255.255.255.255 then you know it didn’t run successfully. It may simply be a GUI refresh issue, so you can also try restarting the cluster GUI to verify the subnet mask was updated.

After it runs successfully, take the resource offline and bring it back online for the changes to take effect.
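If you would rather not rely on the GUI to confirm the change, the check and the offline/online cycle can be done from Powershell with the FailoverClusters module. This is a rough sketch, not part of the original procedure. The resource name below is a made-up example, so use Get-ClusterResource to find the real name of your SQL Server IP Address resource.

```powershell
Import-Module FailoverClusters

# Hypothetical resource name. List the real ones with: Get-ClusterResource
$IPResourceName = 'SQL IP Address 1 (SQL2008Cluster)'

# Show the private properties set by the cluster.exe command above.
# SubnetMask should read 255.255.255.255 and ProbePort should read 59999.
Get-ClusterResource $IPResourceName | Get-ClusterParameter |
    Where-Object { 'Address', 'SubnetMask', 'ProbePort', 'EnableDhcp' -contains $_.Name } |
    Format-Table Name, Value -AutoSize

# Take the IP resource offline and bring it back online so the change takes effect.
# Resources that depend on it (network name, SQL Server) go offline with it.
Stop-ClusterResource  $IPResourceName
Start-ClusterResource $IPResourceName

# Bring the rest of the SQL Server group back online
Start-ClusterGroup -Name (Get-ClusterResource $IPResourceName).OwnerGroup.Name
```

Expect a brief SQL Server outage while the resource is offline, so do this before clients are pointed at the cluster.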

Create the Load Balancer

The final step is to create the load balancer. In this case we are assuming you are running the Default Instance of SQL Server, listening on port 1433.

The Private IP Address you define when you Create the load balancer will be the exact same address your SQL Server FCI uses.

Add just the two SQL Server instances to the backend pool. Do NOT add the FSW to the backend pool.

In this load balancing rule you must enable Floating IP
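For those who prefer scripting over the portal, the same load balancer can be sketched with the Az Powershell module. Treat this as an illustration under assumptions rather than the exact configuration used in this guide: the resource group, region, virtual network, subnet and object names are all placeholders, the Standard SKU and the probe interval and count are my choices, and 10.0.0.101 is simply the FCI address from the scripted install example earlier.

```powershell
# Sketch only. Requires the Az module and a signed-in session (Connect-AzAccount).
# Every name, the region and the SKU below are placeholders or assumptions.
$rg       = 'sql-rg'
$location = 'eastus2'
$vnet     = Get-AzVirtualNetwork -ResourceGroupName $rg -Name 'sql-vnet'
$subnet   = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'default'

# Frontend: the exact same private IP address the SQL Server FCI uses
$frontend = New-AzLoadBalancerFrontendIpConfig -Name 'sql-frontend' `
    -PrivateIpAddress '10.0.0.101' -SubnetId $subnet.Id

# Backend pool: only the two SQL Server nodes belong here, not the FSW
$pool = New-AzLoadBalancerBackendAddressPoolConfig -Name 'sql-nodes'

# Health probe on the port set with probeport=59999
$probe = New-AzLoadBalancerProbeConfig -Name 'sql-probe' -Protocol Tcp -Port 59999 `
    -IntervalInSeconds 5 -ProbeCount 2

# Rule for the default instance on 1433 with Floating IP enabled
$rule = New-AzLoadBalancerRuleConfig -Name 'sql-1433' `
    -FrontendIpConfiguration $frontend -BackendAddressPool $pool -Probe $probe `
    -Protocol Tcp -FrontendPort 1433 -BackendPort 1433 -EnableFloatingIP

New-AzLoadBalancer -ResourceGroupName $rg -Name 'sql-ilb' -Location $location -Sku Standard `
    -FrontendIpConfiguration $frontend -BackendAddressPool $pool `
    -Probe $probe -LoadBalancingRule $rule
```

This only creates the load balancer. The network interfaces of the two SQL Server nodes still have to be added to the backend pool, either in the portal or with the Az network interface cmdlets.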

Validate the Cluster

Before you continue, run cluster validation one more time. The Cluster Validation report should return just the same network and storage warnings that it did the first time you ran it. Assuming there are no new errors or warnings, your cluster is configured correctly.

Edit sqlservr.exe Config File

In directory C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn we created a sqlps.exe.config file and sqlservr.exe.config with the following lines in the config file:

<configuration>
  <startup>
    <supportedRuntime version="v2.0.50727"/>
  </startup>
</configuration>

By default these files do not exist, so you may need to create them. If a file already exists for your installation, the <supportedRuntime version="v2.0.50727"/> line simply needs to be placed within the <startup>…</startup> sub-section of the <configuration>…</configuration> section. This should be done on both servers.

Test the Cluster

The simplest test is to open SQL Server Management Studio on the passive node and connect to the cluster. If you are able to connect, congratulations, you did everything correctly! If you can’t connect don’t fear, you wouldn’t be the first person to make a mistake. I wrote a blog article to help troubleshoot the issue. Managing the cluster is exactly the same as managing a traditional shared storage cluster. Everything is controlled through Failover Cluster Manager.
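If Management Studio is not handy on the passive node, a quick connection test can also be done from Powershell using the .NET SqlClient classes. This is just a sketch of the same idea. It assumes Windows authentication and uses SQL2008Cluster, the network name from the scripted install example, so substitute your own.

```powershell
# Sketch only: connects to the clustered instance by its network name with Windows authentication.
# 'SQL2008Cluster' comes from the scripted install example. Replace it with your own network name.
$connString = 'Server=SQL2008Cluster;Database=master;Integrated Security=True;Connect Timeout=15'
$conn = New-Object System.Data.SqlClient.SqlConnection($connString)
try {
    $conn.Open()
    $cmd = $conn.CreateCommand()
    # Ask SQL Server which physical node is currently hosting the instance
    $cmd.CommandText = "SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS')"
    $cmd.ExecuteScalar()
}
finally {
    $conn.Dispose()
}
```

Run it before and after a manual failover. The node name that comes back should change while the connection string stays the same.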

Optional – Relocate Tempdb

For optimal performance it would be advisable to move tempdb to the local, non replicated, SSD. However, SQL Server 2008 R2 requires tempdb to be on a clustered disk. SIOS has a solution called a Non-Mirrored Volume Resource which addresses this issue. It would be advisable to create a non-mirrored volume resource of the local SSD drive and move tempdb there. However, the local SSD drive is non-persistent, so you must take care to ensure the folder holding tempdb and the permissions on that folder are recreated each time the server reboots.

After you create the Non-Mirrored Volume Resource of the local SSD, follow the steps in this article to relocate tempdb. The startup script described in that article must be added to each cluster node.
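To give a feel for what such a startup script has to do, here is a bare-bones sketch. It is not the script from the linked article. The folder D:\tempdb and the use of contoso\admin as the SQL Server service account are assumptions carried over from the examples in this guide, and you would still need to arrange for it to run at boot before SQL Server starts.

```powershell
# Sketch only, NOT the script from the linked article.
# Assumed values: D:\tempdb on the local SSD, contoso\admin as the SQL Server service account.
$tempdbPath = 'D:\tempdb'
$sqlAccount = 'contoso\admin'

# The local SSD is wiped when the VM is redeployed, so recreate the folder if it is gone
if (-not (Test-Path $tempdbPath)) {
    New-Item -ItemType Directory -Path $tempdbPath | Out-Null
}

# Grant the service account full control, inherited by files and subfolders
icacls $tempdbPath /grant "${sqlAccount}:(OI)(CI)F"
```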

For More Information

As always, if you have questions or comments you can leave them in the comment section below or reach me on Twitter @daveberm

Achieving SQL Server HA/DR with a mix of Always On Availability Groups and SANless SQL Server Failover Cluster Instances

March 13, 2019March 15, 2019 daveberm6 Comments

Introduction

The topic of mixing SQL Server Failover Cluster Instances (FCI) with Always On Availability Groups (AG) is pretty well documented. However, most of the available documentation covers configurations that assume the SQL Server FCI portion of the solution utilizes shared storage. What if I want to build a SANless SQL Server FCI using Storage Spaces Direct (S2D), can I still add a SQL Server AG to the mix? Unfortunately, the answer to this question is no. As of today, this combination of S2D based SQL Server FCI and Always On AG is not supported. I previously blogged about this S2D limitation here.

However, the good news is you CAN build a SANless SQL Server FCI with SIOS DataKeeper and still leverage Always On AG for things like readable secondaries. You still have to abide by the same rules that apply when mixing traditional SAN based SQL Server FCI and Always On AGs, but other than that it is exactly the same….mostly.

DataKeeper Synchronous replication is commonly used between nodes in the same data center or cloud region, but you may want to replicate asynchronously to an additional node in a different region for disaster recovery. In this case, if you ever do have to bring the DR node online after an unexpected failure, you will have to scrap the Always On AG configurations and reconfigure them. This requirement is very similar to what Microsoft published here in regards to restoring asynchronous snapshots of SQL Server Always On AGs running inside VMs.

Availability Groups

Essentially, a SANLess SQL Server FCI w/DataKeeper looks like a single instance of SQL Server as far as the Always On Availability Group Wizard is concerned. The configuration of the Always On AG is exactly the same as if you were creating just an Always On AG between two Standalone (non-clustered) SQL Server instances.

The real confusion arises from the fact that in this configuration all the servers reside in the same failover cluster, but the SQL Server FCI is configured to run only on the cluster nodes where SQL Server was installed as a Clustered SQL Server Instance. The other nodes are in the same cluster, but SQL is installed on those nodes as a Standalone SQL Server Instance, not a Clustered Instance. It’s a bit confusing, but what is happening is that Always On AGs leverage the WSFC quorum model and listeners, so all the AG Replicas need to reside in the same WSFC, even though they typically do not run clustered instances of SQL Server. If you are completely confused that is okay, most people are confused when they first try to wrap their head around this hybrid configuration.

The real benefit in a configuration like this is that a SQL Server FCI can be a better and more cost effective (more on this later*) HA solution than Always On AG in many circumstances, but it lacks the ability to offer a readable secondary replica. Adding an Always On AG readable secondary replica becomes a viable option to address this need. And using SIOS DataKeeper eliminates the need for a SAN for the SQL Server FCI, which opens up the possibility of configuring SQL Server FCIs where nodes reside in different data centers, which also means support for SQL Server FCIs that span Availability Zones in both Azure and AWS.

Please note that pictured below is just one possible configuration. Multiple FCI cluster nodes, multiple AGs and multiple Replicas are all supported. You are only limited by the limits imposed by your version of SQL Server.

This article seems to document the setup steps pretty well. Of course, instead of shared storage for the SQL FCI, you will use SIOS DataKeeper to build the FCI as I document here.

[Image: SQL Server FCI with Availability Groups]

Basic Availability Groups

As of SQL Server 2016, a scaled-down “Basic Availability Groups” feature became available in SQL Server Standard Edition, making this configuration possible even in SQL Server Standard Edition. Basic AGs are limited to a single database per Availability Group and a single replica (2 nodes). They also do not support a readable secondary replica, so their use cases in this hybrid configuration are very limited.

Distributed Availability Groups

Distributed AGs, which were introduced in SQL Server 2016, are also supported in this hybrid configuration. Distributed AGs are very similar to regular AGs, but the Replicas do not need to reside in the same cluster, or even in the same Windows Domain. Microsoft documents the main use cases of Distributed Availability Groups as follows:

Disaster recovery and easier multi-site configurations
Migration to new hardware or configurations, which might include using new hardware or changing the underlying operating systems
Increasing the number of readable replicas beyond eight in a single availability group by spanning multiple availability groups

[Image: Distributed Availability Groups]

Summary

If you like the idea of SQL Server FCIs for high availability, but want the flexibility of read-only secondary replicas, this hybrid solution might just be the thing you are looking for. Traditional SAN-based SQL Server FCIs, and even Storage Spaces Direct (S2D) based FCIs, limit you to a single data center. SIOS DataKeeper frees you from the limits of your SAN and enables configurations such as SQL Server FCIs that span Availability Zones or Cloud Regions. It also eliminates the reliance on the SAN, allowing you to leverage locally attached high speed storage devices without giving up your SQL Server FCI.

* How to Save Money

Earlier I promised I would tell you how to save money by doing this all with SQL Server Standard Edition. If you can live with readable replicas that are point in time based snapshots, you can skip Always On AGs completely and just use the SIOS DataKeeper target side snapshot feature to periodically take an application consistent snapshot of the volumes on the target server without impacting ongoing replication or availability. Here’s how…

http://discover.us.sios.com/rs/siostechnology/images/10-Ways-Save-AlwaysOn-vs-Failover-Clustering.pdf

Create a 2-node SQL Server FCI with SQL Server Standard Edition and save a boatload of money on SQL licenses, yet still replicate the data to a 3rd node outside the cluster for reporting or DR purposes. If you take a snapshot of the volumes on this third server these snapshots are read-write accessible, so you can mount those databases from a standalone instance of SQL Server to run month end reports, copy to archives, or you might even want to use those snapshots to quickly and easily update your QA and Test/Dev environments with the latest SQL data.

I hope you found this helpful and informative. As always, if you have questions, add them here or reach me on Twitter @daveberm

Configure SQL Server Failover Cluster Instance on Azure Virtual Machines with MSDTC #SQL #Azure #MSDTC

February 15, 2019December 16, 2020 davebermLeave a comment

If you have been following my blog, you probably know that I write a lot of step-by-step guides for building SQL Server Failover Cluster Instances (FCI) on Azure, from SQL Server 2008 through the latest version. Here are some links to get you started, but really there is very little difference in the configuration between the different versions of Windows and SQL Server, so I think you will be able to figure it out regardless of what versions you use.

STEP-BY-STEP: HOW TO CONFIGURE A SQL SERVER FAILOVER CLUSTER INSTANCE (FCI) IN MICROSOFT AZURE IAAS #SQLSERVER #AZURE #SANLESS

STEP-BY-STEP: HOW TO CONFIGURE A SQL SERVER 2008 R2 FAILOVER CLUSTER INSTANCE IN AZURE

What I have not addressed is what to do about MSDTC. Microsoft addressed that in this article posted here.

https://blogs.msdn.microsoft.com/sql_pfe_blog/2018/07/05/configure-sql-server-failover-cluster-instance-on-azure-virtual-machines-with-msdtc

However, that article/video only addresses SQL Server 2016 and later. The good news is that most of that guidance can be applied to SQL Server 2008/2012/2014. Until I have time to do a proper step-by-step guide I wanted to jot down some basic notes, more as a reminder to myself, but you might find this information useful as well in the meantime.

The steps below assume you have already created a SQL Server FCI in Azure and clustered the DTC resource. Reference the guides above for the details on those steps. The steps below really just detail the load balancer configuration required in Azure to make this work.

Create Load Balancer for MSDTC

The MSDTC resource requires its own load balancer frontend. Instead of creating a new load balancer, we will add a new frontend to the load balancer that should already be configured for the SQL Server FCI. Of course this frontend IP address should match the cluster IP address associated with the clustered MSDTC resource.

For the backend pool just reuse the existing pool that you created that contains the SQL cluster nodes.

You will need to create a new health probe dedicated to the MSDTC resource. The port you use has to be different than the one you used for the SQL resource, so don’t use 59999. Instead maybe use something like 49999.

The final step is to create the load balancing rule for MSDTC. Create a new rule and reference the MSDTC frontend that we just created, the existing backend pool, and the new health probe. Since MSDTC uses ephemeral ports, which is a big range of ports, when you create the rule you have to select the box that says “HA Ports”. And finally make sure Direct Server Return (Floating IP) is enabled.
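The same three additions (frontend, health probe, HA Ports rule) can also be made with the Az Powershell module. The following is a sketch under assumptions, not a tested recipe from the Microsoft article: the resource group, load balancer, virtual network and object names are placeholders, 10.0.0.102 stands in for whatever address your clustered MSDTC IP resource uses, and the probe interval and count are my choices. HA Ports is expressed as protocol All with frontend and backend port 0, and it is only available on a Standard internal load balancer.

```powershell
# Sketch only. Requires the Az module. Every name and the IP address are placeholders.
$rg     = 'sql-rg'
$lbName = 'sql-ilb'
$vnet   = Get-AzVirtualNetwork -ResourceGroupName $rg -Name 'sql-vnet'
$subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'default'

# Step 1: add the MSDTC frontend and its dedicated health probe, then save
$lb = Get-AzLoadBalancer -ResourceGroupName $rg -Name $lbName
$lb = $lb | Add-AzLoadBalancerFrontendIpConfig -Name 'msdtc-frontend' `
    -PrivateIpAddress '10.0.0.102' -SubnetId $subnet.Id
$lb = $lb | Add-AzLoadBalancerProbeConfig -Name 'msdtc-probe' -Protocol Tcp -Port 49999 `
    -IntervalInSeconds 5 -ProbeCount 2
$lb | Set-AzLoadBalancer | Out-Null

# Step 2: reload the load balancer and add the HA Ports rule with Floating IP
$lb       = Get-AzLoadBalancer -ResourceGroupName $rg -Name $lbName
$frontend = Get-AzLoadBalancerFrontendIpConfig -LoadBalancer $lb -Name 'msdtc-frontend'
$probe    = Get-AzLoadBalancerProbeConfig -LoadBalancer $lb -Name 'msdtc-probe'
$pool     = $lb.BackendAddressPools[0]   # assumes the SQL node pool is the only pool

$lb = $lb | Add-AzLoadBalancerRuleConfig -Name 'msdtc-haports' `
    -FrontendIpConfiguration $frontend -BackendAddressPool $pool -Probe $probe `
    -Protocol All -FrontendPort 0 -BackendPort 0 -EnableFloatingIP
$lb | Set-AzLoadBalancer | Out-Null
```

Saving once before adding the rule is a deliberately cautious choice, so the rule references frontend and probe objects that already exist in Azure.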

Update MSDTC Cluster IP Resource

Just like our SQL Server Cluster IP address, we need to run a Powershell command that will enable the MSDTC cluster IP resource to respond to the health probe we just created that probes port 49999. It also sets the subnet mask of that MSDTC cluster IP address to 255.255.255.255 to avoid IP address conflicts with the load balancer frontend we set up that shares the same address.

# Define variables
$ClusterNetworkName = ""
# the cluster network name (use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)
$IPResourceName = ""
# the IP Address resource name of the MSDTC resource
$ILBIP = ""
# the IP Address of the Internal Load Balancer (ILB) and MSDTC resource
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=49999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this:
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=49999 subnetmask=255.255.255.255

Confirm it is working!

You can use DTCPing or go into Component Services and look under Computers>My Computers>Distributed Transaction Coordinator where you should see a local DTC and a clustered DTC. Any distributed transactions should appear in the clustered DTC, not the local DTC. Check out this video for an example of how to create a distributed transaction for testing.

Next Steps

This is a quick and dirty guide, but for the experienced user it should get your MSDTC resource up and running in Azure. I’ll be publishing a detailed step-by-step guide in the near future. In the meantime, if you get stuck don’t hesitate to reach out to me on Twitter @daveberm

For more information visit https://us.sios.com/solutions/cloud-high-availability/azure/

Step-by-Step: How to configure a SQL Server 2008 R2 Failover Cluster Instance in Azure

February 4, 2019February 4, 2020 davebermLeave a comment

Introduction

If you are reading this article you probably are still using SQL Server 2008/2008 R2 and want to take advantage of the extended security updates that Microsoft is offering if you move your SQL Server 2008/2008 R2 into Azure. I previously wrote about this topic in this blog post.

You may be wondering how to make sure your SQL Server instance remains highly available once you make the move to Azure. Today, most people have business critical SQL Server 2008/2008 R2 configured as a clustered instance (SQL Server FCI) in their data center. When looking at Azure you have probably come to the realization that due to the lack of shared storage it might seem that you can’t bring your SQL Server FCI to the Azure cloud. However, that is not the case thanks to SIOS DataKeeper.

SIOS DataKeeper enables you to build a SQL Server FCI in Azure, AWS, Google Cloud, or anywhere else where shared storage is not available or where you wish to configure multi-site clusters where shared storage doesn’t make sense. DataKeeper has been enabling SANless clusters for Windows and Linux since 1999. Microsoft documents the use of SIOS DataKeeper for SQL Server FCI in their documentation: High availability and disaster recovery for SQL Server in Azure Virtual Machines.

I’ve written about SQL Server FCIs running in Azure before, but I never published a Step-by-Step Guide specific to SQL Server 2008/2008 R2. The good news is that it works just as well with SQL 2008/2008 R2 as it does with SQL 2012/2014/2016/2017 and the soon to be released 2019. Also, regardless of the version of Windows Server (2008/2012/2016/2019) or SQL Server (2008/2012/2014/2016/2017) the configuration process is similar enough that this guide should be sufficient to get you through any configuration.

If your flavor of SQL or Windows is not covered in any of my guides, don’t be afraid to jump in and build a SQL Server FCI and reference this guide, I think you will figure out any differences and if you ever get stuck just reach out to me on Twitter @daveberm and I’ll be glad to give you a hand.

This guide uses SQL Server 2008 R2 with Windows Server 2012 R2. As of the time of this writing I did not see an Azure Marketplace image of SQL 2008 R2 on Windows Server 2012 R2, so I had to download and install SQL 2008 R2 manually. Personally I prefer this combination, but if you need to use Windows Server 2008 R2 or Windows Server 2012 that is fine. If you use Windows Server 2008 R2 don’t forget to install the kb3125574 Convenience Rollup Update for Windows Server 2008 R2 SP1. Or if you are stuck with Server 2012 (not R2) you need the Hotfix in kb2854082.

Don’t be fooled by this article that says you must install kb2854082 on your SQL Server 2008 R2 instances. If you start searching for that update for Windows Server 2008 R2 you will find that only the version for Server 2012 is available. That particular hotfix for Server 2008 R2 is instead included in the rollup Convenience Rollup Update for Windows Server 2008 R2 SP1.

Provision Azure Instances

I’m not going to go into great detail here with a bunch of screenshots, especially since the Azure Portal UI tends to change pretty frequently, so any screenshots I take will get stale pretty quickly. Instead, I will just cover the important topics that you should be aware of.

Fault Domains or Availability Zones?

In order to ensure your SQL Server instances are highly available, you have to make sure your cluster nodes reside in different Fault Domains (FD) or in different Availability Zones (AZ). Not only do your instances need to reside in different FDs or AZs, but your File Share Witness (see below) also needs to reside in a FD or AZ that is different from the ones your cluster nodes reside in.

Here is my take on it. AZs are the newest Azure feature, but they are only supported in a handful of regions so far. AZs give you a higher SLA (99.99%) than FDs (99.95%), and protect you against the kind of cloud outages I describe in my post Azure Outage Post-Mortem. If you can deploy in a region that supports AZs then I recommend you use AZs.

In this guide I used AZs which you will see when you get to the section on configuring the load balancer. However, if you use FDs everything will be exactly the same, except the load balancer configuration will reference Availability Sets rather than Availability Zones.

What is a File Share Witness you ask?

Without going into great detail, Windows Server Failover Clustering (WSFC) requires you configure a “Witness” to ensure failover behaves properly. WSFC supports three kinds of witnesses: Disk, File Share, Cloud. Since we are in Azure a Disk Witness is not possible. Cloud Witness is only available with Windows Server 2016 and later, so that leaves us with a File Share Witness. If you want to learn more about cluster quorums check out my post on the Microsoft Press Blog, From the MVPs: Understanding the Windows Server Failover Cluster Quorum in Windows Server 2012 R2

Add storage to your SQL Server instances

As you provision your SQL Server instances you will want to add additional disks to each instance. Minimally you will need one disk for the SQL data and log files and one disk for tempdb. Whether or not you should have a separate disk for log and data files is somewhat debated when running in the cloud. On the back end the storage all comes from the same place and your instance size limits your total IOPS. In my opinion there really isn’t any value in separating your log and data files since you cannot ensure that they are running on two physical sets of disks. I’ll leave that for you to decide, but I put log and data all on the same volume.

Normally a SQL Server 2008 R2 FCI would require you to put tempdb on a clustered disk. However, SIOS DataKeeper has this really nifty feature called a DataKeeper Non-Mirrored Volume Resource. This guide does not cover moving tempdb to this non-mirrored volume resource, but for optimal performance you should do this. There really is no good reason to replicate tempdb since it is recreated upon failover anyway.

As far as the storage is concerned you can use any storage type, but certainly use Managed Disks whenever possible. Make sure each node in the cluster has the identical storage configuration. Once you launch the instances you will want to attach these disks and format them NTFS. Make sure each instance uses the same drive letters.

Networking

It’s not a hard requirement, but if at all possible use an instance size that supports accelerated networking. Also, make sure you edit the network interface in the Azure portal so that your instances use a static IP address. For clustering to work properly you want to make sure you update the settings for the DNS server so that it points to your Windows AD/DNS server and not just some public DNS server.

Security

By default, the communications between nodes in the same virtual network are wide open, but if you have locked down your Azure Security Group you will need to know what ports must be open between the cluster nodes and adjust your security group. In my experience, almost all the issues you will encounter when building a cluster in Azure are caused by blocked ports.

DataKeeper requires the following ports to be open between the clustered instances:
UDP: 137, 138
TCP: 139, 445, 9999, plus ports in the 10000 to 10025 range

Failover cluster has its own set of port requirements that I won’t even attempt to document here. This article seems to have that covered. http://dsfnet.blogspot.com/2013/04/windows-server-clustering-sql-server.html

In addition, the Load Balancer described later will use a probe port that must allow inbound traffic on each node. The port that is commonly used and described in this guide is 59999.

And finally if you want your clients to be able to reach your SQL Server instance you want to make sure your SQL Server port is open, which by default is 1433.

Remember, these ports can be blocked by the Windows Firewall or Azure Security Groups, so be sure to check both to ensure they are accessible.
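On the Windows Firewall side, the ports listed above can be opened from Powershell. This is a minimal sketch for Windows Server 2012 or later only, since New-NetFirewallRule and Test-NetConnection do not exist on Windows Server 2008 R2. The rule display names are arbitrary labels I made up, and sql2 is the second node name used in the cluster creation example. It does not cover the failover clustering ports or Azure Security Groups, which still need to be checked separately.

```powershell
# Sketch only, for Windows Server 2012 or later. Run on each cluster node from an elevated prompt.
# Display names are arbitrary labels.

# DataKeeper ports
New-NetFirewallRule -DisplayName 'DataKeeper UDP' -Direction Inbound -Action Allow `
    -Protocol UDP -LocalPort '137', '138'
New-NetFirewallRule -DisplayName 'DataKeeper TCP' -Direction Inbound -Action Allow `
    -Protocol TCP -LocalPort '139', '445', '9999', '10000-10025'

# Load balancer health probe port used in this guide
New-NetFirewallRule -DisplayName 'ILB health probe' -Direction Inbound -Action Allow `
    -Protocol TCP -LocalPort '59999'

# SQL Server default instance
New-NetFirewallRule -DisplayName 'SQL Server default instance' -Direction Inbound -Action Allow `
    -Protocol TCP -LocalPort '1433'

# Quick reachability check from this node to the other node (Windows Server 2012 R2 or later)
Test-NetConnection -ComputerName sql2 -Port 445
```

A failed Test-NetConnection with the local firewall rules in place usually points at the Azure Security Group instead.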

Join the Domain

A requirement for SQL Server 2008 R2 FCI is that the instances must reside in the same Windows Server Domain. So if you have not done so, make sure you have joined the instances to your Windows domain.

Local Service Account

When you install DataKeeper it will ask you to provide a service account. You must create a domain user account and then add that user account to the Local Administrators Group on each node. When asked during the DataKeeper installation, specify that account as the DataKeeper service account. Note – Don’t install DataKeeper just yet!
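Adding the account to the Local Administrators Group can be done from the command line on each node. In the sketch below, contoso\dkservice is a hypothetical account name that I made up for illustration, so substitute the domain account you created.

```powershell
# Sketch only. Run on each cluster node from an elevated prompt.
# 'contoso\dkservice' is a hypothetical domain account. Use the one you created for DataKeeper.
net localgroup Administrators 'contoso\dkservice' /add

# List the group members to confirm the account was added
net localgroup Administrators
```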

Domain Global Security Groups

When you install SQL 2008 R2 you will be asked to specify two Global Domain Security Groups. You might want to look ahead at the SQL install instructions and create those groups now. You will also want to create a domain user account and place it in each of these security groups. You will specify this account as part of the SQL Server Cluster installation.

Other Pre-Requisites

You must enable both Failover Clustering and .Net 3.5 on each of the two cluster instances. When you enable Failover Clustering, also be sure to enable the optional “Failover Cluster Automation Server” as it is required for a SQL Server 2008 R2 cluster in Windows Server 2012 R2.
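On Windows Server 2012 R2 these features can be enabled in one shot from Powershell. Consider this a sketch: the management tools entries (RSAT-Clustering-Mgmt and RSAT-Clustering-PowerShell) are my additions for convenience, and depending on the image, .Net 3.5 may also need the -Source parameter pointing at the Windows installation media.

```powershell
# Sketch only, for Windows Server 2012 R2. Run on each cluster node from an elevated prompt.
Install-WindowsFeature -Name `
    Failover-Clustering, `
    RSAT-Clustering-AutomationServer, `
    RSAT-Clustering-Mgmt, `
    RSAT-Clustering-PowerShell, `
    NET-Framework-Core

# Confirm what was installed
Get-WindowsFeature -Name Failover-Clustering, RSAT-Clustering-AutomationServer, NET-Framework-Core
```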

Create the Cluster and DataKeeper Volume Resources

We are now ready to start building the cluster. The first step is to create the base cluster. Because of the way Azure handles DHCP, we MUST create the cluster using Powershell and not the Cluster UI. We use Powershell because it will let us specify a static IP address as part of the creation process. If we used the UI it would see that the VMs use DHCP and it would automatically assign a duplicate IP address, so we want to avoid that situation by using Powershell as shown below.

New-Cluster -Name cluster1 -Node sql1,sql2 -StaticAddress 10.0.0.100 -NoStorage

After the cluster is created, run Test-Cluster. This is required before SQL Server will install.

Test-Cluster

You will get warnings about Storage and Networking, but you can ignore those as they are expected in a SANless cluster in Azure. If there are any other warnings or errors you must address those before moving on.

After the cluster is created you will need to add the File Share Witness. On the third server we specified as the file share witness, create a file share and give Read/Write permissions to the cluster computer object we just created above. In this case cluster1$ will be the name of the computer object that needs Read/Write permissions at both the share and NTFS security level.

Once the share is created, you can use the Configure Cluster Quorum Wizard as shown below to configure the File Share Witness.
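If you would rather script the witness than click through the wizard, the share and the quorum setting can be sketched in Powershell. The server name fsw1, the folder C:\FSW and the share name are hypothetical, contoso is the example domain used elsewhere in this guide, and New-SmbShare assumes the witness server runs Windows Server 2012 or later.

```powershell
# Sketch only. 'fsw1', 'C:\FSW' and the share name are hypothetical. 'contoso' is the example domain.

# --- On the file share witness server (Windows Server 2012 or later) ---
New-Item -ItemType Directory -Path 'C:\FSW' -Force | Out-Null

# Share level: Change (read/write) for the cluster computer object
New-SmbShare -Name 'FSW' -Path 'C:\FSW' -ChangeAccess 'contoso\cluster1$'

# NTFS level: Modify, inherited by files and subfolders
icacls 'C:\FSW' /grant 'contoso\cluster1$:(OI)(CI)M'

# --- On one of the cluster nodes ---
Set-ClusterQuorum -NodeAndFileShareMajority '\\fsw1\FSW'
```

Note the single quotes around the account name. In double quotes Powershell would treat the trailing $ as part of a variable expression in some cases, so single quotes keep it literal.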

Install DataKeeper

It is important to wait until the basic cluster is created before we install DataKeeper since the DataKeeper installation registers the DataKeeper Volume Resource type in failover clustering. If you jumped the gun and installed DataKeeper already that is okay. Simply run the setup again and choose Repair Installation.

The screenshots below walk you through a basic installation. Start by running the DataKeeper Setup.

The account you specify below must be a domain account and must be part of the Local Administrators group on each of the cluster nodes.

When presented with the SIOS License Key Manager you can browse out to your temporary key, or if you have a permanent key you can copy the System Host ID and use that to request your permanent license. If you ever need to refresh a key, the SIOS License Key Manager is installed as a separate program that you can run to add a new key.

Create DataKeeper Volume Resource

Once DataKeeper is installed on each node you are ready to create your first DataKeeper Volume Resource. The first step is to open the DataKeeper UI and connect to each of the cluster nodes.

If everything is done correctly the Server Overview Report should look something like this.

You can now create your first Job as shown below.

After you choose a Source and Target you are presented with the following options. For a local target in the same region the only thing you need to select is Synchronous.

Choose Yes and auto-register this volume as a cluster resource.

Once you complete this process, open Failover Cluster Manager and look in Disks. You should see the DataKeeper Volume resource in Available Storage. At this point WSFC treats this as if it were a normal cluster disk resource.

Slipstream SP3 onto SQL 2008 R2 install media

SQL Server 2008 R2 is only supported on Windows Server 2012 R2 with SQL Server SP2 or later. Unfortunately, Microsoft never released SQL Server 2008 R2 installation media that includes SP2 or SP3. Instead, you must slipstream the service pack onto the installation media BEFORE you do the installation. If you try to do the installation with the standard SQL Server 2008 R2 media you will run into all kinds of problems. I don’t remember the exact errors you will see, but I do recall they didn’t really point to the exact problem and you will waste a lot of time trying to figure out what went wrong.

As of the date of this writing, Microsoft does not have a Windows Server 2012 R2 with SQL Server 2008 R2 offering in the Azure Marketplace, so you will be bringing your own SQL license if you want to run SQL 2008 R2 on Windows Server 2012 R2 in Azure. If they add that image later, or if you choose to use the SQL 2008 R2 on Windows Server 2008 R2 image you must first uninstall the existing standalone instance of SQL Server before moving forward.

I followed the guidance in Option 1 of this article to slipstream SP3 onto my SQL 2008 R2 installation media. You will of course have to adjust a few things as this article references SP2 instead of SP3. Make sure you slipstream SP3 onto the installation media we will use for both nodes of the cluster. Once that is done, continue to the next step.

Install SQL Server on the First Node

Using the SQL Server 2008 R2 media with SP3 slipstreamed, run setup and install the first node of the cluster as shown below.

If you use anything other than the Default instance of SQL Server you will have some additional steps not covered in this guide. The biggest difference is you must lock down the port that SQL Server uses since by default a named instance of SQL Server does NOT use 1433. Once you lock down the port you also need to specify that port instead of 1433 whenever we reference port 1433 in this guide, including the firewall setting and the Load Balancer settings.

Here make sure to specify a new IP address that is not in use. This is the same IP address we will use later when we configure the Internal Load Balancer.

As I mentioned earlier, SQL Server 2008 R2 utilizes AD Security Groups. If you have not already created them, go ahead and create them now as shown below before you continue to the next step in the SQL install.

Specify the Security Groups you created earlier.

Make sure the service accounts you specify are members of the associated Security Group.

Specify your SQL Server administrators here.

If everything goes well you are now ready to install SQL Server on the second node of the cluster.

Install SQL Server on the Second Node

On the second node, run the SQL Server 2008 R2 with SP3 install and select Add Node to a SQL Server FCI.

Proceed with the installation as shown in the following screenshots.

Assuming everything went well, you should now have a two node SQL Server 2008 R2 cluster configured that looks something like the following.

However, you probably will notice that you can only connect to the SQL Server instance from the active cluster node. The problem is that Azure does not support gratuitous ARP, so your clients cannot connect directly to the Cluster IP Address. Instead, the clients must connect to an Azure Load Balancer, which will redirect the connection to the active node. To make this work there are two steps: Create the Load Balancer and Fix the SQL Server Cluster IP to respond to the Load Balancer Probe and use a 255.255.255.255 Subnet mask. Those steps are described below.

Edit sqlservr.exe Config File

Include the lines below in the sqlservr.exe.config file. This forces SQL Server to use the right CLR integration.

<configuration>
  <startup>
    <supportedRuntime version="v2.0.50727"/>
  </startup>
</configuration>

The file does not exist by default, so you may need to create it. If this file already exists for your installation, the <supportedRuntime version="v2.0.50727"/> line simply needs to be placed within the <startup>…</startup> sub-section of the <configuration>…</configuration> section.
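If you would rather script this edit than make it by hand, here is a minimal Python sketch of the create-or-update logic described above. The function name and the path you pass in are my own assumptions, not anything SQL Server provides, so point it at wherever your sqlservr.exe lives. Back up any existing file first, since this simple approach does not preserve XML comments.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

RUNTIME_VERSION = "v2.0.50727"  # value taken from the config snippet in this guide


def ensure_supported_runtime(config_path):
    """Create or update a .config file so <startup> contains the runtime entry.

    Returns True if the file was written, False if the entry was already there.
    config_path is supplied by the caller (hypothetical; adjust to your install).
    """
    path = Path(config_path)
    if path.exists():
        # Note: comments in an existing file are dropped by this parser.
        tree = ET.parse(str(path))
        root = tree.getroot()
    else:
        root = ET.Element("configuration")
        tree = ET.ElementTree(root)

    # Reuse the <startup> section if present, otherwise add one.
    startup = root.find("startup")
    if startup is None:
        startup = ET.SubElement(root, "startup")

    # Leave the file untouched if the entry is already present.
    for entry in startup.findall("supportedRuntime"):
        if entry.get("version") == RUNTIME_VERSION:
            return False

    ET.SubElement(startup, "supportedRuntime", {"version": RUNTIME_VERSION})
    tree.write(str(path), encoding="utf-8", xml_declaration=True)
    return True
```

Call ensure_supported_runtime with the full path to sqlservr.exe.config on each node. Running it a second time returns False and makes no changes.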

Create the Azure Load Balancer

I’m going to assume your clients can communicate directly to the internal IP address of the SQL cluster so we will create an Internal Load Balancer (ILB) in this guide. If you need to expose your SQL Instance on the public internet you can use a Public Load Balancer instead.

In the Azure portal create a new Load Balancer following the screenshots as shown below. The Azure portal UI changes rapidly, but these screenshots should give you enough information to do what you need to do. I will call out important settings as we go along.

Here we create the ILB. The important thing to note on this screen is you must select “Static IP address assignment” and specify the same IP address that we used during the SQL Cluster installation.

Since I used Availability Zones I see Zone Redundant as an option. If you used Availability Sets your experience will be slightly different.

In the Backend pool be sure to select the two SQL Server instances. You DO NOT want to add your File Share Witness in the pool.

Here we configure the Health Probe. Most Azure documentation has us using port 59999, so we will stick with that port for our configuration.

Here we will add a load balancing rule. In our case we want to redirect all SQL Server traffic to TCP port 1433 of the active node. It is also important that you select Floating IP (Direct Server Return) as Enabled.

Run PowerShell Script to Update SQL Client Access Point

Now we must run a PowerShell script on one of the cluster nodes to allow the Load Balancer Probe to detect which node is active. The script also sets the Subnet Mask of the SQL Cluster IP Address to 255.255.255.255 so that it avoids IP address conflicts with the Load Balancer we just created.

# Define variables
$ClusterNetworkName = ""
# the cluster network name (use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)
$IPResourceName = ""
# the IP Address resource name
$ILBIP = ""
# the IP address of the Internal Load Balancer (ILB) and SQL Cluster
Import-Module FailoverClusters
# If you are using Windows Server 2012 or higher:
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{Address=$ILBIP;ProbePort=59999;SubnetMask="255.255.255.255";Network=$ClusterNetworkName;EnableDhcp=0}
# If you are using Windows Server 2008 R2 use this: 
#cluster res $IPResourceName /priv enabledhcp=0 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

This is what the output will look like if run correctly.

You probably noticed that the end of that script has a commented line of code to use if you are running on Windows Server 2008 R2. If you are running Windows Server 2008 R2, run that line at a Command Prompt, not in PowerShell, and replace the variables with your actual values since the Command Prompt will not expand them.
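A blank variable or a mistyped subnet mask is an easy mistake to make in that script. As a rough illustration only, the Python sketch below sanity-checks the three values before you run anything against the cluster. The function name is my own invention and the 10.0.0.101 address in the usage note is just a placeholder.

```python
import ipaddress


def check_probe_settings(ilb_ip, subnet_mask, probe_port):
    """Return a list of problems; an empty list means the values look sane."""
    problems = []

    # The ILB / SQL cluster address must be a well-formed IPv4 address.
    try:
        ipaddress.IPv4Address(ilb_ip)
    except ValueError:
        problems.append(f"ILB IP {ilb_ip!r} is not a valid IPv4 address")

    # The mask must be a valid netmask and must be a /32 (255.255.255.255).
    try:
        prefix = ipaddress.IPv4Network(f"0.0.0.0/{subnet_mask}").prefixlen
        if prefix != 32:
            problems.append(f"subnet mask {subnet_mask!r} is /{prefix}, expected /32")
    except ValueError:
        problems.append(f"subnet mask {subnet_mask!r} is not a valid IPv4 netmask")

    # The probe port must be a usable TCP port number.
    if not 1 <= probe_port <= 65535:
        problems.append(f"probe port {probe_port} is outside 1-65535")

    return problems
```

For example, check_probe_settings("10.0.0.101", "255.255.255.255", 59999) returns an empty list, while a mask with an extra octet or an empty IP address string is reported as a problem.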

Next Steps

If you get to this point and you still cannot connect to the cluster remotely you wouldn’t be the first person. There are a lot of things that can go wrong in terms of security, load balancer, SQL ports, etc. I wrote this guide to help troubleshoot connection issues.
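A quick first check when troubleshooting is whether a remote client can open a TCP connection to the SQL port at all. This is a minimal sketch using only the Python standard library. The function name is hypothetical, and you would pass in your own load balancer IP address (and your own port if you are not using 1433).

```python
import socket


def can_connect(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run it from a client machine rather than from a cluster node. A False result tells you the problem is at the network level (firewall, load balancer rule, health probe or listener) rather than a SQL login issue.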

In fact, in this very installation I ran into some strange issues in terms of my SQL Server TCP/IP Properties in SQL Server Configuration Manager. When I looked at the properties I did not see the SQL Server Cluster IP address as one of the addresses it was listening on, so I had to add it manually. I’m not sure if that was an anomaly, but it certainly was an issue I had to resolve before I could connect to the cluster from a remote client.

As I mentioned earlier, one other improvement you can make to this installation is to use a DataKeeper Non-Mirrored Volume Resource for TempDB. If you set that up please be aware of the following two configuration issues people commonly run into.

The first issue is that if you move TempDB to a folder on the first node, you must be sure to create the exact same folder structure on the second node. If you don’t, SQL Server will fail to come online when you try to fail over, since it can’t create TempDB.

The second issue occurs anytime you add another DataKeeper Volume Resource to a SQL Cluster after the cluster is created. You must go into the properties of the SQL Server cluster resource and make it dependent on the new DataKeeper Volume resource you added. This is true for the TempDB volume and any other volumes you may decide to add after the cluster is created.

If you have any questions about this configuration or any other cluster configurations please feel free to reach out to me on Twitter @DaveBerm.

Troubleshooting #Azure ILB connection issues in a SQL Server AlwaysOn FCI Cluster

TICK TOCK…6 MONTHS UNTIL SQL SERVER 2008/2008 R2 SUPPORT EXPIRES UNLESS YOU TAKE ACTION

January 9, 2019 | February 14, 2019 | daveberm | 2 Comments

If you are still running SQL Server 2008/2008 R2 you probably have heard by now that as of July 9, 2019, you will no longer be supported. However, realizing that there are still a significant number of customers running on this platform that will not be able to upgrade to a newer version of SQL before that deadline, Microsoft has offered two options to provide extended security updates for an additional three years.

The first option requires the annual purchase of “Extended Security Updates”. Extended Security Updates cost 75% of the full license cost annually and also require that the customer is on active Software Assurance, which is typically 25% of the license cost annually. So effectively, to receive Extended Security Updates you are paying the equivalent of new SQL Server licenses every year for three years, or until you migrate off SQL Server 2008/2008 R2.

However, there is a second option. Microsoft has announced that if you move your SQL Server 2008 R2 instances to Azure, you will receive the Extended Security Updates at no additional charge. There are of course the hourly infrastructure charges you will incur in Azure, plus either the cost of pay-as-you-go SQL Server instances or the Software Assurance charges if you want to bring your existing SQL licenses to Azure, but that cost includes the added benefit of running in a state-of-the-art cloud environment, which opens up opportunities for enhanced performance and HA/DR scenarios that you may not have had available on premises.
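To make the arithmetic behind that first option concrete, here is a small Python sketch. The 75% and 25% rates are the figures quoted above. The $10,000 license cost in the usage note is purely a made-up number for illustration, and the function names are my own.

```python
def annual_esu_cost(license_cost, esu_rate=0.75, sa_rate=0.25):
    """Yearly cost of Extended Security Updates plus the required Software Assurance."""
    return license_cost * (esu_rate + sa_rate)


def total_esu_cost(license_cost, years=3, esu_rate=0.75, sa_rate=0.25):
    """Cost of staying on Extended Security Updates for the given number of years."""
    return years * annual_esu_cost(license_cost, esu_rate, sa_rate)
```

With a hypothetical license cost of 10000, annual_esu_cost(10000) comes to 10000.0 and total_esu_cost(10000) comes to 30000.0. In other words, you pay the full license price again every year, and three times over if you stay for the whole three years.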

Azure offers many different options in terms of CPU, Memory and Storage configurations. If you are looking for a server or storage upgrade, or your existing on-premises infrastructure was reaching a refresh cycle, now is the perfect time to dip your feet into the Azure cloud and upgrade your performance and availability at the same time as extending the life of your SQL Server 2008/2008 R2 deployment.

In terms of high availability and disaster recovery configurations, Azure offers up to a 99.99% SLA. To qualify for the SLA you must leverage their infrastructure appropriately, and even then the SLA only covers “dial tone” to the instance. It is up to you to ensure SQL Server is highly available, which is traditionally done by building a SQL Server Failover Cluster Instance (FCI). Azure has the infrastructure in place which enables you to configure a SQL Server FCI, but due to the lack of cluster-aware shared storage in the cloud, you will need to use SIOS DataKeeper to build the FCI. I recently wrote a Step-by-Step guide to help you with the process. Step-by-Step: How to configure a SQL Server 2008 R2 Failover Cluster Instance in Azure
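To put the 99.99% SLA figure in perspective, it helps to convert it into allowed downtime. The sketch below is simple arithmetic, assuming a 365-day year. The function name is my own, and keep in mind that the SLA only covers connectivity to the instance, not SQL Server itself.

```python
def allowed_downtime_minutes(sla_percent, days=365):
    """Minutes of downtime per period that still meet the given SLA percentage."""
    minutes_in_period = days * 24 * 60
    return (1 - sla_percent / 100) * minutes_in_period
```

For 99.99%, round(allowed_downtime_minutes(99.99), 1) works out to 52.6 minutes per year, which is not much time to notice a failure and recover manually.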

SIOS DataKeeper takes the place of the shared storage normally required by a SQL Server FCI and instead allows you to leverage any NTFS formatted volumes that are attached to each instance. SIOS keeps the volumes replicated between the instances and presents the storage to the cluster as a resource called a DataKeeper Volume. As far as the cluster is concerned the DataKeeper Volume looks like a shared disk, but instead of controlling SCSI reservations (disk locking), it controls the mirror direction, ensuring writes occur on the active server and are synchronously or asynchronously replicated to the other cluster nodes. The end user experience is exactly the same as a traditional shared storage cluster, but under the covers the cluster is leveraging the locally attached storage instead of shared storage.

In Azure your cluster nodes can run in different racks (Fault Domains), data centers (Availability Zones), or even in different geographic regions. SIOS DataKeeper supports all three options: Fault Domains, Availability Zones or cross-Region replication to cover both HA and DR requirements. Similar configurations are also possible in AWS and Google Cloud.

Figure: Typical 2-node SQL Server FCI configuration in Azure with SIOS DataKeeper

With Azure Site Recovery (ASR) you can replicate standalone or clustered instances of SQL Server between Region Pairs, without the headache and expense of managing your own disaster recovery site. And of course SQL Server seldom lives alone, so at the same time you move your SQL Server instance to Azure you probably want to move your application servers there as well to take advantage of the performance and availability upgrades available in Azure. Combining SIOS DataKeeper for HA and ASR for DR provides a cost-effective HA and DR strategy that would have been impossible, or extremely expensive, to implement on premises with SAN replication and your own DR site.

Figure: Common configuration leveraging SIOS DataKeeper for HA and Azure Site Recovery for DR

While it only takes a few minutes to spin up a SQL Server instance in Azure, I wouldn’t wait until the last minute to do your migration. Please take the next few months to become familiar with Azure, start doing some testing, and then plan to migrate your workloads well before the July 9, 2019 expiration date. Running SQL Server after that date leaves you susceptible to any new security threats and also puts you out of compliance. Your boss, and more importantly your customers, will be glad to know that their data is still secure, available, and in compliance once you migrate your workload to Azure.

Moving a Google Form Between Google Domains

November 1, 2018 | November 2, 2018 | daveberm | 3 Comments

If you are anything like me, you might have a few different Google accounts that you work with on a regular basis. I ran into an issue recently where I spent a fair amount of time creating a Google Form, just to realize I did this while logged in with my personal account rather than my work account. I didn’t really want to redo the work I had done, but when I searched to try to find out how to move the form between accounts I didn’t come up with anything that addressed my situation.

It’s not hard to do, but I figured I’d write it down just in case it happens to you. I stumbled upon the fix just by trying a few things. Assuming this is a new form with no data all you have to do is the following:

Add your second Google account as a Collaborator on the form
Log in to your second Google account, open the form and “Make a copy” of the form

That’s it, now you have a copy of the form in your second Google account. Of course, if you had already collected some data on the first form you would want to copy that Sheet, put it in your second Google account as well, and attach the form to that copy of the data. Be sure to delete the old form so you don’t accidentally use it.