I get that kind of call or email from customers all the time. I have a generic response as follows…
This has everything you need to know.https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/sql-server-multi-subnet-clustering-sql-server?view=sql-server-2017They don’t go into great detail about what to do if your connection does not support multisubnetfailover=true. If your connection does NOT support that parameter, then set registerallprovidersip to false and cleanup DNS. That procedure is described best here.
I figure I get this question often enough I probably should just flesh out my response a bit, hence the reason for this post.
In general people just aren’t aware of how multi-subnet failover clusters work. Multi-subnet failover clustering support was added in Windows Server 2012 with the addition of the “OR” technology when defining cluster resource dependencies. This allowed people to allow a Cluster Name resource to be dependent upon IP Address x.x.x.x OR IP Address y.y.y.y.
x.x.x.x would be an a cluster IP resource valid in Subnet A and y.y.y.y would be a cluster IP address valid in Subnet B. Only one address will be online at any given time, whichever address was valid for the subnet the resource was currently running on.
Microsoft SQL Server started supporting this concept starting with SQL Server 2012 with both failover cluster instances (FCI) using 3-party SANless clustering solutions like SIOS DataKeeper and SQL Server Always On Availability Groups.
By default if you create a SQL Server multi-subnet failover cluster the cluster should be automatically configured optimally, including setting up the two IP addresses, adding two A records to DNS and setting the registerallprovidersIP to true. However, on the client end you need to tell it that you are connecting to a multi-subnet failover cluster, otherwise the connection won’t be made.
Configuring the client
Configuring the client is done by adding multisubnetfailover=true to the connection string. This Microsoft documentation is a great resource, but if you just search for multisubnetfailover=true you will find a lot of information about that setting.
However, not every application will support adding that to the connection string. If you find yourself in that situation you should ask your application vendor to add support for that or show you how to do it.
However, all is not lost if you find yourself in that situation. You will want to change the behavior of the cluster so that upon failover DNS is update so that the single A record associated with the cluster client access point is updated with the new IP address. This is in lieu of having two A records in DNS, one with each cluster IP address, which is the default behavior in an multi-subnet cluster.
This article reference SharePoint, you can ignore that, the rest of the article is pretty well written to describe the process you should follow.
The highlights of that article are as follows…
Get-ClusterResource “[Network Name]” | Set-ClusterParameter RegisterAllProvidersIP 0After restarting the cluster-name-object (basically restarting the role) & cleaning up all “A” records manually (clean-up isn’t done automatically) we can see our old A-records are still in DNS so we’ll need to delete those manually.
In addition to those steps I’d advise you to reduce the TTL on the HostRecordTTL as described in this article.
The highlight of that article is as follows.
PS C:\> Get-ClusterResource -Name cluster1FS | Set-ClusterParameter -Name HostRecordTTL -Value 300
With a Value of 300 you could potentially be waiting up to 5 minutes for your clients to reconnect after a failover, or even longer if if have a large Active Directory infrastructure and AD replication takes some time to update all the DNS servers across your infrastructure.
You are going to want to figure out what the optimal TTL is to facilitate quick client reconnections without over burdening your DNS servers with a bunch of DNS Lookup requests.
This type of configuration is common in disaster recovery configurations where your DR site is in a different subnet. It is also very common in HA deployments in AWS because different Availability Zones are in different subnets.
Let me know if you have any questions. You can always reach me on Twitter @daveberm