One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Thursday, April 2, 2015

Availability Groups - Where Did My Disks Go?



The TL;DR - beware of Failover Cluster Manager trying to steal your non-shared storage!

--
At a client recently two availability groups on a single Windows cluster went down simultaneously.  Apparently the server that was the primary for the AGs (Server1) had mysteriously lost its DATA and LOG drives.  By the time the client got us involved they had faked the application into coming up by pointing it directly to the single SQL Server instance that was still up (Server2) directly via the instance name rather than the availability group listeners.

I found that two of the drives on Server1 had gone offline, causing the issues – sample errors from the Windows System and Application Logs respectively:



--

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          3/31/2015 2:36:25 PM
Event ID:      1635
Task Category: Resource Control Manager
Level:         Information
Keywords:     
User:          SYSTEM
Computer:      server1.mydomain.com
Description:
Cluster resource 'Cluster Disk 2' of type 'Physical Disk' in clustered role 'Available Storage' failed.

--

     Log Name:      Application
Source:        MSSQLSERVER
Date:          3/31/2015 2:36:25 PM
Event ID:      9001
Task Category: Server
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      server1.mydomain.com
Description:
The log for database 'Database1' is not available. Check the event log for related error messages. Resolve any errors and restart the database.

--



Since this is an Availability Group (AG) I was surprised that there were “Cluster Disk” resources at all – AG’s do not rely on shared disk (it is one of their many advantages) and most AG clusters don’t have any shared disk at all (occasionally a quorum drive).



This is what I saw in Failover Cluster Manager: