One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Wednesday, April 18, 2018

Why Don't I have a Preferred Replica?


The client reported that the log file on their main database was growing unusually large, and when they checked the backup target the database didn't have any LOG backups!

Oh, and the database was part of an Availability Group.

https://imgflip.com/i/28n8j8
--

After some investigation we found that while the Availability Group was configured to prefer the primary replica for backups, it wasn't evaluating as the preferred backup via the system function sys.fn_hadr_backup_is_preferred_replica!

What we found in the output logs for the LOG backup (Hallengren) job looked like this:

--

Availability group: DATABASE1_AG1
Availability group role: PRIMARY
Availability group backup preference: PRIMARY
Is preferred backup replica: No

--

Even though the preferred backup replica is the primary replica, and we are on the primary replica, the is_preferred check says No.

A test showed that a manual BACKUP LOG statement worked, but both the Hallengren DatabaseBackup *and* a "regular" maintenance plan backup ignored Database1.

To check the preferred replica status we manually ran the command in a query window:

--

SELECT sys.fn_hadr_backup_is_preferred_replica(‘Database1’)

--

...it came back as 0 (false) on both nodes – SQL1 and SQL2.


As long as the preferred backup replica is 0, no intelligent backups will run against that replica, which is why backups weren’t happening on SQL1 or SQL2.

--

Now what?  Back to the old standby:

https://imgflip.com/i/28n92w

As described in the solution post, one possible answer is if the internal server names don't match - we checked the servers and sure enough:

--

SELECT SERVERPROPERTY('ServerName') as [InstanceName]
, SERVERPROPERTY('ComputerNamePhysicalNetBIOS') as [ComputerNamePhysicalNetBIOS]
, @@SERVERNAME as ServerName

--


InstanceName
ComputerNamePhysicalNetBIOS
@@SERVERNAME
SQL1
SQL1
123-223



InstanceName
ComputerNamePhysicalNetBIOS
@@SERVERNAME
SQL2
SQL2
437-326

--

The SERVERPROPERTY names reflect the friendly SQL1 and SQL2 names while the @@SERVERNAME reflects the client's standard infrastructure team ###-### names.

This often means that the Windows Server was renamed after SQL was installed but the final step of renaming SQL wasn’t taken (more on that at the end of the email).

--

Renaming the servers (taking a downtime) was not immediately acceptable so after some consideration we tried manually editing the code of the master.dbo.DatabaseBackup procedure to hard-code the fact that the primary replica is truly the preferred backup replica:

-- 
Original code: 
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL
BEGIN
SELECT @CurrentIsPreferredBackupReplica = sys.fn_hadr_backup_is_preferred_replica(@CurrentDatabaseName)
END


Modified code: 
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL AND @CurrentAvailabilityGroupRole = 'PRIMARY'
BEGIN
SELECT @CurrentIsPreferredBackupReplica = 1
END
--
               
…and that allowed the backups to run successfully by setting the procedure to true as long as the replica is primary.

http://2.bp.blogspot.com/_s03XW1ZvVvE/S9E3tjNwDMI/AAAAAAAAA6w/2QFwfKb-aNo/s1600/kool-aid.jpg

--

One other catch that came up after that also relates directly back to the mismatched names…after making the above change, the LOG backup of Database1 was succeeding but the overall LOG backup job was still failing.  Looking in the job output we found an error:

--

Msg 50000, Level 16, State 1, Server SQL1, Procedure DatabaseBackup, Line 1041

--

The author of DatabaseBackup (Ola Hallengren) is ready for the fact that mismatched names can cause issues and has a custom error built into his procedure to throw the Error 50000 if the names don’t match.  The procedure proceeds to run backups if it can, but still ultimately fails overall with the 50000 message.

To mitigate this we further edited the code of DatabaseBackup to comment out the code block that generates this specific error:

--

/*
  2018-04-18 - Commented out to prevent 50000 error since @@SERVERNAME does not match ServerName

  AndyG, Ntirety


  IF @@SERVERNAME <> SERVERPROPERTY('ServerName') AND SERVERPROPERTY('IsHadrEnabled') = 1
  BEGIN
    SET @ErrorMessage = 'The @@SERVERNAME does not match SERVERPROPERTY(''ServerName''). See ' + CASE WHEN SERVERPROPERTY('IsClustered') = 0 THEN 'https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server' WHEN SERVERPROPERTY('IsClustered') = 1 THEN 'https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/install/rename-a-sql-server-failover-cluster-instance' END + '.' + CHAR(13) + CHAR(10) + ' '
    RAISERROR(@ErrorMessage,16,1) WITH NOWAIT
    SET @Error = @@ERROR
  END
*/

--

This directly bypasses the error check and prevents the 50000 error for the name mismatch.

--

As an aside I want to comment on the code editing we were doing - when modifying code of any type consider the source and their licensing/intellectual property rights.  Ola's licensing is very short and sweet:

--

License 
The SQL Server Maintenance Solution is licensed under the MIT license, a popular and widely used open source license.
Copyright (c) 2018 Ola Hallengren 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
https://ola.hallengren.com/license.html 

--

At the end of the day as long as you don't remove his copyright notices you have the right to modify his code.

This is very common with code online - ALWAYS ATTRIBUTE YOUR SOURCES - but if you are looking at any type of commercial code be extremely careful ; not only may you be violating the letter (or spirit) of the law, you may even prevent the code's creator from supporting you in the future!

--

The edits to DatabaseBackup are allowing the jobs to run successfully for now, but it is truly a bandage rather than a real fix.  The ultimate fix is to rename the SQL Server internally (as described at the link in Ola’s 5000 error https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server)

The document title references a “stand-alone instance of SQL Server” but that is what this is since it is two SQL Servers with an availability group rather than a failover cluster (shared) instance.

*BUT*

It is unknown from the information here whether changing the @@SERVERNAME from 123-223 to SQL1 would impact any other business or application processes – this is something the client's teams would need to determine before making any changes to the name.

Also changing the server name would require a SQL downtime (service restart, not Windows reboot) for the name change to take effect (as described in the Microsoft doc) so even if you are comfortable with the name change it needs to be scheduled for a downtime window.

--

Hope this helps!


2 comments:

  1. Great post, particular liked the images.. :o)
    Anyway just wanted to add that I came across a related issue, where we had a 3 way AG, 2 local DB servers (SQL01/02) and an offsite DR (DRSQL01) server. backup preferences on the AG were set SQL01=100, SQL02=75, DRSQL01=50..

    However after one evening 4 of our databases stopped backing up, there are 47 databases in the AG and all other Dbs were backing up fine. Eventually found this article and tracked down that these 4 databases had actually set their preferred replica to the DR environment that had no backup jobs setup to run.

    It appears that we had a network issue that caused the DB servers to failover (which they have done numerous times in the past without issue). For some reason 43 database correctly failed over and set prefeered replica to SQL02, but these 4 then decided to set their preferred replica to DR.

    I fixed this by failing back from SQL02 to SQL01, then back from SQL01 to SQL02 (wanted to make sure there was no issue in either failover).

    I then used the following script I wrote to list all the sys.fn_hadr_backup_is_preferred_replica values for all databases.

    EXECUTE master.sys.sp_MSforeachdb 'USE [?]; DECLARE @dby VARCHAR(max); SET @dby=DB_NAME(); SELECT @dby;SELECT sys.fn_hadr_backup_is_preferred_replica (@dby);'

    Kind Regards
    Allan

    ReplyDelete