https://imgflip.com/i/28n8j8 |
--
After some investigation we found that while the Availability Group was configured to prefer the primary replica for backups, it wasn't evaluating as the preferred backup via the system function sys.fn_hadr_backup_is_preferred_replica!
What we found in the output logs for the LOG backup (Hallengren) job looked like this:
--
Availability group: DATABASE1_AG1
Availability group role: PRIMARY
Availability group backup preference: PRIMARY
Is preferred backup replica: No
--
Even though the preferred backup replica is the primary replica, and we are on the primary replica, the is_preferred check says No.
A test showed that a manual BACKUP LOG statement worked, but both the Hallengren DatabaseBackup *and* a "regular" maintenance plan backup ignored Database1.
To check the preferred replica status we manually ran the command in a query window:
--
SELECT sys.fn_hadr_backup_is_preferred_replica(‘Database1’)
--
...it came back as 0 (false) on both nodes – SQL1 and SQL2.
As long as the preferred backup replica is 0, no intelligent backups will run against that replica, which is why backups weren’t happening on SQL1 or SQL2.
--
Now what? Back to the old standby:
https://imgflip.com/i/28n92w |
Looking for info on "sys.fn_hadr_backup_is_preferred_replica returning 0" led me to https://social.msdn.microsoft.com/Forums/sqlserver/en-US/7cb08c5c-30b7-40d1-97e7-3302675a9d0e/sysfnhadrbackupispreferredreplica-returns-0-on-both-nodes-of-cluster?forum=sqldatabaseengine
As described in the solution post, one possible answer is if the internal server names don't match - we checked the servers and sure enough:
--
SELECT SERVERPROPERTY('ServerName') as [InstanceName]
, SERVERPROPERTY('ComputerNamePhysicalNetBIOS') as
[ComputerNamePhysicalNetBIOS]
, @@SERVERNAME as ServerName
--
InstanceName
|
ComputerNamePhysicalNetBIOS
|
@@SERVERNAME
|
SQL1
|
SQL1
|
123-223
|
InstanceName
|
ComputerNamePhysicalNetBIOS
|
@@SERVERNAME
|
SQL2
|
SQL2
|
437-326
|
--
The SERVERPROPERTY
names reflect the friendly SQL1 and SQL2 names while the @@SERVERNAME reflects
the client's standard infrastructure team ###-### names.
This often means
that the Windows Server was renamed after SQL was installed but the final step of
renaming SQL wasn’t taken (more on that at the end of the email).
--
Renaming the servers (taking a downtime) was not immediately acceptable so after some
consideration we tried manually editing the code of the
master.dbo.DatabaseBackup procedure to hard-code the fact that the primary
replica is truly the preferred backup replica:
--
Original code:
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL
BEGIN
SELECT @CurrentIsPreferredBackupReplica = sys.fn_hadr_backup_is_preferred_replica(@CurrentDatabaseName)
END
Modified code:
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL AND @CurrentAvailabilityGroupRole = 'PRIMARY'
BEGIN
SELECT @CurrentIsPreferredBackupReplica = 1
END
--
…and that allowed the backups to run successfully by setting the procedure to true as long as the
replica is primary.
--
One other catch
that came up after that also relates directly back to the mismatched
names…after making the above change, the LOG backup of Database1 was succeeding but
the overall LOG backup job was still failing. Looking in the job output
we found an error:
--
Msg 50000, Level 16, State 1, Server SQL1,
Procedure DatabaseBackup, Line 1041
The @@SERVERNAME does not match SERVERPROPERTY('ServerName').
See https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server.
--
The author of
DatabaseBackup (Ola Hallengren) is ready for the fact that mismatched names can
cause issues and has a custom error built into his procedure to throw the Error
50000 if the names don’t match. The procedure proceeds to run backups if
it can, but still ultimately fails overall with the 50000 message.
To mitigate this we
further edited the code of DatabaseBackup to comment out the code block that
generates this specific error:
--
/*
2018-04-18 - Commented out to prevent 50000
error since @@SERVERNAME does not match ServerName
AndyG, Ntirety
IF @@SERVERNAME <> SERVERPROPERTY('ServerName')
AND SERVERPROPERTY('IsHadrEnabled') = 1
BEGIN
SET @ErrorMessage = 'The
@@SERVERNAME does not match SERVERPROPERTY(''ServerName''). See ' + CASE WHEN
SERVERPROPERTY('IsClustered') = 0 THEN 'https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server'
WHEN SERVERPROPERTY('IsClustered') = 1 THEN
'https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/install/rename-a-sql-server-failover-cluster-instance'
END + '.' + CHAR(13) + CHAR(10) + ' '
RAISERROR(@ErrorMessage,16,1)
WITH NOWAIT
SET @Error = @@ERROR
END
*/
--
This directly
bypasses the error check and prevents the 50000 error for the name mismatch.
--
As an aside I want to comment on the code editing we were doing - when modifying code of any type consider the source and their licensing/intellectual property rights. Ola's licensing is very short and sweet:
--
License
The SQL Server Maintenance Solution is licensed under the MIT license, a popular and widely used open source license.
Copyright (c) 2018 Ola Hallengren
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
https://ola.hallengren.com/license.html
--
This is very common with code online - ALWAYS ATTRIBUTE YOUR SOURCES - but if you are looking at any type of commercial code be extremely careful ; not only may you be violating the letter (or spirit) of the law, you may even prevent the code's creator from supporting you in the future!
The edits to DatabaseBackup are allowing
the jobs to run successfully for now, but it is truly a bandage rather than a
real fix. The ultimate fix is to rename the SQL Server internally (as
described at the link in Ola’s 5000 error https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server)
The document title
references a “stand-alone instance of SQL Server” but that is what this is
since it is two SQL Servers with an availability group rather than a failover
cluster (shared) instance.
*BUT*
It is unknown from the information here whether changing the @@SERVERNAME from 123-223 to SQL1 would impact any
other business or application processes – this is something the client's teams would need to determine before making any changes to the name.
Also changing the server name would require a SQL downtime (service restart, not Windows reboot) for the name
change to take effect (as described in the Microsoft doc) so even if you are
comfortable with the name change it needs to be scheduled for a downtime
window.
--
Hope this helps!
Great post, particular liked the images.. :o)
ReplyDeleteAnyway just wanted to add that I came across a related issue, where we had a 3 way AG, 2 local DB servers (SQL01/02) and an offsite DR (DRSQL01) server. backup preferences on the AG were set SQL01=100, SQL02=75, DRSQL01=50..
However after one evening 4 of our databases stopped backing up, there are 47 databases in the AG and all other Dbs were backing up fine. Eventually found this article and tracked down that these 4 databases had actually set their preferred replica to the DR environment that had no backup jobs setup to run.
It appears that we had a network issue that caused the DB servers to failover (which they have done numerous times in the past without issue). For some reason 43 database correctly failed over and set prefeered replica to SQL02, but these 4 then decided to set their preferred replica to DR.
I fixed this by failing back from SQL02 to SQL01, then back from SQL01 to SQL02 (wanted to make sure there was no issue in either failover).
I then used the following script I wrote to list all the sys.fn_hadr_backup_is_preferred_replica values for all databases.
EXECUTE master.sys.sp_MSforeachdb 'USE [?]; DECLARE @dby VARCHAR(max); SET @dby=DB_NAME(); SELECT @dby;SELECT sys.fn_hadr_backup_is_preferred_replica (@dby);'
Kind Regards
Allan
Thanks Allan!
Delete