One DBA's Ongoing Search for Clarity in the Middle of Nowhere


Yet Another Andy Writing About SQL Server

Friday, April 22, 2016

Where Is My Primary Replica Again?

We have many clients with multi-node Availability Groups - that is, AGs with more than two replicas.  One of the problems I have always had with Availability Group management via the GUI (admit it, you use the GUI sometimes all you non-PowerShell geeks) is the fact that most of the work needs to be done from the primary replica.  You can connect to the Availability Group manager on a secondary replica, but you can only see limited data about that particular secondary replica and can't see much about the other replicas, including *which* replica is the current primary replica!

To perform management you almost always need to connect to the primary replica, but how can I figure out which one is primary without just connecting to the instances one by one until I get lucky?
Enter the T-SQL:
SELECT AS AvailabilityGroupName
, HAGS.primary_replica AS PrimaryReplicaName
, HARS.role_desc as LocalReplicaRoleDesc
, DRCS.database_name AS DatabaseName
, HDRS.synchronization_state_desc as SynchronizationStateDesc
, HDRS.is_suspended AS IsSuspended
, DRCS.is_database_joined AS IsJoined
FROM master.sys.availability_groups AS AG
LEFT OUTER JOIN master.sys.dm_hadr_availability_group_states as HAGS
ON AG.group_id = HAGS.group_id
INNER JOIN master.sys.availability_replicas AS AR
ON AG.group_id = AR.group_id
INNER JOIN master.sys.dm_hadr_availability_replica_states AS HARS
ON AR.replica_id = HARS.replica_id AND HARS.is_local = 1
INNER JOIN master.sys.dm_hadr_database_replica_cluster_states AS DRCS
ON HARS.replica_id = DRCS.replica_id
LEFT OUTER JOIN master.sys.dm_hadr_database_replica_states AS HDRS
ON DRCS.replica_id = HDRS.replica_id
AND DRCS.group_database_id = HDRS.group_database_id
ORDER BY, DRCS.database_name
This query can be run on any of the replicas, and it will return information about the Availability Groups and their member databases, *including* the name of the primary replica instance!
Hope this helps!

Thursday, April 14, 2016

How Long Did That Job Run?

When you use the msdb tables to try to gather information on your SQL Server Agent jobs, inevitably you will end up in a situation like this:
SELECT TOP 10 as JobName
,sjh.run_date as RunDate
,sjh.run_time as RunTime
,sjh.run_duration as Duration
from msdb.dbo.sysjobhistory sjh
join msdb.dbo.sysjobs sj
on sjh.job_id = sj.job_id
where step_id=0
JobName RunDate RunTime Duration
ServerA_RESTORE_FROM_PROD_V2 20140904 222821 4131
ServerA_RESTORE_FROM_PROD_V2 20140904 231032 3055
ServerA_RESTORE_FROM_PROD_V2 20140904 231130 5003
Bloomberg_Pkg 20140904 231233 205
ServerA_RESTORE_FROM_PROD_V2 20140904 231350 2343
DatabaseIntegrityCheck - SYSTEM_DATABASES 20140923 1500 433
DatabaseBackup - SYSTEM_DATABASES - FULL 20140923 10000 23
syspolicy_purge_history 20140923 20000 36
DatabaseBackup - USER_DATABASES - FULL 20140923 40000 2333
DatabaseIntegrityCheck - SYSTEM_DATABASES 20140924 1500 113
So....when did "DatabaseIntegrityCheck - SYSTEM_DATABASES" start? At 1500 - is that 3pm?  You may be able hash out that this translates to 12:15am local time...but what if you want to perform datetime-style math on the RunDate/RunTime?  Sure you can do multiple leaps to say (RunDate>X and RunDate<=Y) AND (RunTime>A and RunTime<=B), but you then need to explicitly format your X, Y, A, and B in the appropriate integer-style format.  Wouldn't it be easier to just be able to datetime math?

The next part is even worse - quick - how long did the first instance of "ServerA_RESTORE_FROM_PROD_V2" run?

4,131 somethings (seconds, ms, etc), right?
In truth, the job ran forty-one minutes and thirty-one seconds.
Yes, yes I am.  No precision beyond seconds, and no quick way to do math.  if you want to figure out how much longer instance 1 of the job ran than instance 2, you have to do some serious goofiness to figure it out.

Here is the fix for both of these problems!


The first item is easy although not directly obvious without a little help from that old standby:
(Yes I am that old...)

Microsoft included a system scalar function msdb.dbo.agent_datetime(run_date, run_time) that turns the combination  on run_date and run_time into a datetime:
SELECT TOP 10 as JobName
,sjh.run_date as RunDate
,sjh.run_time as RunTime
,msdb.dbo.agent_datetime(sjh.run_date,sjh.run_time) as RunDateTime
from msdb.dbo.sysjobhistory sjh
join msdb.dbo.sysjobs sj
on sjh.job_id = sj.job_id
where step_id=0
09/04/2014 22:28:21
09/04/2014 23:10:32
09/04/2014 23:11:30
09/04/2014 23:12:33
09/04/2014 23:13:50
DatabaseIntegrityCheck - SYSTEM_DATABASES201409231500
09/23/2014 00:15:00
DatabaseBackup - SYSTEM_DATABASES - FULL2014092310000
09/23/2014 01:00:00
09/23/2014 02:00:00
DatabaseBackup - USER_DATABASES - FULL2014092340000
09/23/2014 04:00:00
DatabaseIntegrityCheck - SYSTEM_DATABASES201409241500
09/24/2014 00:15:00
I agree with the author of this post who calls the agent_datetime() function "undocumented" since there wasn't a record of it in Books Online - I checked around and couldn't find any standard documentation of it on MSDN or TechNet.

Now that we have a datetime, we can perform all of the regular datetime manipulation functions such as DATEDIFF() on the values.


The second part is a little more obnoxious - there isn't a quick Microsoft function (documented or otherwise) to make the run_duration into a process-ready value.

To hash the run_duration into a useful value I wrote a CASE statement some time ago:
SELECT TOP 10 as JobName
,CASE len(sjh.run_duration)
WHEN 1 THEN cast('00:00:0'
+ cast(sjh.run_duration as char) as char (8))
WHEN 2 THEN cast('00:00:'
+ cast(sjh.run_duration as char) as char (8))
WHEN 3 THEN cast('00:0'
+ Left(right(sjh.run_duration,3),1)
+':' + right(sjh.run_duration,2) as char (8))
WHEN 4 THEN cast('00:'
+ Left(right(sjh.run_duration,4),2)
+':' + right(sjh.run_duration,2) as char (8))
WHEN 5 THEN cast('0'
+ Left(right(sjh.run_duration,5),1)
+':' + Left(right(sjh.run_duration,4),2)
+':' + right(sjh.run_duration,2) as char (8))
WHEN 6 THEN cast(Left(right(sjh.run_duration,6),2)
+':' + Left(right(sjh.run_duration,4),2)
+':' + right(sjh.run_duration,2) as char (8))
END as 'Duration'

from msdb.dbo.sysjobhistory sjh
join msdb.dbo.sysjobs sj
on sjh.job_id = sj.job_id
where step_id=0
JobName Duration run_duration
ServerA_RESTORE_FROM_PROD_V2 00:41:31 4131
ServerA_RESTORE_FROM_PROD_V2 00:00:03 3
ServerA_RESTORE_FROM_PROD_V2 00:00:05 5
Bloomberg_Pkg 00:00:02 2
ServerA_RESTORE_FROM_PROD_V2 00:00:02 2
DatabaseIntegrityCheck - SYSTEM_DATABASES 00:00:43 43
DatabaseBackup - SYSTEM_DATABASES - FULL 00:00:02 2
syspolicy_purge_history 00:00:36 36
DatabaseBackup - USER_DATABASES - FULL 00:00:02 2
DatabaseIntegrityCheck - SYSTEM_DATABASES 00:00:11 11

Interestingly I recently discovered a more elegant solution (while looking for a answer to a different problem) that utilizes the STUFF() function.  Look at this forum post on SQLServerCentral.  The fourth item down from "Mudluck" is almost exactly the same as my CASE statement above, but look at the reply below it from "JG-324908":
SELECT stuff(stuff(replace(str(run_duration,6,0),' ','0'),3,0,':'),6,0,':') FROM sysJobHist
Sure enough, running this STUFF() code results in the same output as the much more complicated CASE statement above.  JG notes that he/she found it in a forum post somewhere and I dug around a little to see if I could find the original author without any luck :(


As with many other bits of code, I keep these things in a big NotePad file of useful code snippets (some people like OneNote instead, and there are dozens of choices - use the one you normally prefer) so that I can quickly reference them when needed.  I always note the website or forum post where I found the code if I didnt create it myself as well as the original author.  This lets me give credit where credit is due when showing the code to others, but it also gives me an idea of someone to approach in the future if I have a question about a similar topic.

It was especially interesting to find the STUFF() code because STUFF() isn't a function I often use, and in this case it was perfect.


Hope this helps!

Monday, April 4, 2016

Don't Use Delimiters In Your Object Names!

It started with a failed backup status check:
Msg 50000, Level 16, State 1, Line 73
** Backups listed in this file are missing / out of date **
db                status   recovery model   last full backup   Full BU hours old
[FinanceDB]       ONLINE   SIMPLE           NULL             NULL
Check out the database name - [FinanceDB],  At first my response was something like:
I went and looked on the server, and sure enough in Management Studio I saw one database named "FinanceDB" and a database named "[FinanceDB]".

This was on a SQL 2008R2 instance, but as a test I created a database named [test] on my local SQL 2014 instance and sure enough it worked!

The source of the problem at the client was the LiteSpeed maintenance plan.  Even though the backup task was set to backup all user databases, it wasn't picking up the square-bracketed database.

On my test system I set up a "regular" SQL Server maintenance plan, and it had did not have the same problem - the backup job did pick up the square-bracketed [test] database.

I next ran the Ola Hallengren DatabaseBackup that I already had installed, and it also picked up the rogue database, creating a backup named TestServer_[test]_FULL_20160404_143721.bak.


Since I don't have LiteSpeed on my test system I created a quick test on the client's test system  (SQL 2008R2) - I created a new database named [test] and created a new LiteSpeed maintenance plan to back up *just* that database - the job failed with this error:
Executed as user: Domain\User. Microsoft (R) SQL Server Execute Package Utility  Version 11.0.5058.0 for 64-bit  Copyright (C) Microsoft Corporation. All rights reserved.    Started:  3:47:35 PM  Progress: 2016-04-04 15:47:35.81     Source: {D8AD0CC9-710A-4C59-A8E6-1B9228562535}      Executing query "DECLARE @Guid UNIQUEIDENTIFIER EXECUTE msdb..sp_ma...".: 100% complete  End Progress  Error: 2016-04-04 15:47:37.01     Code: 0xC002F210     Source: Fast Compression Backup 1 Execute SQL Task     Description: Executing the query "execute master..xp_slssqlmaint N'-BkUpMedia DISK -..." failed with the following error: "LiteSpeed? for SQL Server?  ? 2015 Dell Inc.    Database 'test' is invalid: not found    Msg 62401, Level 16, State 1, Line 0: Database 'test' is invalid: not found    ".   End Error  DTExec: The package execution returned DTSER_FAILURE (1).  Started:  3:47:35 PM  Finished: 3:47:37 PM  Elapsed:  1.809 seconds.  The package execution failed.  The step failed.
That was strange - remember the original backup job succeeded (although not backing up the square-bracketed database).  I considered and realized in the original situation there were two databases - one with brackets and one without.  Sure enough when I created a database just named "test" and included it in my new maintenance plan, the job succeeded.

How did it succeed?  It backed up "test" TWICE:

The LiteSpeed Fast Compression backup backed up the "test" database twice (Fast Compression runs a FULL every so often and DIFF's in-between - this is why two backups in a row results in a FULL and a DIFF).  I ran the job again and saw the same thing:

I verified by creating an object in [test] and running the backups again and running a restore - what I was backing up was truly "test" over and over.


This is not about bashing Dell/Quest LiteSpeed - it is about pointing out a shortcoming in how SQL Server allows us to name objects and a cautionary tale on that naming.

Although it worked in most of the test scenarios above, you can see how the square-bracketed database name failed under one maintenance scenario - and not only did it fail, it failed in a particularly nefarious way because the backup job *didn't* fail - it just didn't pick up the database.


I re-created my [test] database on my local instance and wanted to see what the syntax would look like for a DROP - I went through the SCRIPT action in Management Studio and ended up with this:

USE [master]

/****** Object:  Database [[test]]    Script Date: 4/4/2016 2:00:55 PM ******/

Note the *three* square brackets at the end of the DROP statement – two open brackets but three close brackets.

A syntax checker parse showed statement OK – when I removed one of the closing brackets, so that I had two open and two close, the parse showed me this:

Msg 105, Level 15, State 1, Line 5
Unclosed quotation mark after the character string '[test]

Msg 102, Level 15, State 1, Line 5
Incorrect syntax near '[test]

…and an execution resulted in the same errors.  When I re-added the third close square bracket, my DROP statement succceeded.

A script of a simple BACKUP statement resulted in the same thing:

TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Backup\[test].bak' WITH NOFORMAT, NOINIT,  NAME = N'[test]-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10


This third bracket may be the source of the issue in LiteSpeed and it may not - but it shows the importance of handling object names properly and how touchy the parser can be when you don't make it happy.


At the end of the day - just don't do it.  There is always going to be some other name that is just as useful as any name that includes a delimiter, and you don't need to handle the bizarre and inconsistent scenarios that will arise.

Hope this helps!