One DBA's Ongoing Search for Clarity in the Middle of Nowhere


Yet Another Andy Writing About SQL Server

Thursday, June 21, 2018

Speaking This Weekend in Iowa City!

Come out to SQL Saturday Iowa at the University of Iowa this Saturday 06/23/2018 for a SQL Saturday that never disappoints - Iowa always puts on a fun event with great atmosphere and informative sessions.

I am giving the premiere of my new presentation "Intro to Powershell for the SQL Server DBA"


Many DBA's have the same outlook on Powershell - we all know we should use it, but we don't think we have time to learn one more tool that isn't even really part of SQL Server anyway.  I was one of these DBA's for many years, but recently I have come to respect the *power* of Powershell.

Come learn the basics of Powershell, how Powershell and SQL Server interact with each other, and finally how to use Powershell to automate common SQL Server tasks such as maintenance and SQL Server installations.  If you need to interact with Windows or Active Directory, odds are Powershell is the right tool for the job - come learn how to use it!


There are also many other amazing speakers and sessions this weekend, including Microsoft Data Platform MVP's David Klee (@kleegeek) and Ed Leighton-Dick (@eleightondick).

Hope to see you there!

Friday, June 15, 2018

Revoke the permission(s) before dropping the server principal

As a remote DBA I often have to log in to client systems for a fixed amount of time and then remove logins and other security after the work is done.

This morning while cleaning up I ended up with a new error for me:

Msg 15173, Level 16, State 1, Line 1
Server principal ‘NtiretyMirror’ has granted one or more permission(s). Revoke the permission(s) before dropping the server principal.


At first I thought the error was that the login had *been* granted a permission, which seemed bizarre to me - every day I drop logins with all types of permissions and role memberships.

When I Google'd the base portion of the error - "Revoke the permission(s) before dropping the server principal" I found that I was incorrect, as described in an MSDB article here.

The problem wasn't that the login had permissions, but rather than it had granted someone *else* permissions.
The next question became how to tell what permissions we are talking about.  Since the permissions weren't on the NtiretyMirror login itself, how could I find out what permissions were involved?

There was a basic query in the MSDB post, but I tweaked it to give a cleaner resultset:

DECLARE @GrantorName nvarchar(4000)

SET @GrantorName = 'NtiretyMirror' /* Login in Question */

SELECT as Grantor
, as Grantee
, a.state_desc as PermissionState
, a.class_desc as PermissionClass
, a.type as PermissionType
, a.permission_name as PermissionName
, a.major_id as SecurableID 
FROM sys.server_permissions a
JOIN sys.server_principals b
ON a.grantor_principal_id = b.principal_id
JOIN sys.server_principals c
ON a.grantee_principal_id = c.principal_id
WHERE grantor_principal_id =
SELECT principal_id
FROM sys.server_principals
WHERE name = @GrantorName


Sure enough I found an offending row:

Grantor Grantee PermissionState PermissionClass PermissionType PermissionName SecurableID
NtiretyMirror public GRANT ENDPOINT CO CONNECT 6


I looked in sys.endpoints and found that ID =6 is the mirroring endpoint.

Surprise, surprise - when I had set up mirroring (as NtiretyMirror - get it?) I had granted CONNECT to public as part of the process, and not the NtiretyMirror login owned that GRANT.

I could try to REVOKE the CONNECT, but did I really want to risk breaking mirroring?

I found the safer thing to do was to change the ownership of the Mirroring endpoint via ALTER AUTHORIZATION:

USE [master]

Once I changed the ownership of the endpoint, my original query showed no permissions related to NtiretyMirror, and I was able to drop it successfully with the normal DROP LOGIN statement.

Hope this helps!

Saturday, June 9, 2018

Double Win- Watch DeWitt and Help SQLSoldier's Loved Ones!

Brent Ozar (blog/@BrentO) is sponsoring a special webcast in a couple weeks, with an amazing presenter:

The webcast (“SQL Query Optimization. Why is it so hard to get right?”) is free, but with a suggested donation to the  Robert Davis Memorial and Grief Fund:

If you don’t know who these people are, you should…

Dr. David DeWitt is currently with MIT but spent years at Microsoft as a Technical Fellow at the Jim Gray Systems Lab.  He presented keynotes at the PASS Summit for years and gives an awesome talk regardless of the topic.

Robert Davis (SQL Soldier) was a SQL Server MVP and Microsoft Certified Master who was helpful to many over the years online in the forums and in-person at events.  He died suddenly and without warning earlier this years, leaving behind his wife.

Sign up for the webcast if you can. and if you feel so moved please give to the fund as well!


Wednesday, June 6, 2018

Speaking at PASS Summit 2018!

I found out a few days ago that one of my sessions was selected for Summit this Fall - I will be presenting an updated version of my "Getting Started with Extended Events" session that I have been working at SQL Saturdays over the last couple years.

I have submitted to Summit multiple times over the years, and every time the email comes in late May/early June my stomach knots up - I have been selected to speak once before (2016) but I still have that little bit of self-doubt.  Were my abstracts good enough?  Does anyone care about the subjects I want to talk about?  Who am I kidding anyway?

But when the email comes and one of the sessions says "Accepted"...

I have to admit for a brief moment it feels like this:

...but the moment passes pretty quickly, because I know that I am but a little piece of the big puzzle.  There are dozens of  Community Speakers and also dozens of Microsoft and Partner Speakers, not to mention the *HUNDREDS* of volunteers that make PASS and the Summit work.

It is still exciting to be selected, and each time it is a little humbling to hear people who don't happen to get selected in any given year offer up congratulations to those who are picked; we all support each other, and it's likely next year I will be congratulating someone else when they get picked and I don't.

Compared to Mr. T, I think in this case Emmet is more on-target:

At the end of the day we are all #sqlfamily and we all  have our own unique talents and our own unique voice - even if a particular event or venue doesn't select you, keep sharing - speaking/writing/whatever - because everyone has a story to tell.

Thursday, May 24, 2018

Toolbox - How Long Has My Server Been Up?
No..not that Up (although it is awesome!)


As a remote service provider a common request is to follow up after a server restart (or investigate a possible server restart).  A question that inevitably comes up is "When did the server restart?"

If all you need to know is when the SQL Server service restarted, the easiest thing is the creation date of tempdb since it is recreated as part of SQL Server start-up:

SELECT name, create_date
FROM master.sys.databases
WHERE name = 'tempdb'

name create_date
tempdb 05/14/2018 07:31:18


The catch is that the SQL service restart often isn't good enough.

When did Windows reboot?

When did the Agent service start?

Why did SQL start before the tempdb create date?

...and more.

I went digging and found an answer I liked online - as usual I modified it for my own wishes while keeping the core of the code:


Item Uptime Query
Author Andy Galbraith
Created 2018/01/03
Updated 2018/03/05
Decription Returns uptime for Windows, SQL Server, and SQL Agent
Versions SQL 2005+
Notes In some cases the SQL Server start time may show earlier than the OS start time - this is due
to the particular flags used to show that the system is "up" and is expected.

Queries modified from query in comment at:

@@SERVERNAME as Instance_Name
, GETDATE() as Current_Server_Time
, CONVERT(VARCHAR(23),b.OS_Start,121) as OS_Start_Time
, z.SQL_Start as SQL_Server_Start_Time
, CONVERT(VARCHAR(23),a.Agent_Start,121) as SQL_Agent_Start_Time
CAST(CAST(right(10000000+datediff(dd,0,GETDATE()-b.OS_Start),4) as INT) as VARCHAR(4))+' days '+
CAST(DATEPART(hh,GETDATE()-b.OS_Start) as VARCHAR) + ' hours, ' +
CAST(DATEPART(mi,GETDATE()-b.OS_Start) as VARCHAR) + ' minutes, ' +
CAST(DATEPART(ss,GETDATE()-b.OS_Start) as VARCHAR) + ' seconds') as OS_Uptime
CAST(CAST(right(10000000+datediff(dd,0,GETDATE()-z.SQL_Start),4) as INT) as VARCHAR(4))+' days '+
CAST(DATEPART(hh,GETDATE()-z.SQL_Start) as VARCHAR) + ' hours, ' +
CAST(DATEPART(mi,GETDATE()-z.SQL_Start) as VARCHAR) + ' minutes, ' +
CAST(DATEPART(ss,GETDATE()-z.SQL_Start) as VARCHAR) + ' seconds') as SQL_Uptime
CAST(CAST(right(10000000+datediff(dd,0,GETDATE()-a.Agent_Start),4) as INT) as VARCHAR(4))+' days '+
CAST(DATEPART(hh,GETDATE()-a.Agent_Start) as VARCHAR) + ' hours, ' +
CAST(DATEPART(mi,GETDATE()-a.Agent_Start) as VARCHAR) + ' minutes, ' +
CAST(DATEPART(ss,GETDATE()-a.Agent_Start) as VARCHAR) + ' seconds') as Agent_Uptime
SELECT login_time as SQL_Start
FROM sys.dm_exec_sessions WHERE session_id = 1
) z
when aa.program_name like 'SQLAgent %'
then aa.login_time
ELSE '99990101'
CONVERT(datetime,'99990101')) as Agent_Start
FROM master.dbo.sysprocesses aa
WHERE aa.login_time > '20000101'
) a
DATEADD(ss,bb.[ms_ticks]/-1000,GETDATE()) as OS_Start
FROM sys.[dm_os_sys_info] bb
) b

Instance_Name Current_Server_Time OS_Start_Time SQL_Server_Start_Time SQL_Agent_Start_Time
INSTANCE999 05/24/2018 19:59:48 05/14/2018 07:30:31 05/14/2018 07:31:14 05/14/2018 07:31:24

OS_Uptime SQL_Uptime Agent_Uptime
10 days 12 hours, 29 minutes, 17 seconds 10 days 12 hours, 28 minutes, 34 seconds 10 days 12 hours, 28 minutes, 23 seconds


As you can see, it leverages sys.dm_exec_sessions and sysprocesses to pull the start times for the SQL services and sys.dm_os_sys_info to pull the OS info.


For another option, we can leverage an operating system function, the systeminfo command line tool,

You can use a pipe | to pass a find command into systeminfo, like this:

systeminfo|find "Time:"

System Boot Time:          4/21/2018, 7:41:10 AM


To run it, you can run an elevated CMD prompt from windows (right-click and Run as Administrator), or if xp_cmdshell is enabled you can run it from SQL:

EXEC xp_cmdshell 'systeminfo|find "Time:"'

So we have seen three ways to pull the info - but which one is the most "right"?

Let's compare the results for concurrent runs of the three queries:


name create_date
tempdb 04/21/2018 07:42:44

Instance_Name OS_Start_Time SQL_Server_Start_Time SQL_Agent_Start_Time
INSTANCE999 04/21/2018 07:42:17 04/21/2018 07:42:31 04/21/2018 07:42:47

System Boot Time:          4/21/2018, 7:41:10 AM


There are times when you need to know pretty specifically what happened and when - so let's review:
  1. systeminfo time - 7:41:10AM
  2. sys.dm_os_sys_info ms_ticks - 7:42:17AM - more than a minute later!
  3. sys.dm_exec_sessions SQL start - 7:41:32AM
  4. tempdb create date - 7:42:44AM
  5. sysprocesses Agent start - 7:42:47AM
Sometimes you only have access to one datetime or another - especially systeminfo - for example you may be a sysadmin in SQL but not have access to run the Windows command line function - but if you have access to all of these numbers, consider the order of items above.


Hope this helps!

Friday, May 18, 2018

When Good Query Plans Go Bad

Today's story starts again with a page escalation:

We are seeing very high CPU usage on DistServer01.  Could you please take a look and do a health check on this server?  Please let us know if there is anything we can do to help.

I connnected to DistServer01 and, sure enough, there was consistently 99% CPU in Windows and 90%+ of it was from SQL Server.

As you may have guessed, DistServer01 was the Distributor in a very large transactional replication topology.

I checked the active processes looking for something that was abnormally long-running - no good.

I checked the processes again looking for something using high CPU.  The top thing I found was a Distribution Agent for one of the many publications being routed through DistServer01.  I stopped the SQL Agent job for the Distribution Agent - no good, so I resumed the job.

Next place to turn was that warm comfy place...the Glenn Berry DMV scripts!

I have heaped praise on Glenn (blog/@GlennAlanBerry) and his scripts before, and they are definitely my go-to when I dive into troubleshooting.

For high CPU an especially useful query is the "top worker time queries" query.  This is the 2008R2 version (but it works on later versions as well):

-- Get top total worker time queries for entire instance (Query 33) (Top Worker Time Queries)
SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name], t.[text] AS [Query Text],
qs.total_worker_time AS [Total Worker Time], qs.min_worker_time AS [Min Worker Time],
qs.total_worker_time/qs.execution_count AS [Avg Worker Time],
qs.max_worker_time AS [Max Worker Time], qs.execution_count AS [Execution Count],
qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time],
qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],
qs.total_physical_reads/qs.execution_count AS [Avg Physical Reads], qs.creation_time AS [Creation Time]
, qp.query_plan AS [Query Plan] -- comment out this column if copying results to Excel
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);

The (truncated) results come out like this:

Query Text
CREATE PROCEDURE sp_MSadd_replcmds( @para
05/12/2018 22:55:25
NULL --  -- Name:   -- fn_repldecryptver4  --   -- Descrip 1724660996 0 89 372022 19285577 05/12/2018 22:55:25
NULL   --  -- Name:  --  fn_MSrepl_isdistdb  --  -- Desc 1066486938 0 53 287015 19817134 05/12/2018 22:55:23
NULL create procedure sys.sp_MSispeertopeeragent  ( 845048463 0 43 203009 19276558 05/12/2018 22:55:26
NULL CREATE PROCEDURE sys.sp_MSadd_distribution_h 796880096 0 490 368106 1623313 05/13/2018 03:50:07
NULL CREATE PROCEDURE sys.sp_MSget_subscription_gui 527028755 0 27 379022 19260376 05/12/2018 22:55:27
NULL create procedure sys.sp_MSispeertopeeragent  (    500954674 0 25 240012 19276558 05/12/2018 22:55:26
distribution CREATE PROCEDURE sp_MSdelete_publisherdb_trans       434227603 0 204054 5019298 2128 05/12/2018 23:05:04
master SELECT target_data           FROM sys.dm_xe_session_ 342630497 55006 228116 963058 1502 05/13/2018 00:00:08
msdb CREATE PROCEDURE sp_sqlagent_has_server_access  267846345 30999 763095 21321226 351 05/12/2018 22:55:18

As you can see here, the internal replication process sp_MSadd_replcmds was by far the largest total consumer of CPU, but its average CPU time was still relatively small - it was just called over a million times since the server was last started!

Just to be sure I checked the Missing/Recommended Indexes to make sure there wasn't a glaringly obvious recommendation (sometimes new indexes drastically help CPU) but there was not.

The next place to turn was the query plan...
Query plans can be very complicated subjects.  At the most basic level, query plans are the layout of how a query operates.  It is what we see represented when we look at a graphical execution plan in Management Studio:

Query plans are created on first execution and then are stored in the plan cache on either the first or second execution of the query (depending on whether you have Optimize for Ad Hoc Workloads enabled).

This is beneficial a majority (a vast majority) of the time because generating the query plan takes much more resources that simply looking it up in cache.  The catch is that not all similar queries need the same plan.

The simplest example to me relates to indexing (and is actually one of the reasons my first instinct was to look for missing indexes) - when performing a filtered query (meaning one with a WHERE clause) the amount of data being returned help shape the query plan.  If the estimator expects a large portion of the rows in the table to be returned, it will often ignore nonclustered indexes and simply perform a table scan, checking each row to see if it meets the filter criteria and returning the appropriate rows.  If the estimator expects a more narrow subset and there is a relevant nonclustered index in place, it generates a plan that performs a scan or seek of the nonclustered index and then performs lookups to return any columns not already included in the index.

The problem comes up when the plan that gets cached is built for one of these two cases that is the less frequent of the two.  Using our example above let's say the first run of the query is the large resultset/table scan, which caches that plan.  The next one million runs of the query are the narrow resultset that could have used a nice, small nonclustered index - except there is already a table scan query plan stored in the cache, so SQL Server helpfully does what it's supposed to do any uses that plan, performing a million table scans for our million queries!

The way to deal with a bad query plan is to get rid of it (Simple, right?)  Many people think you need to dump the full procedure and/or system caches (or even worse, restart SQL Server) to clear this situation (and *DON'T* do that in production!)

What you can do instead is zap a single offending query plan, impacting only that specific query!

As Glenn describes here, you can look up the individual plan handle and then pass it to DBCC FREEPROCCACHE to remove that single plan from the cache.

To look up the plan handle, you can use this query:

SELECT cp.plan_handle, cp.objtype, cp.usecounts,
DB_NAME(st.dbid) AS [DatabaseName]
FROM sys.dm_exec_cached_plans AS cp CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
WHERE OBJECT_NAME (st.objectid)
sp_MSadd_replcmds%' OPTION (RECOMPILE);

plan_handle objtype usecounts DatabaseName
0x0500FF7FB38EE9D6A03A9D310501000001000000000000000000000000000000000000000000000000000000 Proc 1077642 NULL


The next step is to pass this plan_handle to DBCC FREEPROCCACHE:


DBCC FREEPROCCACHE(0x0500FF7FB38EE9D6A03A9D310501000001000000000000000000000000000000000000000000000000000000)

-- my case - BOOM!  Instantaneous CPU drop from 90%+ to just 5%-10%.

Another useful query came from a blog post Chris Skorlinski of Microsoft

The post is relatively old (2009) but it still is relevant as it talks about troubleshooting replication (which what led me to it, even though my problem ended up not being a replication problem) but the query is useful to find both the plan handles and other relevant information about the top CPU-consuming queries.  It is a useful expansion on the Glenn query above:

(qs.total_logical_reads/qs.execution_count) as avg_logical_reads,
(qs.total_logical_writes/qs.execution_count) as avg_logical_writes,
(qs.total_physical_reads/qs.execution_count) as avg_phys_reads,
FROM sys.dm_exec_query_stats as qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) as qp
--WHERE st.text like 'CREATE PROCEDURE sp_MSadd_replcmds%'
ORDER BY qs.total_worker_time DESC

I want to repeat that - even though this is a distribution server in a complicated replication topology, this turned out to *not* be a replication problem.

Always review the evidence - error logs, query results, etc. - and follow the trail - never assume.


Hope this helps!

Wednesday, April 18, 2018

Why Don't I have a Preferred Replica?

The client reported that the log file on their main database was growing unusually large, and when they checked the backup target the database didn't have any LOG backups!

Oh, and the database was part of an Availability Group.

After some investigation we found that while the Availability Group was configured to prefer the primary replica for backups, it wasn't evaluating as the preferred backup via the system function sys.fn_hadr_backup_is_preferred_replica!

What we found in the output logs for the LOG backup (Hallengren) job looked like this:


Availability group: DATABASE1_AG1
Availability group role: PRIMARY
Availability group backup preference: PRIMARY
Is preferred backup replica: No


Even though the preferred backup replica is the primary replica, and we are on the primary replica, the is_preferred check says No.

A test showed that a manual BACKUP LOG statement worked, but both the Hallengren DatabaseBackup *and* a "regular" maintenance plan backup ignored Database1.

To check the preferred replica status we manually ran the command in a query window:


SELECT sys.fn_hadr_backup_is_preferred_replica(‘Database1’)

-- came back as 0 (false) on both nodes – SQL1 and SQL2.

As long as the preferred backup replica is 0, no intelligent backups will run against that replica, which is why backups weren’t happening on SQL1 or SQL2.


Now what?  Back to the old standby:

As described in the solution post, one possible answer is if the internal server names don't match - we checked the servers and sure enough:


SELECT SERVERPROPERTY('ServerName') as [InstanceName]
, SERVERPROPERTY('ComputerNamePhysicalNetBIOS') as [ComputerNamePhysicalNetBIOS]
, @@SERVERNAME as ServerName





The SERVERPROPERTY names reflect the friendly SQL1 and SQL2 names while the @@SERVERNAME reflects the client's standard infrastructure team ###-### names.

This often means that the Windows Server was renamed after SQL was installed but the final step of renaming SQL wasn’t taken (more on that at the end of the email).


Renaming the servers (taking a downtime) was not immediately acceptable so after some consideration we tried manually editing the code of the master.dbo.DatabaseBackup procedure to hard-code the fact that the primary replica is truly the preferred backup replica:

Original code: 
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL
SELECT @CurrentIsPreferredBackupReplica = sys.fn_hadr_backup_is_preferred_replica(@CurrentDatabaseName)

Modified code: 
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL AND @CurrentAvailabilityGroupRole = 'PRIMARY'
SELECT @CurrentIsPreferredBackupReplica = 1
…and that allowed the backups to run successfully by setting the procedure to true as long as the replica is primary.


One other catch that came up after that also relates directly back to the mismatched names…after making the above change, the LOG backup of Database1 was succeeding but the overall LOG backup job was still failing.  Looking in the job output we found an error:


Msg 50000, Level 16, State 1, Server SQL1, Procedure DatabaseBackup, Line 1041


The author of DatabaseBackup (Ola Hallengren) is ready for the fact that mismatched names can cause issues and has a custom error built into his procedure to throw the Error 50000 if the names don’t match.  The procedure proceeds to run backups if it can, but still ultimately fails overall with the 50000 message.

To mitigate this we further edited the code of DatabaseBackup to comment out the code block that generates this specific error:


  2018-04-18 - Commented out to prevent 50000 error since @@SERVERNAME does not match ServerName

  AndyG, Ntirety

    SET @ErrorMessage = 'The @@SERVERNAME does not match SERVERPROPERTY(''ServerName''). See ' + CASE WHEN SERVERPROPERTY('IsClustered') = 0 THEN '' WHEN SERVERPROPERTY('IsClustered') = 1 THEN '' END + '.' + CHAR(13) + CHAR(10) + ' '
    RAISERROR(@ErrorMessage,16,1) WITH NOWAIT
    SET @Error = @@ERROR


This directly bypasses the error check and prevents the 50000 error for the name mismatch.


As an aside I want to comment on the code editing we were doing - when modifying code of any type consider the source and their licensing/intellectual property rights.  Ola's licensing is very short and sweet:


The SQL Server Maintenance Solution is licensed under the MIT license, a popular and widely used open source license.
Copyright (c) 2018 Ola Hallengren 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 


At the end of the day as long as you don't remove his copyright notices you have the right to modify his code.

This is very common with code online - ALWAYS ATTRIBUTE YOUR SOURCES - but if you are looking at any type of commercial code be extremely careful ; not only may you be violating the letter (or spirit) of the law, you may even prevent the code's creator from supporting you in the future!


The edits to DatabaseBackup are allowing the jobs to run successfully for now, but it is truly a bandage rather than a real fix.  The ultimate fix is to rename the SQL Server internally (as described at the link in Ola’s 5000 error

The document title references a “stand-alone instance of SQL Server” but that is what this is since it is two SQL Servers with an availability group rather than a failover cluster (shared) instance.


It is unknown from the information here whether changing the @@SERVERNAME from 123-223 to SQL1 would impact any other business or application processes – this is something the client's teams would need to determine before making any changes to the name.

Also changing the server name would require a SQL downtime (service restart, not Windows reboot) for the name change to take effect (as described in the Microsoft doc) so even if you are comfortable with the name change it needs to be scheduled for a downtime window.


Hope this helps!