Thursday, June 21, 2018
Come out to SQL Saturday Iowa at the University of Iowa this Saturday, 06/23/2018 - Iowa never disappoints, always putting on a fun event with a great atmosphere and informative sessions.
I am giving the premiere of my new presentation, "Intro to Powershell for the SQL Server DBA":
--
Many DBAs have the same outlook on Powershell - we all know we should use it, but we don't think we have time to learn one more tool that isn't even really part of SQL Server anyway. I was one of these DBAs for many years, but recently I have come to respect the *power* of Powershell.
Come learn the basics of Powershell, how Powershell and SQL Server interact with each other, and finally how to use Powershell to automate common SQL Server tasks such as maintenance and SQL Server installations. If you need to interact with Windows or Active Directory, odds are Powershell is the right tool for the job - come learn how to use it!
--
There are also many other amazing speakers and sessions this weekend, including Microsoft Data Platform MVPs David Klee (@kleegeek) and Ed Leighton-Dick (@eleightondick).
Hope to see you there!
Friday, June 15, 2018
Revoke the permission(s) before dropping the server principal
As a remote DBA I often have to log in to client systems for a fixed amount of time and then remove logins and other security after the work is done.
This morning while cleaning up I ended up with a new error for me:
--
Msg 15173, Level 16, State 1, Line 1
Server principal 'NtiretyMirror' has granted one or more permission(s). Revoke the permission(s) before dropping the server principal.
Interesting.
At first I thought the error was that the login had *been* granted a permission, which seemed bizarre to me - every day I drop logins with all types of permissions and role memberships.
When I Googled the base portion of the error - "Revoke the permission(s) before dropping the server principal" - I found that I was incorrect, as described in an MSDN article here.
The problem wasn't that the login had permissions, but rather that it had granted someone *else* permissions.
[Image: https://i.imgflip.com/qygsn.jpg]
The next question became how to tell what permissions we were talking about - since the permissions weren't on the NtiretyMirror login itself, how could I find out what permissions were involved?
There was a basic query in the MSDN post, but I tweaked it to give a cleaner resultset:
--
DECLARE @GrantorName nvarchar(4000)
SET @GrantorName = 'NtiretyMirror' /* Login in Question */
SELECT b.name as Grantor
, c.name as Grantee
, a.state_desc as PermissionState
, a.class_desc as PermissionClass
, a.type as PermissionType
, a.permission_name as PermissionName
, a.major_id as SecurableID
FROM sys.server_permissions a
JOIN sys.server_principals b
ON a.grantor_principal_id = b.principal_id
JOIN sys.server_principals c
ON a.grantee_principal_id = c.principal_id
WHERE grantor_principal_id =
(
SELECT principal_id
FROM sys.server_principals
WHERE name = @GrantorName
)
Sure enough I found an offending row:
Grantor | Grantee | PermissionState | PermissionClass | PermissionType | PermissionName | SecurableID
NtiretyMirror | public | GRANT | ENDPOINT | CO | CONNECT | 6
--
I looked in sys.endpoints and found that ID = 6 is the mirroring endpoint.
Surprise, surprise - when I had set up mirroring (as NtiretyMirror - get it?) I had granted CONNECT to public as part of the process, and now the NtiretyMirror login owned that GRANT.
I could try to REVOKE the CONNECT, but did I really want to risk breaking mirroring?
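For reference, the riskier path would have looked something like this - a sketch only, with the endpoint name Mirroring assumed (check sys.endpoints for the real name), and I did not run it:
--
REVOKE CONNECT ON ENDPOINT::Mirroring FROM public AS [NtiretyMirror] /* AS names the grantor whose GRANT is being revoked */
--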
I found the safer thing to do was to change the ownership of the Mirroring endpoint via ALTER AUTHORIZATION:
--
USE [master]
GO
ALTER AUTHORIZATION ON ENDPOINT::mirroring TO sa;
GO
Once I changed the ownership of the endpoint, my original query showed no permissions related to NtiretyMirror, and I was able to drop it successfully with the normal DROP LOGIN statement.
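For completeness, that final cleanup was just the standard statement:
--
DROP LOGIN [NtiretyMirror]
--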
[Image: https://memegenerator.net/img/instances/66310140.jpg]
Hope this helps!
Saturday, June 9, 2018
Double Win - Watch DeWitt and Help SQLSoldier's Loved Ones!
Brent Ozar (blog/@BrentO) is sponsoring a special webcast in a couple of weeks, with an amazing presenter.
The webcast ("SQL Query Optimization. Why is it so hard to get right?") is free, but with a suggested donation to the Robert Davis Memorial and Grief Fund.
If you don’t know who these people are, you should…
Dr. David DeWitt is currently with MIT but spent years at Microsoft as a Technical Fellow at the Jim Gray Systems Lab. He presented keynotes at the PASS Summit for years and gives an awesome talk regardless of the topic.
Robert Davis (SQLSoldier) was a SQL Server MVP and Microsoft Certified Master who was helpful to many over the years, online in the forums and in person at events. He died suddenly and without warning earlier this year, leaving behind his wife.
Sign up for the webcast if you can, and if you feel so moved please give to the fund as well!
Wednesday, June 6, 2018
Speaking at PASS Summit 2018!
I found out a few days ago that one of my sessions was selected for Summit this Fall - I will be presenting an updated version of my "Getting Started with Extended Events" session that I have been working on at SQL Saturdays over the last couple of years.
I have submitted to Summit multiple times over the years, and every time the email comes in late May/early June my stomach knots up - I have been selected to speak once before (2016) but I still have that little bit of self-doubt. Were my abstracts good enough? Does anyone care about the subjects I want to talk about? Who am I kidding anyway?
But when the email comes and one of the sessions says "Accepted"...
I have to admit for a brief moment it feels like this:
[Image: https://theworkingcaregiver.files.wordpress.com/2017/03/you-are-the-chosen-one-pl-ffffff.jpg]
...but the moment passes pretty quickly, because I know that I am but a little piece of the big puzzle. There are dozens of Community Speakers and also dozens of Microsoft and Partner Speakers, not to mention the *HUNDREDS* of volunteers that make PASS and the Summit work.
It is still exciting to be selected, and each time it is a little humbling to hear people who don't happen to get selected in any given year offer up congratulations to those who are picked; we all support each other, and it's likely next year I will be congratulating someone else when they get picked and I don't.
Compared to Mr. T, I think in this case Emmet is more on-target:
At the end of the day we are all #sqlfamily and we all have our own unique talents and our own unique voice - even if a particular event or venue doesn't select you, keep sharing - speaking/writing/whatever - because everyone has a story to tell.
Thursday, May 24, 2018
Toolbox - How Long Has My Server Been Up?
[Image: https://imgflip.com/i/2axp6y]
--
As a remote service provider, a common request is to follow up after a server restart (or to investigate a possible server restart). A question that inevitably comes up is "When did the server restart?"
If all you need to know is when the SQL Server service restarted, the easiest thing to check is the creation date of tempdb, since it is recreated as part of SQL Server start-up:
--
SELECT name, create_date
FROM master.sys.databases
WHERE name = 'tempdb'
--
name | create_date
tempdb | 05/14/2018 07:31:18
--
The catch is that the SQL service restart often isn't good enough.
When did Windows reboot?
When did the Agent service start?
Why did SQL start before the tempdb create date?
...and more.
I went digging and found an answer I liked online - as usual I modified it for my own wishes while keeping the core of the code:
--
/*
Item Uptime Query
Author Andy Galbraith
Created 2018/01/03
Updated 2018/03/05
Description Returns uptime for Windows, SQL Server, and SQL Agent
Versions SQL 2005+
Notes In some cases the SQL Server start time may show earlier than the OS start time - this is due
to the particular flags used to show that the system is "up" and is expected.
*/
/*
Queries modified from query in comment at:
https://www.sqlservercentral.com/Forums/Topic1384287-391-1.aspx
*/
SELECT
@@SERVERNAME as Instance_Name
, GETDATE() as Current_Server_Time
, CONVERT(VARCHAR(23),b.OS_Start,121) as OS_Start_Time
, z.SQL_Start as SQL_Server_Start_Time
, CONVERT(VARCHAR(23),a.Agent_Start,121) as SQL_Agent_Start_Time
, CONVERT(VARCHAR(50),
CAST(CAST(right(10000000+datediff(dd,0,GETDATE()-b.OS_Start),4) as INT) as VARCHAR(4))+' days '+
CAST(DATEPART(hh,GETDATE()-b.OS_Start) as VARCHAR) + ' hours, ' +
CAST(DATEPART(mi,GETDATE()-b.OS_Start) as VARCHAR) + ' minutes, ' +
CAST(DATEPART(ss,GETDATE()-b.OS_Start) as VARCHAR) + ' seconds') as OS_Uptime
, CONVERT(VARCHAR(50),
CAST(CAST(right(10000000+datediff(dd,0,GETDATE()-z.SQL_Start),4) as INT) as VARCHAR(4))+' days '+
CAST(DATEPART(hh,GETDATE()-z.SQL_Start) as VARCHAR) + ' hours, ' +
CAST(DATEPART(mi,GETDATE()-z.SQL_Start) as VARCHAR) + ' minutes, ' +
CAST(DATEPART(ss,GETDATE()-z.SQL_Start) as VARCHAR) + ' seconds') as SQL_Uptime
, CONVERT(VARCHAR(50),
CAST(CAST(right(10000000+datediff(dd,0,GETDATE()-a.Agent_Start),4) as INT) as VARCHAR(4))+' days '+
CAST(DATEPART(hh,GETDATE()-a.Agent_Start) as VARCHAR) + ' hours, ' +
CAST(DATEPART(mi,GETDATE()-a.Agent_Start) as VARCHAR) + ' minutes, ' +
CAST(DATEPART(ss,GETDATE()-a.Agent_Start) as VARCHAR) + ' seconds') as Agent_Uptime
FROM
(
SELECT login_time as SQL_Start
FROM sys.dm_exec_sessions WHERE session_id = 1
) z
CROSS JOIN
(
SELECT
NULLIF(min(
case
when aa.program_name like 'SQLAgent %'
then aa.login_time
ELSE '99990101'
end),
CONVERT(datetime,'99990101')) as Agent_Start
FROM master.dbo.sysprocesses aa
WHERE aa.login_time > '20000101'
) a
CROSS JOIN
(
SELECT
DATEADD(ss,bb.[ms_ticks]/-1000,GETDATE()) as OS_Start
FROM sys.[dm_os_sys_info] bb
) b
--
Instance_Name | Current_Server_Time | OS_Start_Time | SQL_Server_Start_Time | SQL_Agent_Start_Time
INSTANCE999 | 05/24/2018 19:59:48 | 05/14/2018 07:30:31 | 05/14/2018 07:31:14 | 05/14/2018 07:31:24

OS_Uptime | SQL_Uptime | Agent_Uptime
10 days 12 hours, 29 minutes, 17 seconds | 10 days 12 hours, 28 minutes, 34 seconds | 10 days 12 hours, 28 minutes, 23 seconds
--
As you can see, it leverages sys.dm_exec_sessions and sysprocesses to pull the start times for the SQL services and sys.dm_os_sys_info to pull the OS info.
--
For another option, we can leverage an operating system tool, the systeminfo command line utility. You can use a pipe | to pass a find command into systeminfo, like this:
--
systeminfo|find "Time:"
output
System Boot Time: 4/21/2018, 7:41:10 AM
--
To run it, you can use an elevated CMD prompt in Windows (right-click and Run as Administrator), or if xp_cmdshell is enabled you can run it from SQL:
--
EXEC xp_cmdshell 'systeminfo|find "Time:"'
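If xp_cmdshell is disabled (the default), enabling it is a server-level configuration change - only do this where your security policy allows. The standard sp_configure sequence, shown here for reference, is:
--
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
--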
So we have seen three ways to pull the info - but which one is the most "right"?
Let's compare the results for concurrent runs of the three queries:
--
name | create_date
tempdb | 04/21/2018 07:42:44

Instance_Name | OS_Start_Time | SQL_Server_Start_Time | SQL_Agent_Start_Time
INSTANCE999 | 04/21/2018 07:42:17 | 04/21/2018 07:42:31 | 04/21/2018 07:42:47

output
System Boot Time: 4/21/2018, 7:41:10 AM
--
There are times when you need to know pretty specifically what happened and when - so let's review:
- systeminfo time - 7:41:10AM
- sys.dm_os_sys_info ms_ticks - 7:42:17AM - more than a minute later!
- sys.dm_exec_sessions SQL start - 7:41:32AM
- tempdb create date - 7:42:44AM
- sysprocesses Agent start - 7:42:47AM
Sometimes you only have access to one datetime or another - especially with systeminfo, since you may be a sysadmin in SQL but not have access to run Windows command line tools - but if you have access to all of these numbers, consider the order of the items above.
--
Hope this helps!
Labels: DMVs, Scripts, SQL Server, T-SQL, Toolbox, Troubleshooting
Friday, May 18, 2018
When Good Query Plans Go Bad
Today's story starts again with a page escalation:
--
We are seeing very high CPU usage on DistServer01. Could you please take a look and do a health check on this server? Please let us know if there is anything we can do to help.
I connected to DistServer01 and, sure enough, there was consistently 99% CPU in Windows and 90%+ of it was from SQL Server.
[Image: https://i.imgflip.com/1mqb58.jpg]
As you may have guessed, DistServer01 was the Distributor in a very large transactional replication topology.
I checked the active processes looking for something that was abnormally long-running - no good.
I checked the processes again looking for something using high CPU. The top thing I found was a Distribution Agent for one of the many publications being routed through DistServer01. I stopped the SQL Agent job for the Distribution Agent - no good, so I resumed the job.
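For those checks I use something along these lines - a sketch against sys.dm_exec_requests (sp_who2 works just as well), not necessarily the exact query I ran that day:
--
SELECT r.session_id, r.status, r.command, r.cpu_time, r.total_elapsed_time, t.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id <> @@SPID /* exclude the current session */
ORDER BY r.cpu_time DESC
--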
Next place to turn was that warm comfy place...the Glenn Berry DMV scripts!
[Image: http://s.quickmeme.com/img/50/50aefd57e50f2014dda55437e3637e9c9efd9c6b2f84a1ad7f0fa3937129794e.jpg]
I have heaped praise on Glenn (blog/@GlennAlanBerry) and his scripts before, and they are definitely my go-to when I dive into troubleshooting.
For high CPU an especially useful query is the "top worker time queries" query. This is the 2008R2 version (but it works on later versions as well):
--
-- Get top total worker time queries for entire instance (Query 33) (Top Worker Time Queries)
SELECT TOP(50) DB_NAME(t.[dbid]) AS [Database Name], t.[text] AS [Query Text],
qs.total_worker_time AS [Total Worker Time], qs.min_worker_time AS [Min Worker Time],
qs.total_worker_time/qs.execution_count AS [Avg Worker Time],
qs.max_worker_time AS [Max Worker Time], qs.execution_count AS [Execution Count],
qs.total_elapsed_time/qs.execution_count AS [Avg Elapsed Time],
qs.total_logical_reads/qs.execution_count AS [Avg Logical Reads],
qs.total_physical_reads/qs.execution_count AS [Avg Physical Reads], qs.creation_time AS [Creation Time]
, qp.query_plan AS [Query Plan] -- comment out this column if copying results to Excel
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_worker_time DESC OPTION (RECOMPILE);
The (truncated) results come out like this:
Database Name | Query Text | Total Worker Time | Min Worker Time | Avg Worker Time | Max Worker Time | Execution Count | Creation Time
NULL | CREATE PROCEDURE sp_MSadd_replcmds( @para | 4224360546 | 0 | 3920 | 475776 | 1077642 | 05/12/2018 22:55:25
NULL | -- -- Name: -- fn_repldecryptver4 -- -- Descrip | 1724660996 | 0 | 89 | 372022 | 19285577 | 05/12/2018 22:55:25
NULL | -- -- Name: -- fn_MSrepl_isdistdb -- -- Desc | 1066486938 | 0 | 53 | 287015 | 19817134 | 05/12/2018 22:55:23
NULL | create procedure sys.sp_MSispeertopeeragent ( | 845048463 | 0 | 43 | 203009 | 19276558 | 05/12/2018 22:55:26
NULL | CREATE PROCEDURE sys.sp_MSadd_distribution_h | 796880096 | 0 | 490 | 368106 | 1623313 | 05/13/2018 03:50:07
NULL | CREATE PROCEDURE sys.sp_MSget_subscription_gui | 527028755 | 0 | 27 | 379022 | 19260376 | 05/12/2018 22:55:27
NULL | create procedure sys.sp_MSispeertopeeragent ( | 500954674 | 0 | 25 | 240012 | 19276558 | 05/12/2018 22:55:26
distribution | CREATE PROCEDURE sp_MSdelete_publisherdb_trans | 434227603 | 0 | 204054 | 5019298 | 2128 | 05/12/2018 23:05:04
master | SELECT target_data FROM sys.dm_xe_session_ | 342630497 | 55006 | 228116 | 963058 | 1502 | 05/13/2018 00:00:08
msdb | CREATE PROCEDURE sp_sqlagent_has_server_access | 267846345 | 30999 | 763095 | 21321226 | 351 | 05/12/2018 22:55:18
As you can see here, the internal replication process sp_MSadd_replcmds was by far the largest total consumer of CPU, but its average CPU time was still relatively small - it was just called over a million times since the server was last started!
Just to be sure I checked the Missing/Recommended Indexes to make sure there wasn't a glaringly obvious recommendation (sometimes new indexes drastically help CPU) but there was not.
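If you want to repeat that check yourself, a query along these lines against the missing index DMVs works (a sketch - not necessarily the exact query from the DMV script pack):
--
SELECT TOP (25) mid.statement as [Table]
, mid.equality_columns, mid.inequality_columns, mid.included_columns
, migs.user_seeks, migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC
--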
The next place to turn was the query plan...
[Image: https://imgflip.com/i/2ajq6z]
Query plans can be very complicated subjects. At the most basic level, a query plan is the layout of how a query operates - it is what we see represented when we look at a graphical execution plan in Management Studio.
Query plans are created on first execution and then are stored in the plan cache on either the first or second execution of the query (depending on whether you have Optimize for Ad Hoc Workloads enabled).
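If you aren't sure whether that setting is on for a given instance, a quick read-only check is:
--
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'optimize for ad hoc workloads'
--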
This is beneficial a majority (a vast majority) of the time because generating the query plan takes far more resources than simply looking it up in cache. The catch is that not all similar queries need the same plan.
The simplest example to me relates to indexing (and is actually one of the reasons my first instinct was to look for missing indexes) - when performing a filtered query (meaning one with a WHERE clause), the amount of data being returned helps shape the query plan. If the estimator expects a large portion of the rows in the table to be returned, it will often ignore nonclustered indexes and simply perform a table scan, checking each row to see if it meets the filter criteria and returning the appropriate rows. If the estimator expects a narrower subset and there is a relevant nonclustered index in place, it generates a plan that performs a scan or seek of the nonclustered index and then performs lookups to return any columns not already included in the index.
The problem comes up when the plan that gets cached is built for whichever of these two cases is the less frequent one. Using our example above, let's say the first run of the query is the large resultset/table scan, which caches that plan. The next one million runs of the query are the narrow resultset that could have used a nice, small nonclustered index - except there is already a table scan query plan stored in the cache, so SQL Server helpfully does what it's supposed to do and uses that plan, performing a million table scans for our million queries!
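A minimal illustration of the idea (the table, index, and values here are invented purely for this example):
--
CREATE TABLE dbo.OrdersDemo (OrderID int IDENTITY PRIMARY KEY, CustomerID int NOT NULL, OrderDate datetime NOT NULL)
CREATE NONCLUSTERED INDEX IX_OrdersDemo_CustomerID ON dbo.OrdersDemo (CustomerID)

/* Narrow filter - a seek on the nonclustered index plus key lookups is usually the cheap plan */
SELECT OrderID, OrderDate FROM dbo.OrdersDemo WHERE CustomerID = 42

/* Wide filter - if most rows qualify, a clustered index scan is usually cheaper, and whichever
   shape compiles first is the plan that gets cached and reused for both */
SELECT OrderID, OrderDate FROM dbo.OrdersDemo WHERE CustomerID > 0
--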
[Image: http://cdn-webimages.wimages.net/05123d70d127ca613706fc58c0084f07caa772-wm.jpg?v=3]
The way to deal with a bad query plan is to get rid of it (simple, right?). Many people think you need to dump the full procedure and/or system caches (or even worse, restart SQL Server) to clear this situation - *DON'T* do that in production!
What you can do instead is zap a single offending query plan, impacting only that specific query!
As Glenn describes here, you can look up the individual plan handle and then pass it to DBCC FREEPROCCACHE to remove that single plan from the cache.
To look up the plan handle, you can use this query:
--
SELECT cp.plan_handle, cp.objtype, cp.usecounts,
DB_NAME(st.dbid) AS [DatabaseName]
FROM sys.dm_exec_cached_plans AS cp CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
WHERE OBJECT_NAME (st.objectid)
LIKE N'%sp_MSadd_replcmds%' OPTION (RECOMPILE);
--
plan_handle | objtype | usecounts | DatabaseName
0x0500FF7FB38EE9D6A03A9D310501000001000000000000000000000000000000000000000000000000000000 | Proc | 1077642 | NULL
--
The next step is to pass this plan_handle to DBCC FREEPROCCACHE:
--
DBCC FREEPROCCACHE(0x0500FF7FB38EE9D6A03A9D310501000001000000000000000000000000000000000000000000000000000000)
...in my case - BOOM! Instantaneous CPU drop from 90%+ to just 5%-10%.
[Image: https://revdrbrian.files.wordpress.com/2016/03/and-there-was-much-rejoicing.jpg]
Another useful query came from a blog post by Chris Skorlinski of Microsoft: https://blogs.msdn.microsoft.com/chrissk/2009/05/25/transactional-replication-conversations/
The post is relatively old (2009) but it is still relevant as it talks about troubleshooting replication (which is what led me to it, even though my problem ended up not being a replication problem), and the query is useful to find both the plan handles and other relevant information about the top CPU-consuming queries. It is a useful expansion on the Glenn query above:
--
SELECT TOP 25
st.text,
qp.query_plan,
(qs.total_logical_reads/qs.execution_count) as avg_logical_reads,
(qs.total_logical_writes/qs.execution_count) as avg_logical_writes,
(qs.total_physical_reads/qs.execution_count) as avg_phys_reads,
qs.*
FROM sys.dm_exec_query_stats as qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) as qp
--WHERE st.text like 'CREATE PROCEDURE sp_MSadd_replcmds%'
ORDER BY qs.total_worker_time DESC
--
I want to repeat that - even though this is a distribution server in a complicated replication topology, this turned out to *not* be a replication problem.
Always review the evidence - error logs, query results, etc. - and follow the trail - never assume.
--
Hope this helps!
Wednesday, April 18, 2018
Why Don't I have a Preferred Replica?
[Image: https://imgflip.com/i/28n8j8]
--
After some investigation we found that while the Availability Group was configured to prefer the primary replica for backups, it wasn't evaluating as the preferred backup via the system function sys.fn_hadr_backup_is_preferred_replica!
What we found in the output logs for the LOG backup (Hallengren) job looked like this:
--
Availability group: DATABASE1_AG1
Availability group role: PRIMARY
Availability group backup preference: PRIMARY
Is preferred backup replica: No
--
Even though the preferred backup replica is the primary replica, and we are on the primary replica, the is_preferred check says No.
A test showed that a manual BACKUP LOG statement worked, but both the Hallengren DatabaseBackup *and* a "regular" maintenance plan backup ignored Database1.
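The manual test was nothing fancy - something along these lines, with a made-up path for this example:
--
BACKUP LOG [Database1] TO DISK = N'X:\Backup\Database1_manual.trn'
--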
To check the preferred replica status we manually ran the command in a query window:
--
SELECT sys.fn_hadr_backup_is_preferred_replica('Database1')
--
...it came back as 0 (false) on both nodes – SQL1 and SQL2.
As long as the preferred backup replica is 0, no intelligent backups will run against that replica, which is why backups weren’t happening on SQL1 or SQL2.
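"Intelligent" here just means backup logic that wraps each backup in a check of the preferred-replica function - a simplified sketch of the pattern (the path is made up for the example):
--
IF sys.fn_hadr_backup_is_preferred_replica(N'Database1') = 1
BEGIN
BACKUP LOG [Database1] TO DISK = N'X:\Backup\Database1.trn'
END
/* When the function returns 0 the database is silently skipped on that replica */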
--
Now what? Back to the old standby:
[Image: https://imgflip.com/i/28n92w]
Looking for info on "sys.fn_hadr_backup_is_preferred_replica returning 0" led me to https://social.msdn.microsoft.com/Forums/sqlserver/en-US/7cb08c5c-30b7-40d1-97e7-3302675a9d0e/sysfnhadrbackupispreferredreplica-returns-0-on-both-nodes-of-cluster?forum=sqldatabaseengine
As described in the solution post, one possible answer is if the internal server names don't match - we checked the servers and sure enough:
--
SELECT SERVERPROPERTY('ServerName') as [InstanceName]
, SERVERPROPERTY('ComputerNamePhysicalNetBIOS') as [ComputerNamePhysicalNetBIOS]
, @@SERVERNAME as ServerName
--
InstanceName | ComputerNamePhysicalNetBIOS | @@SERVERNAME
SQL1 | SQL1 | 123-223

InstanceName | ComputerNamePhysicalNetBIOS | @@SERVERNAME
SQL2 | SQL2 | 437-326
--
The SERVERPROPERTY names reflect the friendly SQL1 and SQL2 names while the @@SERVERNAME reflects the client's standard infrastructure team ###-### names.
This often means that the Windows Server was renamed after SQL was installed but the final step of renaming SQL wasn't taken (more on that at the end of this post).
--
Renaming the servers (taking a downtime) was not immediately acceptable, so after some consideration we tried manually editing the code of the master.dbo.DatabaseBackup procedure to hard-code the fact that the primary replica is truly the preferred backup replica:
--
Original code:
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL
BEGIN
SELECT @CurrentIsPreferredBackupReplica = sys.fn_hadr_backup_is_preferred_replica(@CurrentDatabaseName)
END
Modified code:
IF @Version >= 11 AND @Cluster IS NOT NULL AND @CurrentAvailabilityGroup IS NOT NULL AND @CurrentAvailabilityGroupRole = 'PRIMARY'
BEGIN
SELECT @CurrentIsPreferredBackupReplica = 1
END
--
…and that allowed the backups to run successfully by treating the replica as the preferred backup replica as long as it is the primary.
--
One other catch that came up after that also relates directly back to the mismatched names… after making the above change, the LOG backup of Database1 was succeeding but the overall LOG backup job was still failing. Looking in the job output we found an error:
--
Msg 50000, Level 16, State 1, Server SQL1, Procedure DatabaseBackup, Line 1041
The @@SERVERNAME does not match SERVERPROPERTY('ServerName'). See https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server.
--
The author of DatabaseBackup (Ola Hallengren) anticipates that mismatched names can cause issues and has a custom error built into his procedure to throw Error 50000 if the names don't match. The procedure proceeds to run backups if it can, but still ultimately fails overall with the 50000 message.
To mitigate this we further edited the code of DatabaseBackup to comment out the code block that generates this specific error:
--
/*
2018-04-18 - Commented out to prevent 50000 error since @@SERVERNAME does not match ServerName
AndyG, Ntirety

IF @@SERVERNAME <> SERVERPROPERTY('ServerName') AND SERVERPROPERTY('IsHadrEnabled') = 1
BEGIN
  SET @ErrorMessage = 'The @@SERVERNAME does not match SERVERPROPERTY(''ServerName''). See '
    + CASE WHEN SERVERPROPERTY('IsClustered') = 0 THEN 'https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server'
           WHEN SERVERPROPERTY('IsClustered') = 1 THEN 'https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/install/rename-a-sql-server-failover-cluster-instance'
      END + '.' + CHAR(13) + CHAR(10) + ' '
  RAISERROR(@ErrorMessage,16,1) WITH NOWAIT
  SET @Error = @@ERROR
END
*/
--
This directly bypasses the error check and prevents the 50000 error for the name mismatch.
--
As an aside I want to comment on the code editing we were doing - when modifying code of any type consider the source and their licensing/intellectual property rights. Ola's licensing is very short and sweet:
--
License
The SQL Server Maintenance Solution is licensed under the MIT license, a popular and widely used open source license.
Copyright (c) 2018 Ola Hallengren
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
https://ola.hallengren.com/license.html
--
This is very common with code online - ALWAYS ATTRIBUTE YOUR SOURCES - but if you are looking at any type of commercial code be extremely careful; not only may you be violating the letter (or spirit) of the law, you may even prevent the code's creator from supporting you in the future!
The edits to DatabaseBackup are allowing the jobs to run successfully for now, but it is truly a bandage rather than a real fix. The ultimate fix is to rename the SQL Server internally (as described at the link in Ola's 50000 error: https://docs.microsoft.com/en-us/sql/database-engine/install-windows/rename-a-computer-that-hosts-a-stand-alone-instance-of-sql-server).
The document title references a "stand-alone instance of SQL Server" but that is what this is, since it is two SQL Servers with an availability group rather than a failover cluster (shared) instance.
*BUT*
It is unknown from the information here whether changing the @@SERVERNAME from 123-223 to SQL1 would impact any other business or application processes - this is something the client's teams would need to determine before making any changes to the name.
Also, changing the server name would require a SQL downtime (service restart, not Windows reboot) for the name change to take effect (as described in the Microsoft doc), so even if you are comfortable with the name change it needs to be scheduled for a downtime window.
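For reference, the internal rename itself is short - a sketch for the SQL1 node using the names from above (it only takes effect after the service restart mentioned above):
--
EXEC sp_dropserver '123-223'
GO
EXEC sp_addserver 'SQL1', 'local'
GO
/* Restart the SQL Server service, then verify with SELECT @@SERVERNAME */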
--
Hope this helps!