One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Tuesday, January 9, 2018

T-SQL Tuesday #98 – Take Small Bites!

https://media.makeameme.org/created/mega-bytes-well.jpg
It's T-SQL Tuesday time again - the monthly blog party was started by Adam Machanic (blog/@AdamMachanic) and each month someone different selects a new topic.  This month's cycle is hosted by Arun Sirpal (blog/@blobeater1) and his chosen topic is "Your Technical Challenges Conquered" considering technical challenges and what we do to resolve them.

This was kind of a hard one for me because a lot of my blog posts are already about technical challenges from work, which means I have already written about many things that would have qualified for this category - I did find one thing from a few months ago that I hadn't documented yet...

--

As a services/production DBA I frequently get technical challenges related to issues about disk space, which usually tie back to issues about file growth, which usually tie back to issues about code, which usually tie back to issues about people (not necessarily developers - and yes, I do believe developers are people...usually...)
https://i0.wp.com/www.adamtheautomator.com/wp-content/uploads/2015/04/Worked-Fine-In-Dev-Ops-Problem-Now.jpg
--

This story is yet another tale of woe starting with a page at 1am for a filling drive, in this case the LOG drive.

I went to my default trace query (described here) and found...a few...records for growths:

EventName DatabaseName FileName StartTime ApplicationName HostName LoginName
Log File Auto Grow App01 App01_log 9/26/2017 1:08 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:08 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:08 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:08 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:08 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:07 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:07 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:07 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:07 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:07 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:06 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:05 SQLCMD DBSQL NT SERVICE\SQLSERVERAGENT
Log File Auto Grow App01 App01_log 9/26/2017 1:04 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 1:00 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:59 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:59 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:59 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:59 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:59 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:59 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:58 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:58 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:58 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:58 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:58 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:57 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:57 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:57 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:57 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:57 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:56 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:56 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:56 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:56 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:56 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:56 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:56 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:55 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:55 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:54 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:54 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:53 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:52 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:52 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:51 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:51 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:50 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:49 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:49 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:48 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:48 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:47 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:47 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:46 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:45 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:45 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:44 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:44 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:44 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:43 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:43 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:43 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:43 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:42 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:42 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:42 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:42 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:42 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:41 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:41 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:41 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:41 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:40 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:40 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:40 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:40 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:39 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:39 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:39 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:38 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:38 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:38 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:38 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:38 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:37 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:37 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:37 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:37 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:36 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:36 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:36 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:36 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:36 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:35 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:35 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:35 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:35 NULL NULL sa
Log File Auto Grow App01 App01_log 9/26/2017 0:34 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:33 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:33 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:32 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:32 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:31 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:31 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:31 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:30 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:29 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:29 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:28 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:27 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:26 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:26 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:25 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:25 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:24 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:24 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:23 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:22 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:20 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:19 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:18 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:16 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:15 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:14 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:14 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:13 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:12 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:11 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:10 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:09 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:08 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:07 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:06 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:05 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:04 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:04 .Net SqlClient Data Provider HostServer01 App01admin
Log File Auto Grow App01 App01_log 9/26/2017 0:03 .Net SqlClient Data Provider HostServer01 App01admin

There was a single growth caused by SQLCMD on DBSQL– this was the index maintenance job running directly on the SQL Server.  Almost all of the other growths were triggered by a.NET application on application server HostServer01.  The App01 database's LDF/log file was set to autogrow in increments of 512MB so each row in the table above represents a single instance of that growth – as you can see the unit of work was generating 512MB of work every 30-60 seconds, which is a significant load.

The next step was setting up a basic XEvents session to track what statement(s) might be generating the load - sure enough there was a serious smoking gun:


https://i.pinimg.com/originals/c0/19/57/c01957fd9e59fe4a46a67ad49e049cc0.jpg

Check out the row near the middle with a very large duration, logical_reads count, *and* writes count.

Durations in SQL 2012 Extended Events are in microseconds – a very small unit.

The catch was the duration for this statement was 6,349,645,772 (6 billion) microseconds…105.83 minutes for this one query!

--

The App01 database was in SIMPLE recovery (when I find this in the wild, I always question it – isn’t point-in-time recovery important? – but that is another discussion).  The relevance here is that in SIMPLE recovery, LOG backups are irrelevant (and actually *can’t* be run) – once a transaction or batch completes, the LDF/LOG file space is marked available for re-use, meaning that *usually* the LDF file doesn’t grow very large.

A database in SIMPLE recovery growing large LDF/LOG files almost always means a long-running unit of work or accidentally open transactions (a BEGIN TRAN with no subsequent CLOSE TRAN) – looking at the errors in the SQL Error Log the previous night, the 9002 log file full errors stopped at 1:45:51am server time, which means the offending unit of work ended then one way or another (crash or success).

Sure enough when I filtered the XEvents event file to things with a duration > 100 microseconds and then scanned down to 1:45:00 I quickly saw the row shown above.   Note this doesn’t mean the unit of work was excessively large in CPU/RAM/IO/etc. (and in fact the query errored out due to lack of LOG space) but the excessive duration made all of the tiny units of work over the previous 105 minutes have to persist in the transaction LDF/LOG file until this unit of work completed, preventing all of the LOG from that time to be marked for re-use until this statement ended.

--

The query in question was this:

--

exec sp_executesql N'DELETE from SecurityAction WHERE ActionDate < @1 AND (ActionCode != 102 AND ActionCode != 103 AND ActionCode != 129 AND ActionCode != 130)',N'@1 datetime',@1='2017-09-14 00:00:00'

--

Stripping off the sp_executesql wrapper and the parameter replacement turned it into this:

--

DELETE from SecurityAction
WHERE ActionDate < '2017-09-14 00:00:00'
AND
(
ActionCode != 102
AND ActionCode != 103
AND ActionCode != 129
AND ActionCode != 130
)

--


Checking out App01.dbo.SecurityAction, the table was 13GB with 14 million rows.  Looking at a random TOP 1000 rows I saw some rows that would satisfy this query that were at least 7 months old at the time of this incident (from September 2017) and there may have been even older rows – I didn’t want to spend the resources to run an ORDER BY query to find the absolute oldest row.  I didn’t know based on this evidence whether this meant this individual statement has been failing for a long time or maybe this DELETE statement had been a recent addition to the overnight processes and had been failing since then.

The DELETE looked like it must be a scheduled/regular operation from a .NET application running on HostServer01 since we were seeing it each night in a similar time window.

The immediate term action I recommended to run a batched DELETE to purge this table down – once it had been purged down maybe the nightly operation would run with a short enough duration to not break the universe, a;though I did advise them that changing the nightly process to be batched wouldn't hurt (although they were dealing with a vendor app).

The code I suggested (and we eventually went with) looked like this:

--

WHILE 1=1
BEGIN
       DELETE TOP (10000) from SecurityAction
       WHERE ActionDate < '2017-09-14 00:00:00'
       AND
       (
       ActionCode != 102
       AND ActionCode != 103
       AND ActionCode != 129
       AND ActionCode != 130
       )
END
       
--

This would delete rows in 10,000 row batches; wrapping it in the WHILE 1=1 allows the batch to run over and over without ending.  Once there are no more rows to satisfy the criteria the query *WILL CONTINUE TO RUN* but will simply delete -0- rows each time.

Once you are ready to stop the batch you simply cancel it and it will end in whatever iteration of the loop it is currently in.

I could have written a WHILE clause that actually checks for the existence of rows that meet the criteria, but there were so many rows that met the criteria right then that running the check itself would have been much more intensive than simply allowing the DELETE to run over and over.

We finally ended up going with this code and after a half day of running it had the table purged down from 14 million rows to 700,000, and a run the next day of the unmodified "regular" DELETE statement completed in under a minute.

--

This seems like a very simple fix, and runs contrary to some database teaching that say batches (and their cousins, the cursors) are evil and should be avoided no matter what.  Many blogs, classes, etc. preach the value of set-based operations at all times.

This is where we see the standard DBA trope once again:

https://memegenerator.net/img/instances/58097633/what-if-i-told-you-it-depends.jpg

Always use the right tool for the job - and in a case like this the right tool was a batch-based solution.  You could tune the size of the batches (is 10,000 the best number?  Maybe a 100,000 row batch (or a 1,000 row batch) would have the best performance) but regardless of that a small batch was a better choice than a 13 million row set!

At the end of the day this seems very simple - but you would be amazed to see how often something like this comes up.

To minimize LOG space needed, take small bites!

http://www.nicoleconner.com.au/wp-content/uploads/2016/04/66247903.jpg

Hope this helps!

Friday, December 8, 2017

Come to SQL Saturday Providence!

Tomorrow 12/09/2017 is SQL Saturday #694 in Providence, RI.

I will be presenting by SQL Server Health Check session first thing in the morning:

--

Does it Hurt When I Do This? Performing a SQL Server Health Check

Speaker: Andy Galbraith
Duration: 60 minutes
Track: Enterprise Database Administration & Deployment
How often do you review your SQL Servers for basic security, maintenance, and performance issues?  Many of the servers I "inherit" as a managed services provider have quite a few gaping holes. It is not unusual to find databases that are never backed up, servers with constant login failures (is it an attack or a bad connection string?), and servers that need more RAM/CPU/etc. (or sometimes that even have too much!) 

Come learn how to use freely available tools from multiple layers of the SQL Server stack to check your servers for basic issues like missing backups and CheckDB as well as for more advanced issues like page life expectancy problems and improper indexing. If you are responsible in any way for a Microsoft SQL Server (DBA, Windows Admin, even a Developer) you will see value in this session!
--

In total there are six tracks with five individual sessions each, so a full day of content!

Registration is still open at https://www.sqlsaturday.com/694/RegisterNow.aspx - come to Bryant University tomorrow and check it out!





Wednesday, November 1, 2017

PASS Summit 2017 - Live and in Living Color!

UPDATE - the feed and schedule are live at http://www.pass.org/summit/2017/PASStv.aspx

http://rochistory.com/blog/wp-content/uploads/2014/09/NBC-logo.png
Not in Seattle for some reason this week?  Like maybe you stayed home because it was Halloween week, or you don't have the funds, or you are preparing for the zombie apocalypse?

http://www.demotivation.us/media/demotivators/demotivation.us_Zombie-Apocalypse-Preparation-level-genius_133338427933.jpg

PASS is ready to help you out - starting today (Wednesday 11/01/2017) at 8:10 PDT (15:10 UTC) PASS will stream the keynotes, some sessions, and certain other round tables and events LIVE online at http://www.pass.org/summit/2017/Home.aspx

The schedule for the next few days isn't up yet but will be soon - tune in!

https://radiosoundsfamiliar.com/resources/Rebrand/kep%20calm%20and%20tune%20in%20logo.jpg.opt197x337o0%2C0s197x337.jpg

Monday, October 23, 2017

But I Don't *Want* To "Restore from a backup of the database or repair the database!"

Do you like Log Shipping?

Before my time at Ntirety I had done quite a bit of work with failover clustering and mirroring (trending into some Availability Groups) but I had never really done that much replication or log shipping.  Even at my previous managed services remote DBA job I simply hadn't run across it...

I have said in this blog more than once that education (training and real-world) is one of the most important parts of being a DBA, and that we should always be learning; my time at Ntirety has definitely educated me about replication and log shipping!

https://memegenerator.net/img/instances/500x/42421555/be-careful-what-you-wish-for-punk.jpg
This Log Shipping story (maybe its a cautionary tale - so many of these stories are) begins with a failed server migration.  We were trying to replace a server with a new VM and due to some issues completely unrelated to log shipping, we ended up reverting to the original server (maybe once we figure that out I will write about it too).  

The problem was that I had already dropped the log shipping from OldServer01.Database99 to LSSecondary.Database99 and created new log shipping from NewServer01.Database99 to LSSecondary.Database99.  Part of the backout plan was dropping *that* log shipping and recreating the original relationship between OldServer01 and LSSecondary.

I had the creation process scripted out, so I ran the scripts, made sure they succeeded, checked that LSSecondary.Database99 reflected the current data (we had it in Standyby/Read-Only) and then walked away from log shipping to continue the rest of the revert process.

http://vignette1.wikia.nocookie.net/cardfight/images/e/e1/Funniest_Memes_you-my-friend-just-made-a-big-mistake_10940.jpeg/revision/latest?cb=20150805104725

Monday morning rolled around and one of the business users reported that Database99 on LSSecondary was in Restoring rather than Standby, and wanted me to check it out.  Sure enough, Database99 was in Restoring even though the LSRestore job wasn't running and there was no active RESTORE process running.

OK....

I check the properties of the log shipping secondary in the Management Studio GUI and they looked as expected:




You can see these same settings via T-SQL with the following query:
SELECT secondary_database
, restore_mode
, disconnect_users
, last_restored_file
FROM msdb.dbo.log_shipping_secondary_databases 

Since the Database shows restore_mode 1 (Standby) and disconnect_users 1 (true) there is no reason by the configuration that the database should be in RESTORING.

The next stop was the SQL Server Error Log.  This probably should have been my first stop, but since I had just set this log shipping up I thought of the configuration first.  The Error Log was a horror show:

--

Error: 3456, Severity: 16, State: 1.
Could not redo log record (31185729:63924:22), for transaction ID (0:0), on page (1:511238), database 'Database99' (database ID 5). Page: LSN = (31185729:63857:14), type = 16. Log: OpCode = 7, context 24, PrevPageLSN: (31185729:63804:511). Restore from a backup of the database, or repair the database

--

…followed by 100+ lines of Error Stack Dump <cue scary music>:

--

Using 'dbghelp.dll' version '4.0.5'
**Dump thread - spid = 0, EC = 0x00000002E61A20F0
***Stack Dump being sent to F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\LOG\SQLDump0001.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   10/22/17 03:00:34 spid 55
*
* HandleAndNoteToErrorlog: Exception raised, major=34, minor=56, severity=16
*
* Input Buffer 506 bytes -
*             RESTORE LOG [Database99] FROM DISK = N'\\LSSecondary\
*  Database99\Database99_20171022053000.trn' WITH FILE = 1, STANDBY =
*   N'F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DA
*  TA\Database99_20171022090018.tuf'
*  
*
*  MODULE                          BASE      END       SIZE
* sqlservr                       0000000000A40000  00000000045EDFFF  03bae000
* ntdll                          0000000077510000  00000000776B8FFF  001a9000
* kernel32                       00000000772F0000  000000007740EFFF  0011f000
* KERNELBASE                     000007FEFD560000  000007FEFD5CBFFF  0006c000
* ADVAPI32                       000007FEFF730000  000007FEFF80AFFF  000db000
* msvcrt                         000007FEFED70000  000007FEFEE0EFFF  0009f000
* sechost                        000007FEFE5B0000  000007FEFE5CEFFF  0001f000
* RPCRT4                         000007FEFE630000  000007FEFE75CFFF  0012d000
* MSVCR80                        0000000074F50000  0000000075018FFF  000c9000
* MSVCP80                        0000000074BD0000  0000000074CD8FFF  00109000
* sqlos                          00000000745D0000  00000000745D6FFF  00007000
* Secur32                        000007FEFCFE0000  000007FEFCFEAFFF  0000b000
* SSPICLI                        000007FEFD260000  000007FEFD284FFF  00025000
* pdh                            000007FEF6F00000  000007FEF6F4DFFF  0004e000
* SHLWAPI                        000007FEFF680000  000007FEFF6F0FFF  00071000
* GDI32                          000007FEFEC20000  000007FEFEC86FFF  00067000
* USER32                         0000000077410000  0000000077509FFF  000fa000
* LPK                            000007FEFF810000  000007FEFF81DFFF  0000e000
* USP10                          000007FEFF190000  000007FEFF258FFF  000c9000
* USERENV                        000007FEFC740000  000007FEFC75DFFF  0001e000
* profapi                        000007FEFD3D0000  000007FEFD3DEFFF  0000f000
* WINMM                          000007FEF6EC0000  000007FEF6EFAFFF  0003b000
* IPHLPAPI                       000007FEFB670000  000007FEFB696FFF  00027000
* NSI                            000007FEFE620000  000007FEFE627FFF  00008000
* WINNSI                         000007FEFB660000  000007FEFB66AFFF  0000b000
* opends60                       00000000745E0000  00000000745E7FFF  00008000
* NETAPI32                       000007FEFAF00000  000007FEFAF15FFF  00016000
* netutils                       000007FEFC970000  000007FEFC97BFFF  0000c000
* srvcli                         000007FEFCF20000  000007FEFCF42FFF  00023000
* wkscli                         000007FEFAEE0000  000007FEFAEF4FFF  00015000
* LOGONCLI                       000007FEFCC80000  000007FEFCCAFFFF  00030000
* SAMCLI                         000007FEFA310000  000007FEFA323FFF  00014000
* BatchParser                    00000000743D0000  00000000743FCFFF  0002d000
* IMM32                          000007FEFF700000  000007FEFF72DFFF  0002e000
* MSCTF                          000007FEFEE10000  000007FEFEF18FFF  00109000
* psapi                          00000000776D0000  00000000776D6FFF  00007000
* instapi10                      0000000074AA0000  0000000074AACFFF  0000d000
* cscapi                         000007FEF6EB0000  000007FEF6EBEFFF  0000f000
* sqlevn70                       0000000071050000  0000000071250FFF  00201000
* ntmarta                        000007FEFC530000  000007FEFC55CFFF  0002d000
* WLDAP32                        000007FEFF130000  000007FEFF181FFF  00052000
* CRYPTSP                        000007FEFCBE0000  000007FEFCBF7FFF  00018000
* rsaenh                         000007FEFC9E0000  000007FEFCA26FFF  00047000
* CRYPTBASE                      000007FEFD300000  000007FEFD30EFFF  0000f000
* BROWCLI                        000007FEF6DE0000  000007FEF6DF1FFF  00012000
* AUTHZ                          000007FEFCE70000  000007FEFCE9EFFF  0002f000
* MSCOREE                        000007FEF9F90000  000007FEF9FFEFFF  0006f000
* mscoreei                       000007FEF9EE0000  000007FEF9F77FFF  00098000
* ole32                          000007FEFEF20000  000007FEFF122FFF  00203000
* credssp                        000007FEFC900000  000007FEFC909FFF  0000a000
* msv1_0                         000007FEFCB80000  000007FEFCBD1FFF  00052000
* cryptdll                       000007FEFCF50000  000007FEFCF63FFF  00014000
* kerberos                       000007FEFCCB0000  000007FEFCD67FFF  000b8000
* MSASN1                         000007FEFD4B0000  000007FEFD4BEFFF  0000f000
* schannel                       000007FEFC980000  000007FEFC9D6FFF  00057000
* CRYPT32                        000007FEFD630000  000007FEFD79CFFF  0016d000
* security                       0000000074210000  0000000074212FFF  00003000
* WS2_32                         000007FEFE5D0000  000007FEFE61CFFF  0004d000
* SHELL32                        000007FEFD820000  000007FEFE5A8FFF  00d89000
* OLEAUT32                       000007FEFEC90000  000007FEFED66FFF  000d7000
* ftimport                       0000000060000000  0000000060024FFF  00025000
* MSFTE                          0000000049980000  0000000049D2DFFF  003ae000
* VERSION                        000007FEFC520000  000007FEFC52BFFF  0000c000
* dbghelp                        000000006F0C0000  000000006F21DFFF  0015e000
* WINTRUST                       000007FEFD7A0000  000007FEFD7DAFFF  0003b000
* ncrypt                         000007FEFCE20000  000007FEFCE6FFFF  00050000
* bcrypt                         000007FEFCDF0000  000007FEFCE11FFF  00022000
* mswsock                        000007FEFCC20000  000007FEFCC74FFF  00055000
* wship6                         000007FEFCDE0000  000007FEFCDE6FFF  00007000
* wshtcpip                       000007FEFC620000  000007FEFC626FFF  00007000
* ntdsapi                        000007FEFB0A0000  000007FEFB0C6FFF  00027000
* DNSAPI                         000007FEFCA30000  000007FEFCA8AFFF  0005b000
* rasadhlp                       000007FEFAA70000  000007FEFAA77FFF  00008000
* fwpuclnt                       000007FEFB560000  000007FEFB5B2FFF  00053000
* bcryptprimitives               000007FEFC7A0000  000007FEFC7EBFFF  0004c000
* CLBCatQ                        000007FEFF560000  000007FEFF5F8FFF  00099000
* sqlncli10                      0000000072690000  0000000072947FFF  002b8000
* COMCTL32                       000007FEF6F50000  000007FEF6FEFFFF  000a0000
* COMDLG32                       000007FEFF4C0000  000007FEFF556FFF  00097000
* SQLNCLIR10                     0000000071260000  0000000071296FFF  00037000
* netbios                        000007FEF4040000  000007FEF4049FFF  0000a000
* xpsqlbot                       0000000074270000  0000000074277FFF  00008000
* xpstar                         0000000065380000  0000000065407FFF  00088000
* SQLSCM                         0000000074590000  000000007459DFFF  0000e000
* ODBC32                         000007FEF2BA0000  000007FEF2C50FFF  000b1000
* ATL80                          0000000074F00000  0000000074F1FFFF  00020000
* odbcint                        00000000733D0000  0000000073407FFF  00038000
* clusapi                        000007FEF6760000  000007FEF67AFFFF  00050000
* resutils                       000007FEF6740000  000007FEF6758FFF  00019000
* xpstar                         0000000074240000  0000000074264FFF  00025000
* xplog70                        0000000074280000  000000007428FFFF  00010000
* xplog70                        0000000074220000  0000000074221FFF  00002000
* dsrole                         000007FEFBCD0000  000007FEFBCDBFFF  0000c000
* RpcRtRemote                    000007FEFD3B0000  000007FEFD3C3FFF  00014000
* ssdebugps                      000007FEEAE20000  000007FEEAE31FFF  00012000
* apphelp                        000007FEFD2A0000  000007FEFD2F6FFF  00057000
* msxmlsql                       00000000632C0000  0000000063410FFF  00151000
* dbghelp                        000000006F8D0000  000000006FA2DFFF  0015e000
*
*     P1Home: 0044004C00510053:  
*     P2Home: 00000000004A87B4:  0122002501210023  0124002900230027  0126002D0025002B  012800310127002F  012A003501290033  002C0039002B0037  
*     P3Home: 00000000004C8310:  000000002A010FF0  000000002A010FF0  0000000017407610  000000001AA86DD0  000000001736E3F0  000000002A027FD0  
*     P4Home: 0000000000000000:  
*     P5Home: 0052004500AE00AC:  
*     P6Home: 000000001742C658:  0050005C003A0046  00720067006F0072  00460020006D0061  00730065006C0069  00630069004D005C  006F0073006F0072  
* ContextFlags: 000000000010000F:  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  
*      MxCsr: 0000000000001FA0:  
*      SegCs: 0000000000000033:  
*      SegDs: 000000000000002B:  
*      SegEs: 000000000000002B:  
*      SegFs: 0000000000000053:  
*      SegGs: 000000000000002B:  
*      SegSs: 000000000000002B:  
*     EFlags: 0000000000000202:  
*        Rax: 0000000000991AD3:  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  
*        Rcx: 000000002C568260:  0044004C00510053  00000000004A87B4  00000000004C8310  0000000000000000  0052004500AE00AC  000000001742C658  
*        Rdx: 0000000000000000:  
*        Rbx: 0000000000000000:  
*        Rsp: 000000002C568870:  0000000000000000  00000002E61A20F0  000000002C568AA0  000007FFFFEDC000  00000000000042AC  0000000000000000  
*        Rbp: 0000000000000000:  
*        Rsi: 00000002E61A20F0:  00000002E61A20F0  000000063039CEA0  0000000000000000  0000000000000004  00000002E61A2110  00000002E61A2110  
*        Rdi: 000000002C568AA0:  000000001742C650  FFFFFFFFFFFFFFFE  000007FFFFEDB4A8  0000000000000000  00000006699981A0  0000000000A426C3  
*         R8: 0000000000000000:  
*         R9: 0000000000000000:  
*        R10: 0000000000000000:  
*        R11: 000000002C569080:  0044004C00510053  00000000004A87B4  00000000004C8310  0000000000000000  0052004500AE00AC  000000001742C658  
*        R12: 0000000000000001:  
*        R13: 000000063039CEA0:  000000063039C080  000000063039C3B0  000000063039CAD0  00000002E61A2080  000000063039CF10  000000008089CE80  
*        R14: 000000002C569F60:  0064006E00610048  006E00410065006C  0074006F004E0064  0045006F00540065  0072006F00720072  003A0067006F006C  
*        R15: 000000000000003F:  
*        Rip: 000007FEFD56B3DD:  C3000000C8C48148  003B830874F68548  0001BD1689660376  02D6840FFD3B0000  F3840F02FF830000  870F07FF83000002  
* *******************************************************************************
* -------------------------------------------------------------------------------
* Short Stack Dump
000007FEFD56B3DD Module(KERNELBASE+000000000000B3DD)
0000000002AE981D Module(sqlservr+00000000020A981D)
0000000002AEDD3A Module(sqlservr+00000000020ADD3A)
0000000002B210C3 Module(sqlservr+00000000020E10C3)
0000000002B20F5D Module(sqlservr+00000000020E0F5D)
0000000000E3F5D8 Module(sqlservr+00000000003FF5D8)
0000000000E3F2A6 Module(sqlservr+00000000003FF2A6)
0000000002A47DBB Module(sqlservr+0000000002007DBB)
0000000002B24941 Module(sqlservr+00000000020E4941)
0000000002D14BFB Module(sqlservr+00000000022D4BFB)
0000000002CC0A44 Module(sqlservr+0000000002280A44)
0000000002CC0719 Module(sqlservr+0000000002280719)
0000000002CBFDC5 Module(sqlservr+000000000227FDC5)
0000000002D4C1B4 Module(sqlservr+000000000230C1B4)
0000000002D4B845 Module(sqlservr+000000000230B845)
0000000002CA321E Module(sqlservr+000000000226321E)
0000000002D4DD33 Module(sqlservr+000000000230DD33)
0000000002D52D0F Module(sqlservr+0000000002312D0F)
0000000002CC8CE6 Module(sqlservr+0000000002288CE6)
0000000001F288A1 Module(sqlservr+00000000014E88A1)
0000000000A99D59 Module(sqlservr+0000000000059D59)
0000000000A9A9B8 Module(sqlservr+000000000005A9B8)
0000000000A9B30C Module(sqlservr+000000000005B30C)
0000000000A9C1A6 Module(sqlservr+000000000005C1A6)
0000000000AE5342 Module(sqlservr+00000000000A5342)
0000000000A4BBD8 Module(sqlservr+000000000000BBD8)
0000000000A4B8BA Module(sqlservr+000000000000B8BA)
0000000000A4B6FF Module(sqlservr+000000000000B6FF)
0000000000F68FB6 Module(sqlservr+0000000000528FB6)
0000000000F69175 Module(sqlservr+0000000000529175)
0000000000F69839 Module(sqlservr+0000000000529839)
0000000000F69502 Module(sqlservr+0000000000529502)
0000000074F537D7 Module(MSVCR80+00000000000037D7)
0000000074F53894 Module(MSVCR80+0000000000003894)
00000000773059CD Module(kernel32+00000000000159CD)
000000007753B981 Module(ntdll+000000000002B981)
Stack Signature for the dump is 0x000000012832F01D
External dump process return code 0x20000001.  External dump process returned no errors. 

--

... followed by:

--

Error: 9003, Severity: 20, State: 6.
The log scan number (31185729:63630:107) passed to log scan in database 'Database99' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup. 
--

https://thecatholicgeeks.files.wordpress.com/2015/05/beafraid.jpg

Pretty scary looking errors…I checked the database on OldServer01, especially since the CheckDB job hadn't run in several days (back to before the migration attempt), and thankfully it was clear:

--

Date and time: 2017-10-23 10:49:35
Server: OldServer01
Version: 10.50.4305.0
Edition: Enterprise Edition (64-bit)
Procedure: [master].[dbo].[DatabaseIntegrityCheck]
Parameters: @Databases = 'Database99', @CheckCommands = 'CHECKDB', @PhysicalOnly = 'N', @NoIndex = 'N', @ExtendedLogicalChecks = 'N', @TabLock = 'N'
, @FileGroups = NULL, @Objects = NULL, @LockTimeout = NULL, @LogToTable = 'N', @Execute = 'Y'
Date and time: 2017-10-23 10:49:35
Database: [Database99]
Status: ONLINE
Standby: No
Updateability: READ_WRITE
User access: MULTI_USER
Is accessible: Yes
Recovery model: FULL
Date and time: 2017-10-23 10:49:35
Command: DBCC CHECKDB ([Database99]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
Outcome: Succeeded
Duration: 00:35:44
Date and time: 2017-10-23 11:25:19


--

After some poking around and several rebuilds of log shipping (deleting the secondary, taking a new FULL on OldServer01, restoring it to LSSecondary, and then recreating the process) I figured out the issue was related to the first part of the first error: 

“Could not redo log record (31185747:40079:1), for transaction ID (0:1129634848), on page (1:10522412), database 'Database99'” 

The redo reader process was looking in the log shipping folder \\LSSecondary\Database99 and trying to restore the oldest TRN log backup files, which were related to the log shipping configuration from before the migration/fail-back over the weekend.  Those older TRN files had log sequence numbers (LSNs) that were from before the FULL backup used to initialize the re-created log shipping, which was making the overall process error out and blowing up the process.

A colleague suggested that the log shipping restore should be skipping TRN files that are "too old" via LSN (which matched my understanding of how it "should" work) and that maybe this reflected one of the old TRN files having an invalid file header.  After some consideration we realized that probably was *not* the case since the error message doesn't reflect not being able to read the backup file header, and if it couldn't read the header how would it have a log record number (31185747:40079:1) to include in the "could not redo" portion of the error?

I deleted the old backups from \\LSSecondary\Database99 and tried to run the process again, and received the same errors.  When I looked in \\LSSecondary\Database99 I saw all of the old TRN files were back!

I had failed to consider (sigh) that the LSCopy job would bring all of the "old" files from the primary folder at \\OldServer01\Database99 back to \\LSSecondary\Database99, putting me back in the same situation.

Take 2 - I deleted all of the log shipping backups from *both* \\OldServer01\Database99 and \\LSSecondary\Database99 and then re-created the log shipping one last time…

https://memegenerator.net/img/instances/500x/27017972/success.jpg
This time the LS_Restore job functioned as expected, Database99 was in Standby/Read-Only on LSSecondary, and instead of my 100+ line stack dump I received some very respectable information messages in the SQL Error Log on LSSecondary:

--
Setting database option SINGLE_USER to ON for database Database99.
Starting up database 'Database99'.
The database 'Database99' is marked RESTORING and is in a state that does not allow recovery to be run.
Recovery is writing a checkpoint in database 'Database99' (5). This is an informational message only. No user action is required.
Recovery completed for database Database99 (database ID 5) in 2 second(s) (analysis 1080 ms, redo 0 ms, undo 466 ms.) This is an informational message only. No user action is required.
Starting up database 'Database99'.
CHECKDB for database 'Database99' finished without errors on 2017-10-23 10:49:44.400 (local time). This is an informational message only; no user action is required.
Log was restored. Database: Database99, creation date(time): 2017/10/22(02:20:41), first LSN: 31185760:34887:1, last LSN: 31185760:44042:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'\\LSSecondary\Database99\Database99_20171023204500.trn'}). This is an informational message. No user action is required.
Setting database option MULTI_USER to ON for database Database99.
 --

The SINGLE_USER and MULTI_USER messages are artifacts of telling the log shipping to disconnect users when restoring backups.

--

Hope this helps!