https://media.makeameme.org/created/mega-bytes-well.jpg |
This was kind of a hard one for me because a lot of my blog posts are already about technical challenges from work, which means I have already written about many things that would have qualified for this category - I did find one thing from a few months ago that I hadn't documented yet...
--
As a services/production DBA I frequently get technical challenges related to issues about disk space, which usually tie back to issues about file growth, which usually tie back to issues about code, which usually tie back to issues about people (not necessarily developers - and yes, I do believe developers are people...usually...)
https://i0.wp.com/www.adamtheautomator.com/wp-content/uploads/2015/04/Worked-Fine-In-Dev-Ops-Problem-Now.jpg |
--
This story is yet another tale of woe starting with a page at 1am for a filling drive, in this case the LOG drive.
I went to my default trace query (described here) and found...a few...records for growths:
EventName | DatabaseName | FileName | StartTime | ApplicationName | HostName | LoginName |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:08 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:08 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:08 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:08 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:08 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:07 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:07 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:07 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:07 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:07 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:06 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:05 | SQLCMD | DBSQL | NT SERVICE\SQLSERVERAGENT |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:04 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 1:00 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:59 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:59 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:59 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:59 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:59 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:59 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:58 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:58 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:58 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:58 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:58 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:57 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:57 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:57 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:57 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:57 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:56 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:56 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:56 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:56 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:56 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:56 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:56 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:55 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:55 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:54 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:54 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:53 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:52 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:52 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:51 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:51 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:50 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:49 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:49 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:48 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:48 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:47 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:47 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:46 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:45 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:45 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:44 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:44 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:44 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:43 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:43 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:43 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:43 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:42 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:42 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:42 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:42 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:42 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:41 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:41 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:41 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:41 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:40 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:40 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:40 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:40 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:39 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:39 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:39 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:38 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:38 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:38 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:38 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:38 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:37 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:37 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:37 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:37 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:36 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:36 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:36 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:36 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:36 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:35 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:35 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:35 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:35 | NULL | NULL | sa |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:34 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:33 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:33 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:32 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:32 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:31 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:31 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:31 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:30 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:29 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:29 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:28 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:27 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:26 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:26 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:25 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:25 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:24 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:24 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:23 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:22 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:20 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:19 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:18 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:16 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:15 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:14 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:14 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:13 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:12 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:11 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:10 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:09 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:08 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:07 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:06 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:05 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:04 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:04 | .Net SqlClient Data Provider | HostServer01 | App01admin |
Log File Auto Grow | App01 | App01_log | 9/26/2017 0:03 | .Net SqlClient Data Provider | HostServer01 | App01admin |
There was a single growth caused by SQLCMD on
DBSQL– this was the index maintenance job running directly on the SQL Server. Almost all of the other growths were triggered by a.NET
application on application server HostServer01. The App01 database's LDF/log file was set to autogrow in increments of 512MB so each
row in the table above represents a single instance of that growth – as you can
see the unit of work was generating 512MB of work every 30-60 seconds, which is
a significant load.
The next step was setting up a basic XEvents session to track what statement(s) might be generating the load - sure enough there was a serious smoking gun:
Check out the row near the middle with a very large duration, logical_reads count, *and* writes count.
Durations in SQL 2012 Extended Events are in microseconds – a very small unit.
The catch was the duration for this statement was 6,349,645,772 (6 billion) microseconds…105.83 minutes for this one query!
--
The App01 database was in SIMPLE recovery (when I find this in the wild, I always question it – isn’t point-in-time recovery important? – but that is another discussion). The relevance here is that in SIMPLE recovery, LOG backups are irrelevant (and actually *can’t* be run) – once a transaction or batch completes, the LDF/LOG file space is marked available for re-use, meaning that *usually* the LDF file doesn’t grow very large.
A database in SIMPLE recovery growing large LDF/LOG files almost always means a long-running unit of work or accidentally open transactions (a BEGIN TRAN with no subsequent CLOSE TRAN) – looking at the errors in the SQL Error Log the previous night, the 9002 log file full errors stopped at 1:45:51am server time, which means the offending unit of work ended then one way or another (crash or success).
Sure enough when I filtered the XEvents event file to things with a duration > 100 microseconds and then scanned down to 1:45:00 I quickly saw the row shown above. Note this doesn’t mean the unit of work was excessively large in CPU/RAM/IO/etc. (and in fact the query errored out due to lack of LOG space) but the excessive duration made all of the tiny units of work over the previous 105 minutes have to persist in the transaction LDF/LOG file until this unit of work completed, preventing all of the LOG from that time to be marked for re-use until this statement ended.
--
The query in question was this:
--
--
Stripping off the sp_executesql wrapper and the parameter replacement turned it into this:
--
Checking out App01.dbo.SecurityAction, the table was 13GB with 14 million rows. Looking at a random TOP 1000 rows I saw some rows that would satisfy this query that were at least 7 months old at the time of this incident (from September 2017) and there may have been even older rows – I didn’t want to spend the resources to run an ORDER BY query to find the absolute oldest row. I didn’t know based on this evidence whether this meant this individual statement has been failing for a long time or maybe this DELETE statement had been a recent addition to the overnight processes and had been failing since then.
The DELETE looked like it must be a scheduled/regular operation from a .NET application running on HostServer01 since we were seeing it each night in a similar time window.
The immediate term action I recommended to run a batched DELETE to purge this table down – once it had been purged down maybe the nightly operation would run with a short enough duration to not break the universe, a;though I did advise them that changing the nightly process to be batched wouldn't hurt (although they were dealing with a vendor app).
The code I suggested (and we eventually went with) looked like this:
--
--
This would delete rows in 10,000 row batches; wrapping it in the WHILE 1=1 allows the batch to run over and over without ending. Once there are no more rows to satisfy the criteria the query *WILL CONTINUE TO RUN* but will simply delete -0- rows each time.
Once you are ready to stop the batch you simply cancel it and it will end in whatever iteration of the loop it is currently in.
I could have written a WHILE clause that actually checks for the existence of rows that meet the criteria, but there were so many rows that met the criteria right then that running the check itself would have been much more intensive than simply allowing the DELETE to run over and over.
We finally ended up going with this code and after a half day of running it had the table purged down from 14 million rows to 700,000, and a run the next day of the unmodified "regular" DELETE statement completed in under a minute.
--
Durations in SQL 2012 Extended Events are in microseconds – a very small unit.
The catch was the duration for this statement was 6,349,645,772 (6 billion) microseconds…105.83 minutes for this one query!
--
The App01 database was in SIMPLE recovery (when I find this in the wild, I always question it – isn’t point-in-time recovery important? – but that is another discussion). The relevance here is that in SIMPLE recovery, LOG backups are irrelevant (and actually *can’t* be run) – once a transaction or batch completes, the LDF/LOG file space is marked available for re-use, meaning that *usually* the LDF file doesn’t grow very large.
A database in SIMPLE recovery growing large LDF/LOG files almost always means a long-running unit of work or accidentally open transactions (a BEGIN TRAN with no subsequent CLOSE TRAN) – looking at the errors in the SQL Error Log the previous night, the 9002 log file full errors stopped at 1:45:51am server time, which means the offending unit of work ended then one way or another (crash or success).
Sure enough when I filtered the XEvents event file to things with a duration > 100 microseconds and then scanned down to 1:45:00 I quickly saw the row shown above. Note this doesn’t mean the unit of work was excessively large in CPU/RAM/IO/etc. (and in fact the query errored out due to lack of LOG space) but the excessive duration made all of the tiny units of work over the previous 105 minutes have to persist in the transaction LDF/LOG file until this unit of work completed, preventing all of the LOG from that time to be marked for re-use until this statement ended.
--
The query in question was this:
--
exec sp_executesql N'DELETE from SecurityAction WHERE ActionDate < @1 AND (ActionCode != 102 AND ActionCode != 103 AND ActionCode != 129 AND ActionCode != 130)',N'@1 datetime',@1='2017-09-14 00:00:00'
--
Stripping off the sp_executesql wrapper and the parameter replacement turned it into this:
--
DELETE from SecurityAction
WHERE ActionDate < '2017-09-14 00:00:00'
AND
(
ActionCode != 102
AND ActionCode != 103
AND ActionCode != 129
AND ActionCode != 130
)
--
Checking out App01.dbo.SecurityAction, the table was 13GB with 14 million rows. Looking at a random TOP 1000 rows I saw some rows that would satisfy this query that were at least 7 months old at the time of this incident (from September 2017) and there may have been even older rows – I didn’t want to spend the resources to run an ORDER BY query to find the absolute oldest row. I didn’t know based on this evidence whether this meant this individual statement has been failing for a long time or maybe this DELETE statement had been a recent addition to the overnight processes and had been failing since then.
The DELETE looked like it must be a scheduled/regular operation from a .NET application running on HostServer01 since we were seeing it each night in a similar time window.
The immediate term action I recommended to run a batched DELETE to purge this table down – once it had been purged down maybe the nightly operation would run with a short enough duration to not break the universe, a;though I did advise them that changing the nightly process to be batched wouldn't hurt (although they were dealing with a vendor app).
The code I suggested (and we eventually went with) looked like this:
--
WHILE 1=1
BEGIN
DELETE TOP (10000) from SecurityAction
WHERE ActionDate < '2017-09-14 00:00:00'
AND
(
ActionCode != 102
AND ActionCode != 103
AND ActionCode != 129
AND ActionCode != 130
)
END
--
This would delete rows in 10,000 row batches; wrapping it in the WHILE 1=1 allows the batch to run over and over without ending. Once there are no more rows to satisfy the criteria the query *WILL CONTINUE TO RUN* but will simply delete -0- rows each time.
Once you are ready to stop the batch you simply cancel it and it will end in whatever iteration of the loop it is currently in.
I could have written a WHILE clause that actually checks for the existence of rows that meet the criteria, but there were so many rows that met the criteria right then that running the check itself would have been much more intensive than simply allowing the DELETE to run over and over.
We finally ended up going with this code and after a half day of running it had the table purged down from 14 million rows to 700,000, and a run the next day of the unmodified "regular" DELETE statement completed in under a minute.
--
This seems like a very simple fix, and runs contrary to some database teaching that say batches (and their cousins, the cursors) are evil and should be avoided no matter what. Many blogs, classes, etc. preach the value of set-based operations at all times.
This is where we see the standard DBA trope once again:
https://memegenerator.net/img/instances/58097633/what-if-i-told-you-it-depends.jpg |
Always use the right tool for the job - and in a case like this the right tool was a batch-based solution. You could tune the size of the batches (is 10,000 the best number? Maybe a 100,000 row batch (or a 1,000 row batch) would have the best performance) but regardless of that a small batch was a better choice than a 13 million row set!
At the end of the day this seems very simple - but you would be amazed to see how often something like this comes up.
To minimize LOG space needed, take small bites!
Hope this helps!
At the end of the day this seems very simple - but you would be amazed to see how often something like this comes up.
To minimize LOG space needed, take small bites!
http://www.nicoleconner.com.au/wp-content/uploads/2016/04/66247903.jpg |
Hope this helps!
You write that running the check to see if rows qualify would be more expensive then just letting it run. However, why not add IF @@ROWCOUNT = 0 BREAK within the loop so that it stops as soon as it doesn't find any more rows to delete?
ReplyDelete