One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Thursday, October 5, 2017

Toolbox - Fix Your FILEGROWTH

One of the items we usually flag in reports is FILEGROWTH by percentage.  It doesn't help that in most versions of SQL Server the default FILEGROWTH increment for one or both file types (DATA and LOG) is 10%.

https://i.imgflip.com/jaq5c.jpg

As you probably already know, the key flaw of percentage-based FILEGROWTH is that over time the increment grows larger and larger, causing each growth to take longer and longer.  This is especially an issue with LOG files because they have to be zero-initialized before they can be used, causing excessive I/O and file contention while the growth is in progress.  Paul Randal (blog/@PaulRandal) describes why this is the case in this blog post.  (If you ever get a chance to see it, Paul also does a fun demo in some of his classes and talks on why zero initialization is important, using a hex editor to read the underlying contents of disk even after the old data is "deleted".)

As I mentioned above, the catch to percentage-based growth is the ever-growing increment:

(image: a worksheet of successive 10% growth increments)

In the image you can see that after 20 growths at 10% you are now growing by 313MB at a time.  By 30 growths (not pictured) the increment is 812MB - getting close to the 1GB mark.  Depending on the speed of your storage this can cause significant delay.
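
To make the arithmetic concrete, here is a quick throwaway loop (my own sketch - the 512MB starting size is a hypothetical that happens to line up with the increments above):

--

DECLARE @SizeMB decimal(18,2), @Growth int
SET @SizeMB = 512.0 /* hypothetical starting size */
SET @Growth = 0

WHILE @Growth < 30
BEGIN
SET @Growth = @Growth + 1
PRINT 'Growth #' + CAST(@Growth as varchar(5)) + ' increment: ' + CAST(CAST(@SizeMB*0.10 as decimal(18,1)) as varchar(20)) + 'MB'
SET @SizeMB = @SizeMB * 1.10
END
/* Growth #20 prints an increment of roughly 313MB; growth #30 is roughly 812MB */

--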

Another related issue is Virtual Log Files (VLFs), which I discuss here.  It is important to have a meaningful increment so that the growth isn't *too* small relative to the file size.
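
For a quick look at how many VLFs a log currently has, a minimal sketch (DBCC LOGINFO is undocumented but long-standing and returns one row per VLF; on SQL Server 2016 SP2 and later, sys.dm_db_log_info is the supported equivalent):

--

USE YourDatabase /* hypothetical database name */
GO
DBCC LOGINFO -- one row per VLF in the current database

-- SQL Server 2016 SP2 and later:
SELECT COUNT(*) AS VLFCount FROM sys.dm_db_log_info(DB_ID())

--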

How do we fix this?  I found a script (and modified it, of course) to generate the ALTER DATABASE statements to set the FILEGROWTH increment to a size appropriate to the individual file's current size, based on a table shown in the script:

--

/*

FILEGROWTH Reset
Submitted by Andy Galbraith
02/04/2016

Script to reset all FILEGROWTH to a fixed increment based on their current size:
CurrentSize<1GB = 16MB
1GB<=CurrentSize<5GB = 128MB
5GB<=CurrentSize<100GB = 256MB
CurrentSize>=100GB = 512MB

Actual queries are about two-thirds of the way down at 'SET @Query' if you want to modify the size parameters

Modified from a script at http://www.sqlservercentral.com/scripts/Administration/99339/

Tested on MSSQL 2005/2008/2008R2/2012/2014

*/

SET NOCOUNT ON
USE master
GO

/* Create a Table for Database File Info */
IF OBJECT_ID('tempdb..#ConfigAutoGrowth') IS NOT NULL 
DROP TABLE #ConfigAutoGrowth

CREATE TABLE #ConfigAutoGrowth
(
DatabaseID INT,
DBName SYSNAME,
LogicalFileName VARCHAR(max),
FileSizeinGB decimal(10,2),
GrowthOption VARCHAR(12)
)

/* Load the Database File Table */
INSERT INTO #ConfigAutoGrowth
SELECT 
SD.database_id, 
SD.name,
SF.name, 
sf.size*8/1024.0/1024.0 as FileSizeinGB, 
CASE SF.status & 0x100000
WHEN 1048576 THEN 'Percentage'
WHEN 0 THEN 'MB'
END AS 'GROWTH Option'
FROM SYS.SYSALTFILES SF
JOIN SYS.DATABASES SD
ON SD.database_id = SF.dbid

/* Variable and Cursor Declarations */ 
DECLARE @name VARCHAR ( max ) /* Database Name */
DECLARE @DatabaseID INT /* Database ID */
DECLARE @LogicalFileName VARCHAR ( max ) /* Database Logical file name */
DECLARE @FileSizeinGB DECIMAL(10,2) /* Current File Size in GB */
DECLARE @GrowthOption VARCHAR ( max ) /* Current FILEGROWTH Type */
DECLARE @Query VARCHAR(max) /* Dynamic Query */

DECLARE DBCursor CURSOR FOR
SELECT DatabaseID, DBName, LogicalFileName, FileSizeinGB, GrowthOption
FROM #ConfigAutoGrowth

OPEN DBCursor

FETCH NEXT FROM DBCursor 
INTO @DatabaseID,@name,@LogicalFileName,@FileSizeinGB, @GrowthOption

WHILE @@FETCH_STATUS = 0
BEGIN
PRINT 'Changing AutoGrowth option for database ['+ UPPER(@name) +'] - current file size ' + cast (@FileSizeinGB as varchar)+'GB'

IF @FileSizeinGB<1 
SET @Query= 'ALTER DATABASE ['+ @name +'] MODIFY FILE (NAME = ['+@LogicalFileName+'],FILEGROWTH = 16MB)'--,MAXSIZE=UNLIMITED)'
IF @FileSizeinGB>=1 and @FileSizeinGB<5 
SET @Query= 'ALTER DATABASE ['+ @name +'] MODIFY FILE (NAME = ['+@LogicalFileName+'],FILEGROWTH = 128MB)'--,MAXSIZE=UNLIMITED)'
IF @FileSizeinGB>=5 and @FileSizeinGB <100
SET @Query= 'ALTER DATABASE ['+ @name +'] MODIFY FILE (NAME = ['+@LogicalFileName+'],FILEGROWTH = 256MB)'--,MAXSIZE=UNLIMITED)'
IF @FileSizeinGB>=100
SET @Query= 'ALTER DATABASE ['+ @name +'] MODIFY FILE (NAME = ['+@LogicalFileName+'],FILEGROWTH = 512MB)'--,MAXSIZE=UNLIMITED)'

PRINT @Query
--EXECUTE(@Query)

FETCH NEXT FROM DBCursor 
INTO @DatabaseID,@name,@LogicalFileName,@FileSizeinGB,@GrowthOption
END 

CLOSE DBCursor 
DEALLOCATE DBCursor 

DROP TABLE #ConfigAutoGrowth
GO

--

SELECT
    SD.database_id,
    SD.name,
    SF.name,
    CASE SF.status & 0x100000
        WHEN 1048576 THEN 'Percentage'
        WHEN 0 THEN 'MB'
    END AS 'GROWTH Option',
    size*8.0/1024.0 AS SizeinMB,
    /* growth holds a page count for MB-based growth but a raw percentage
       for percentage-based growth, so only convert pages to MB when the
       growth type is MB */
    CASE SF.status & 0x100000
        WHEN 1048576 THEN growth
        ELSE growth*8.0/1024.0
    END AS Growth
FROM SYS.SYSALTFILES SF
JOIN SYS.DATABASES SD
ON SD.database_id = SF.dbid
GO

--

By default the script simply PRINTs the ALTER DATABASE statements for you to copy-paste to another window and execute, but you can uncomment the "EXECUTE(@Query)" statement and the script will enact the changes automatically.
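
For example, for a hypothetical database MyDB with a 2GB log file named MyDB_log (which falls in the 1GB-5GB bucket), the generated statement would look like this:

--

ALTER DATABASE [MyDB] MODIFY FILE (NAME = [MyDB_log],FILEGROWTH = 128MB)

--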

Hope this helps!


Thursday, September 14, 2017

Come See Me in Minnesota in October!




The schedule for SQL Saturday #682 Minnesota 2017 is out, and I will be giving my talk on Extended Events:

--
Getting Started with Extended Events
Speaker: Andy Galbraith
Duration: 60 minutes
Track: Administration
Few subjects in Microsoft SQL Server inspire the same amount of Fear, Uncertainty, and Doubt (FUD) as Extended Events.  Many DBAs continue to use Profiler and SQL Trace even though they have been deprecated for years.  Why is this?
Extended Events started out in SQL Server 2008 with no user interface and only a few voices in the community documenting the features as they found them.  Since then it has blossomed into a full feature of SQL Server and an amazingly low-impact replacement for Profiler and Trace.
Come learn how to get started - the basics of sessions, events, actions, targets, packages, and more.  We will look at some base scenarios where Extended Events can be very useful as well as considering a few gotchas along the way.  You may never go back to Profiler again!
--

The event is October 7th at St Paul College, and is *free* (with an optional $15 lunch).

I have spoken at numerous SQL Saturdays over the last few years and can definitely say the crew in Minnesota does a great job assembling a schedule and taking care of the attendees (and speakers!)

There is also a great slate of full day pre-conference sessions on Friday the 6th:

--
The Complete Primer to SQL Server Infrastructure and Cloud
David Klee
Level: Intermediate
--
Powering Up with Power BI
Brian Larson, John Thompson...
Level: Beginner
--
DBA Fundamentals - give yourself a solid SQL foundation!
Kevin Hill
Level: Beginner
--
Azure Data Services Hands On Lab
Matt Stenzel
Level: Intermediate
--

The pre-cons cost $110 which makes them a great value for a full day of training from some great SQL Server professionals!

Hope to see you there!


https://media.tenor.com/images/aa1e9033238489da177267cb6a8af273/tenor.gif

Wednesday, September 6, 2017

Toolbox - IO, IO, Why are You So Slow?

This is the next in a series of posts about useful tools (mostly scripts) that I use frequently in my day-to-day life as a production DBA.  I work as a Work-From-Home DBA for a Remote DBA company, but 90+% of my job functions are the same as those of any other company DBA.

Many of these scripts come directly from blogs and articles created and shared by other members of the SQL Server community; some of them I have slightly modified and some I have lifted directly from those articles.  I will always give attribution back to the original source and note when I have made modifications.

--

Troubleshooting performance in SQL Server (or almost any other system) is often an iterative process of discovering a bottleneck, fixing it, and then discovering the next bottleneck.

https://memegenerator.net/instance/65795220/grumpy-cat-5-bottleneck-sighting-optimize-the-system
SQL Server bottlenecks frequently fall into one of three resource categories:
  • CPU
  • Memory
  • I/O (Disk)
For example, you may find that "your instance is slow" (Client-speak for "I'm unhappy - FIX IT!") and upon investigation you see in your Perfmon collector (you have a Perfmon collector running, right???) that CPU is consistently at 90%+.  You consider the evidence and realize a set of queries is poorly written (darn SharePoint) but you also realize that in this case you can fix them (whew!), and this helps the CPU numbers drop significantly.  You're the hero, right?

...until you realize the instance is still "slow."

https://hakanforss.files.wordpress.com/2014/08/lawofbottlenecks.png
You look at the numbers again, and now you find that disk latency, which had previously been fine, is now completely in the tank during the business day, showing that I/O delays are through the roof.

What happened?

This demonstrates the concept of the shifting bottleneck - while CPU use was through the roof, the engine was so bogged down that it couldn't generate that much I/O, but once the CPU issue was resolved queries started moving through more quickly until they hit the next choke point at the I/O limit.  Odds are that once you resolve the I/O situation, you will find a new bottleneck.

How do you ever defeat a bad guy that constantly moves around and frequently changes form?

https://movietalkexpress.files.wordpress.com/2016/08/3.jpg
The next concept in this series is "good enough" - all computing systems have theoretical top speeds (the speed of electrons through the ether, etc.) but you never get there.  At the end of the day the important question is:

How fast does your system need to be to meet the business SLAs and keep your customers happy (aka QUIET)?

Unfortunately in many cases the answer seems to be this:

https://i.kinja-img.com/gawker-media/image/upload/s--Wtfu6LxV--/c_fill,fl_progressive,g_center,h_450,q_80,w_800/1344998898279631762.jpg
--

Once you can determine this "good enough" speed (the whole discussion of "Do you have SLAs?" could fill dozens of blog posts) you can determine an acceptable end point for your iterative process.

--

I have a small handful of things I check when a system is "slow" and one of them is disk latency.  At its core, disk latency is a calculation of how much delay there is between when SQL Server asks for data and when it gets it back.

The data regarding latency comes from the DMV sys.dm_io_virtual_file_stats, and the relevant metrics are io_stall, io_stall_read_ms, and io_stall_write_ms (table from Technet):

Column name Data type Description
database_id smallint ID of database.
file_id smallint ID of file.
sample_ms int Number of milliseconds since the computer was started. This column can be used to compare different outputs from this function.
num_of_reads bigint Number of reads issued on the file.
num_of_bytes_read bigint Total number of bytes read on this file.
io_stall_read_ms bigint Total time, in milliseconds, that the users waited for reads issued on the file.
num_of_writes bigint Number of writes made on this file.
num_of_bytes_written bigint Total number of bytes written to the file.
io_stall_write_ms bigint Total time, in milliseconds, that users waited for writes to be completed on the file.
io_stall bigint Total time, in milliseconds, that users waited for I/O to be completed on the file.
size_on_disk_bytes bigint Number of bytes used on the disk for this file. For sparse files, this number is the actual number of bytes on the disk that are used for database snapshots.
file_handle varbinary Windows file handle for this file.
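
To see the raw numbers, here is a minimal first look at the DMV (my own sketch, not from the sources below - the counters are cumulative since each database came online):

--

SELECT DB_NAME(vfs.database_id) AS DatabaseName,
vfs.file_id,
vfs.num_of_reads, vfs.io_stall_read_ms,
vfs.num_of_writes, vfs.io_stall_write_ms,
vfs.io_stall
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs -- NULL, NULL = all databases, all files
ORDER BY vfs.io_stall DESC

--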

How do you turn these numbers into something meaningful?  As is almost always the case - someone has already done that work - you just need to find it!

Paul Randal (blog/@PaulRandal) has a great blog post "How to examine IO subsystem latencies from within SQL Server" and in that post he references a query created by Jimmy May (blog/@aspiringgeek).  Jimmy was at Microsoft for many years and is currently at the SanDisk Data Propulsion Lab.

Jimmy's query is so useful that it is also included in Glenn Berry's (blog/@glennalanberry) great Diagnostic Information (DMV) Queries.  Here is my *very slightly* modified version of that query:

--
-- Drive level latency information (Query 19) (Drive Level Latency)
-- Based on code from Jimmy May
SELECT @@SERVERNAME AS instanceName, [Drive], volume_mount_point,
    CASE
        WHEN num_of_reads = 0 THEN 0
        ELSE (io_stall_read_ms/num_of_reads)
    END AS [Read Latency],
    CASE
        WHEN num_of_writes = 0 THEN 0 -- guard on the divisor to avoid divide-by-zero
        ELSE (io_stall_write_ms/num_of_writes)
    END AS [Write Latency],
    CASE
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
        ELSE (io_stall/(num_of_reads + num_of_writes))
    END AS [Overall Latency],
    CASE
        WHEN num_of_reads = 0 THEN 0
        ELSE (num_of_bytes_read/num_of_reads)
    END AS [Avg Bytes/Read],
    CASE
        WHEN num_of_writes = 0 THEN 0 -- guard on the divisor to avoid divide-by-zero
        ELSE (num_of_bytes_written/num_of_writes)
    END AS [Avg Bytes/Write],
    CASE
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
        ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes))
    END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, UPPER(volume_mount_point) AS volume_mount_point,
        SUM(num_of_reads) AS num_of_reads,
        SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
        SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
        SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall
      FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      INNER JOIN sys.master_files AS mf WITH (NOLOCK)
      ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
      CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id)
      GROUP BY LEFT(UPPER(mf.physical_name), 2), UPPER(volume_mount_point)) AS tab
ORDER BY [Overall Latency] DESC OPTION (RECOMPILE);
-- Shows you the drive-level latency for reads and writes, in milliseconds
-- Latency above 20-25ms is usually a problem
--

The query divides the stall in milliseconds by the number of disk operations to produce a latency in milliseconds.  A rule of thumb (as noted in the query comments) is that anything >=20ms of latency is pretty bad and worthy of investigation.
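
Spelled out, the math for each drive is simply:

--

-- Read Latency    = io_stall_read_ms  / num_of_reads
-- Write Latency   = io_stall_write_ms / num_of_writes
-- Overall Latency = io_stall / (num_of_reads + num_of_writes)

--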

Here is a sample output of the query:

--

instanceName  Drive  volume_mount_point    Read Latency  Write Latency  Overall Latency  Avg Bytes/Read  Avg Bytes/Write  Avg Bytes/Transfer
INSTANCE35    F:     F:\SPARE-01\                    69              7               64         1002982           236026              940840
INSTANCE35    F:     F:\DATA01\                      26              1               25           55005             2629               54384
INSTANCE35    F:     F:\280532-02_SECOND\             8              5                8          105124             8257              105114
INSTANCE35    T:     T:\                              2             25                3           65483            64899               65187
INSTANCE35    E:     E:\LOGS01\                       7              0                0           20080            51327               51198

--

On this instance you can see that the SPARE-01 and DATA01 mount points on the F: drive have significant overall latency (64ms and 25ms respectively) - significant enough that the users are almost certainly experiencing I/O-related impact.

If you look at the query you will see that the "overall" latency covers both reads and writes, and as such is functionally a weighted average of the read and write latencies.  For example, you can see that DATA01 does significantly more reads than writes, since its overall latency of 25 is almost equal to its read latency of 26.

One more point from the sample numbers above - look at the T: drive.  The write latency is 25 (bad) but the overall latency is only 3.  Is this a problem?  Usually not, because the overall number shows that this drive does relatively few writes - there are so many more reads that the good read latency heavily outweighs the bad write latency.  In most cases the overall latency is the most important statistic of the three.

(Of course as always this is a big #ItDepends - if you have a known issue with write performance - even for the small number of writes being done on the drive - then the write latency is important!)
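
To see how the weighting works, consider some made-up counters in the same ballpark as the T: drive above:

--

-- 1,000,000 reads with 2,000,000ms of read stall   -> read latency  = 2ms
--    50,000 writes with 1,250,000ms of write stall -> write latency = 25ms
-- Overall latency = (2,000,000 + 1,250,000) / (1,000,000 + 50,000) = ~3ms

--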

--

What causes latency?  There are a wide variety of issues - some inside SQL but many outside the engine as well.  In my experience I usually follow up on both sides, tasking infrastructure teams to investigate numbers on the storage back-end *and* the storage network while the DBA investigates SQL Server-side I/O-related issues such as poor query plans and missing or bad indexes that cause excessive unnecessary I/O.  Sometimes you will read guidance to investigate SQL first (Paul makes that recommendation in his post referenced above) but I like starting both investigations at once so that you get both sets of results in a more timely fashion.

Disk Latency is one of the easiest things to find - run the single simple query shown above and look at the numbers.  Once you determine that you *do* have a latency issue, you can move forward with internal SQL Server and external infrastructure investigations.

--

Remember - at the end of the day, you want your customer to think this:

http://imgur.com/waLAsxg

--

Hope this helps!


Thursday, August 24, 2017

The Transient Database Snapshot Has Been Marked Suspect

Yet another tale from the ticket queue...

DBCC CHECKDB was failing on INSTANCE99, and after some investigation it looked like a space issue, not an actual corruption issue.

http://baddogneedsrottenhome.com/images/emails/55ce060daa58b.jpg
--

The Job Failure error text was this:

--

Executed as user: DOMAIN\svc_acct. Microsoft (R) SQL Server Execute Package Utility  Version 10.50.6000.34 for 64-bit  Copyright (C) Microsoft Corporation 2010. All rights reserved.    Started:  2:00:00 AM  Progress: 2017-08-20 02:00:01.11     Source: {11E1AA7B-A7AC-4043-916B-DC6EABFF772B}      Executing query "DECLARE @Guid UNIQUEIDENTIFIER      EXECUTE msdb..sp...".: 100% complete  End Progress  Progress: 2017-08-20 02:00:01.30     Source: Check Database Integrity Task      Executing query "USE [VLDB01]  ".: 50% complete  End Progress  Error: 2017-08-20 03:38:19.28     Code: 0xC002F210     Source: Check Database Integrity Task Execute SQL Task     Description: Executing the query "DBCC CHECKDB(N'VLDB01')  WITH NO_INFOMSGS  " failed with the following error: "Check terminated.  The transient database snapshot for database 'VLDB01' (database ID 5) has been marked suspect due to an IO operation failure.  Refer to the SQL Server error log for details.  A severe error occurred on the current command.  The results, if any, should be discarded.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.  End Error  Warning: 2017-08-20 03:38:19.28     Code: 0x80019002     Source: VLDB01 Integrity      Description: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.  The Execution method succeeded, but the number of errors raised (1) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.  End Warning  DTExec: The package execution returned DTSER_FAILURE (1).  Started:  2:00:00 AM  Finished: 3:38:19 AM  Elapsed:  5899.51 seconds.  The package execution failed.  The step failed.

--

Looking in the SQL Error Log there were hundreds of these combinations in the minutes immediately preceding the job failure:

--

The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x000048a123e000 in file 'E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

--

Error: 17053, Severity: 16, State: 1.

--

E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17: Operating system error 665(The requested operation could not be completed due to a file system limitation) encountered.

--

I have seen DBCC snapshot errors in the past and they almost always come back to disk space issues.  If you look at the first listing of the 665 error above you can see it was trying to write to the snapshot file it was creating on the E: drive, which is where the primary DATA/MDF file for VLDB01 was located.

By default, CheckDB and its component commands use a snapshot of the database to perform their work.  As described here by Paul Randal (@PaulRandal/blog) from SQLskills:  http://sqlmag.com/blog/why-can-database-snapshot-run-out-space, snapshot files are “sparse” files that reserve a very small amount of space and then grow as needed to handle the required data.  Because of this mechanism, they do not require the full amount of space up front.

https://technet.microsoft.com/en-us/library/bb457112.f13zs11_big(l=en-us).jpg

A sparse file only uses the physical space required to hold the actual ("meaningful") data.  As seen in this diagram from Technet, in this example a regular/non-sparse file would be 17GB while a comparable sparse file would only be 7GB.

The text of the out of space error has since been updated from the error message seen in Paul’s article to the “transient database snapshot suspect” error we see above as described here http://www.sqlcoffee.com/Troubleshooting177.htm.

--

Looking at the E: drive it was a 900GB drive with 112GB currently free.  The catch is that in the 675GB VLDB01 database there are two tables larger than 112GB and another that is almost 100GB!

Top 10 largest tables out of 1261 total tables in VLDB01:

InstanceName  DatabaseName  TableName  NumberOfRows  SizeinMB   DataSizeinMB  IndexSizeinMB  UnusedSizeinMB
INSTANCE99    VLDB01        BigTable1       1011522  136548.20     136523.80          10.71           13.69
INSTANCE99    VLDB01        BigTable2       9805593  122060.29     114534.34        5709.13         1816.82
INSTANCE99    VLDB01        BigTable3      17747326   91143.74      65405.88       25464.23          273.63
INSTANCE99    VLDB01        BigTable4     137138292   78046.15      39646.33       38305.33           94.49
INSTANCE99    VLDB01        Table01         1650232   46884.70      46422.93         419.40           42.37
INSTANCE99    VLDB01        Table02        76827734   26780.02       9153.05       17566.23           60.75
INSTANCE99    VLDB01        Table03        35370640   26766.98      20936.73        5733.40           96.86
INSTANCE99    VLDB01        Table04        12152300   22973.11      11173.06       11764.65           35.40
INSTANCE99    VLDB01        Table05        12604262   19292.02       7743.06       11511.93           37.03
INSTANCE99    VLDB01        Table06        31649960   14715.57       5350.62        9327.30           37.65

The biggest unit of work in a CheckDB is the individual DBCC CHECKTABLE of each table, and trying to run a CHECKTABLE of a 133GB table in 112GB of space was not going to fly.

Note that you don’t need 675GB of free space for the CheckDB snapshot of a 675GB database – just space for the largest object and a little more – 145GB-150GB free should be sufficient to CheckDB this particular database as it currently stands, but we need to be mindful of these large tables if they grow over time as they would then require more CheckDB snapshot space as well.
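
If you want to keep an eye on those large tables over time, here is a rough sketch for pulling the largest tables in the current database (my own helper, not from the original ticket):

--

SELECT TOP (10)
OBJECT_NAME(ps.object_id) AS TableName,
SUM(ps.reserved_page_count)*8/1024.0 AS ReservedMB /* pages are 8KB */
FROM sys.dm_db_partition_stats ps
GROUP BY ps.object_id
ORDER BY ReservedMB DESC

--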

--

There are a couple of potential fixes here.

First and possibly most straightforward would be to clear more space on E: or to expand the drive – if we could get the drive to 150+GB free we should be good for the present (acknowledging the threat of future growth of the large tables).  The catch was that there were only three files on E: and none of them had much useful free space to reclaim:

DBFileName     Path                           FileSizeMB  SpaceUsedMB  FreeSpaceMB
VLDB01         E:\SQL_Data\VLDB01.mdf          654267.13    649746.81      4520.31
VLDB01_data2   E:\SQL_Data\VLDB01_1.ndf         29001.31     28892.81       108.50
VLDB01_CONFIG  E:\SQL_Data\VLDB01_CONFIG.mdf       16.25        12.06         4.19

This means that going this route would require expanding the E: drive.  I would recommend expanding it by 100GB-150GB - this is more than we immediately need but should prevent us from asking for more space in the short term.

ProTip - consider this method any time you are asking for additional infrastructure resources - asking for just the amount of CPU/RAM/Disk/whatever that you need right now means you will probably need to ask again soon, and most infra admins I have known would rather give you more up front than have you bother them every month!

https://imgflip.com/i/1unt0z

(However, be realistic – don’t ask for an insane amount or you will just get shut down completely!)


--

Another option in this case since INSTANCE99 is SQL Server Enterprise Edition would be to create a manual snapshot somewhere else with more space and then to run CheckDB against that manual snapshot.  This process is described here by Microsoft Certified Master Robert Davis (@SQLSoldier/blog): http://www.sqlsoldier.com/wp/sqlserver/day1of31daysofdisasterrecoverydoesdbccautomaticallyuseexistingsnapshot and is relatively straightforward:

--

1) Create a snapshot of your database on a different drive.  Note that the snapshot needs a sparse file for *each* data file of the source database, and each NAME must match the logical name of the corresponding source file - something like:

CREATE DATABASE VLDB01_Snapshot ON
(NAME = N'VLDB01', FILENAME = N'O:\Snap\VLDB01_Data.snap'),
(NAME = N'VLDB01_data2', FILENAME = N'O:\Snap\VLDB01_Data2.snap')
AS SNAPSHOT OF VLDB01;

2) Run CheckDB against the snapshot directly:

DBCC CHECKDB (VLDB01_Snapshot);

3) Drop the snapshot - because the snapshot is functionally a database, this is just a DROP DATABASE statement:

DROP DATABASE VLDB01_Snapshot;

4) Modify the existing job to exclude VLDB01 so that it doesn't continue to try to run with the default internal process!

--

Luckily, in this case there were several drives with sufficient space!

--

I advised the client that if they preferred to go this second way (the manual snapshot), I strongly recommended removing any existing canned maintenance plans and changing this server to the Ola Hallengren scripted maintenance.  Not only is this my general recommendation anyway (#OlaRocks), but it also makes excluding a database much easier and safer.

To exclude a database under a regular maintenance plan you have to edit the job and manually check every database except the offending one - and this causes trouble when new databases are added to the instance, as they must then be manually added to the maintenance plans.  Under the Hallengren scripts you can say "all databases except this one," which automatically picks up new databases in the future (there is no "all but this one" option in a regular maintenance plan).

Here is what the command would look like under Ola:

EXECUTE dbo.DatabaseIntegrityCheck
    @Databases = 'USER_DATABASES, -VLDB01',
    @CheckCommands = 'CHECKDB';


--

If you find yourself in this situation consider carefully which way you prefer to go and document, document, document so that future DBAs know what happened (even if that future DBA is just you in 6/12/24 months!)

https://3.bp.blogspot.com/-YBeSt5-A_fA/Vx2SCCBc1xI/AAAAAAAAtnY/UwvtqaQBacoJ7TY5AM3_1HKSUZTy5CyZACLcB/s1600/wait%2Bhere1b.jpg

Hope this helps!