One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Tuesday, March 25, 2014

Backup Checksums and Trace Flag 3023

One of the items I always flag when performing a health check on a server is whether backups are being run with CHECKSUM enabled.  As Paul Randal (@PaulRandal/blog) writes in his post "Example 2000/2005 corrupt databases and some more info on backup, restore, page checksums and IO errors":

If you have page checksums turned on, you should always use the WITH CHECKSUM option when taking backups. This will cause the page checksums to be checked as they’re read into the backup. If a bad page checksum is found, the backup will stop and print a message identifying the bad page. Using the WITH CHECKSUM option on a backup will also generate a checksum over the entire backup stream and store it in the backup. This means we can detect a damaged backup by recalculating the checksum and comparing it against that stored in the backup – in much the same way that page checksums work.
 As Paul describes, BACKUP WITH CHECKSUM has two advantages:
  • If you have page checksums enabled, BACKUP WITH CHECKSUM will countercheck against the page checksums to verify the pages are valid - this is your first defense against general page corruption.
    • NOTE - page checksums are the default for new databases created in MSSQL 2005+, but this option is not updated when you migrate older databases up to a new version of MSSQL, so check your databases (select name, page_verify_option_desc from master.sys.databases) and then ALTER DATABASE...SET PAGE_VERIFY CHECKSUM to fix it for databases that are currently set to TORN_PAGE_DETECTION or NONE.
  • Even if you don't have page checksums enabled (although you definitely should - if you aren't convinced by me, check out Microsoft Certified Master Kendra Little's (@Kendra_Little) blog post on the topic with some great examples)), BACKUP WITH CHECKSUM generates a checksum for that individual backup which is used to verify the validity of that backup in the future.
This all sounds great, right?

Just add WITH CHECKSUM to your BACKUP statements and everything is great.

Right?

Right?

The fly in the ointment comes when you don't explicitly control the syntax of your BACKUP statements.  The two major examples of this I run into while providing DBA services are:
  • SQL Server Maintenance Plans - no matter how much I evangelize Ola's amazing Maintenance Solution, there are still lots and lots of companies out there using the out-of-the-box Maintenance Plans you design in SQL Server Management Studio.
  • Third Party Backup Products - I'm not talking SAN snapshots and the like, but database-related products like Quest (now Dell) LiteSpeed.  I am not widely familiar with many of these products, but several of them I have run into don't have the option of performing BACKUP with CHECKSUM from within their maintenance plans (although to be fair most of them do offer it from the command line via extended stored procedures).
OK, fine....so I just can't BACKUP WITH CHECKSUM.  

Sigh.

But wait - Trace Flags to the rescue!

Recently I found this article: How to enable the CHECKSUM option if backup utilities do not expose the option which outlines the exact solution to my problem - outstanding!

I have researched this in the past (several years ago) and either my Google-Fu wasn't as strong as it is now (Many years I have honed this craft, young ones) or the last time I looked this article didn't exist. (To be fair to myself, the article is date-stamped February 2012 as revision 1.0, so maybe the article didn't exist!)

Trace flag 3023 causes all backups to be written with CHECKSUM, unless you explicitly call BACKUP WITH NO_CHECKSUM (which isn't the case for any of the examples above).  This trace flag is available in all Microsoft SQL Server versions since 2005.

This trace flag can be enabled on the fly (DBCC TRACEON(3023,-1) where the (-1) switches on the trace flag globally rather than just for your session) which incurs no downtime, or by adding the -T3023 parameter to the MSSQL Server service startup parameters via SQL Server Configuration Manager (on the Advanced tab of the service properties) and then restarting the service (which *does* incur a downtime!)

I would advise that if you intend to start using this, you need to go the startup parameter route since the DBCC TRACEON will only remain in effect until the next time the service restarts - if you want this to persist you need to use the startup parameter.

I am now using this on several servers and sure enough the clients' maintenance plans are writing their backups with CHECKSUM - check this with the following query to verify that new backups are being created with has_backup_checksums = 1:

select database_name, type,
MAX(backup_start_date) as MAX_Backup_Start_Date,
has_backup_checksums
from msdb.dbo.backupset
group by database_name, type, has_backup_checksums
order by database_name,type

One thing I have learned about Microsoft SQL Server is that regardless of your years of experience with the product or any given part of it (clustering, SSIS, etc.), there is always something left to learn - even things that in retrospect seem obvious (well, duh!) were new at some point, and there is always - always - something left out there waiting for you to discover.
 
Hope this helps!

3 comments:

  1. Thanks for sharing this. It's the thing I have been looking for. We use a backup software to backup sql. It was a surprise to me our server guys told me there is no such a check box to enable backup checksum when they set up backup. I then went to find their official technical document. Yeah, they are right. No backup checksum. I am looking to implement this change in our environment after some testing. Cheers for that.

    ReplyDelete
  2. does using checksum effect backup performance ??

    ReplyDelete
    Replies
    1. Calculating backup checksums does cost CPU cycles (http://technet.microsoft.com/en-us/library/ms189055.aspx) so monitor your CPU - in general SQL Servers are not CPU-bound, but if you find it is a problem, consider whether you can add additional CPUs to resolve the situation rather than disabling the checksums (pending licensing of course).

      Thanks for reading and commenting!

      Delete