Nebraska SQL from @DBA

Friday, December 8, 2017

Come to SQL Saturday Providence!

Tomorrow 12/09/2017 is SQL Saturday #694 in Providence, RI.

I will be presenting by SQL Server Health Check session first thing in the morning:

--

Does it Hurt When I Do This? Performing a SQL Server Health Check

Speaker: Andy Galbraith

Duration: 60 minutes

Track: Enterprise Database Administration & Deployment

How often do you review your SQL Servers for basic security, maintenance, and performance issues?  Many of the servers I "inherit" as a managed services provider have quite a few gaping holes. It is not unusual to find databases that are never backed up, servers with constant login failures (is it an attack or a bad connection string?), and servers that need more RAM/CPU/etc. (or sometimes that even have too much!) 

Come learn how to use freely available tools from multiple layers of the SQL Server stack to check your servers for basic issues like missing backups and CheckDB as well as for more advanced issues like page life expectancy problems and improper indexing. If you are responsible in any way for a Microsoft SQL Server (DBA, Windows Admin, even a Developer) you will see value in this session!

--

In total there are six tracks with five individual sessions each, so a full day of content!

Registration is still open at https://www.sqlsaturday.com/694/RegisterNow.aspx - come to Bryant University tomorrow and check it out!

Wednesday, November 1, 2017

PASS Summit 2017 - Live and in Living Color!

UPDATE - the feed and schedule are live at http://www.pass.org/summit/2017/PASStv.aspx

http://rochistory.com/blog/wp-content/uploads/2014/09/NBC-logo.png

Not in Seattle for some reason this week? Like maybe you stayed home because it was Halloween week, or you don't have the funds, or you are preparing for the zombie apocalypse?

http://www.demotivation.us/media/demotivators/demotivation.us_Zombie-Apocalypse-Preparation-level-genius_133338427933.jpg

PASS is ready to help you out - starting today (Wednesday 11/01/2017) at 8:10 PDT (15:10 UTC) PASS will stream the keynotes, some sessions, and certain other round tables and events LIVE online at http://www.pass.org/summit/2017/Home.aspx

~~The schedule for the next few days isn't up yet but will be soon - tune in!~~

https://radiosoundsfamiliar.com/resources/Rebrand/kep%20calm%20and%20tune%20in%20logo.jpg.opt197x337o0%2C0s197x337.jpg

Monday, October 23, 2017

But I Don't Want To "Restore from a backup of the database or repair the database!"

Do you like Log Shipping?

Before my time at Ntirety I had done quite a bit of work with failover clustering and mirroring (trending into some Availability Groups) but I had never really done that much replication or log shipping. Even at my previous managed services remote DBA job I simply hadn't run across it...

I have said in this blog more than once that education (training and real-world) is one of the most important parts of being a DBA, and that we should always be learning; my time at Ntirety has definitely educated me about replication and log shipping!

https://memegenerator.net/img/instances/500x/42421555/be-careful-what-you-wish-for-punk.jpg

This Log Shipping story (maybe its a cautionary tale - so many of these stories are) begins with a failed server migration. We were trying to replace a server with a new VM and due to some issues completely unrelated to log shipping, we ended up reverting to the original server (maybe once we figure that out I will write about it too).

The problem was that I had already dropped the log shipping from OldServer01.Database99 to LSSecondary.Database99 and created new log shipping from NewServer01.Database99 to LSSecondary.Database99. Part of the backout plan was dropping *that* log shipping and recreating the original relationship between OldServer01 and LSSecondary.

I had the creation process scripted out, so I ran the scripts, made sure they succeeded, checked that LSSecondary.Database99 reflected the current data (we had it in Standyby/Read-Only) and then walked away from log shipping to continue the rest of the revert process.

http://vignette1.wikia.nocookie.net/cardfight/images/e/e1/Funniest_Memes_you-my-friend-just-made-a-big-mistake_10940.jpeg/revision/latest?cb=20150805104725

Monday morning rolled around and one of the business users reported that Database99 on LSSecondary was in Restoring rather than Standby, and wanted me to check it out. Sure enough, Database99 was in Restoring even though the LSRestore job wasn't running and there was no active RESTORE process running.

OK....

I check the properties of the log shipping secondary in the Management Studio GUI and they looked as expected:

You can see these same settings via T-SQL with the following query:

SELECT secondary_database
, restore_mode
, disconnect_users
, last_restored_file
FROM msdb.dbo.log_shipping_secondary_databases

Since the Database shows restore_mode 1 (Standby) and disconnect_users 1 (true) there is no reason by the configuration that the database should be in RESTORING.

The next stop was the SQL Server Error Log. This probably should have been my first stop, but since I had just set this log shipping up I thought of the configuration first. The Error Log was a horror show:

--

Error: 3456, Severity: 16, State: 1.

Could not redo log record (31185729:63924:22), for transaction ID (0:0), on page (1:511238), database 'Database99' (database ID 5). Page: LSN = (31185729:63857:14), type = 16. Log: OpCode = 7, context 24, PrevPageLSN: (31185729:63804:511). Restore from a backup of the database, or repair the database

…followed by 100+ lines of Error Stack Dump <cue scary music>:

Using 'dbghelp.dll' version '4.0.5'

**Dump thread - spid = 0, EC = 0x00000002E61A20F0

***Stack Dump being sent to F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\LOG\SQLDump0001.txt

* *******************************************************************************

* BEGIN STACK DUMP:

* 10/22/17 03:00:34 spid 55

* HandleAndNoteToErrorlog: Exception raised, major=34, minor=56, severity=16

* Input Buffer 506 bytes -

* RESTORE LOG [Database99] FROM DISK = N'\\LSSecondary\

* Database99\Database99_20171022053000.trn' WITH FILE = 1, STANDBY =

* N'F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DA

* TA\Database99_20171022090018.tuf'

* MODULE BASE END SIZE

* sqlservr 0000000000A40000 00000000045EDFFF 03bae000

* ntdll 0000000077510000 00000000776B8FFF 001a9000

* kernel32 00000000772F0000 000000007740EFFF 0011f000

* KERNELBASE 000007FEFD560000 000007FEFD5CBFFF 0006c000

* ADVAPI32 000007FEFF730000 000007FEFF80AFFF 000db000

* msvcrt 000007FEFED70000 000007FEFEE0EFFF 0009f000

* sechost 000007FEFE5B0000 000007FEFE5CEFFF 0001f000

* RPCRT4 000007FEFE630000 000007FEFE75CFFF 0012d000

* MSVCR80 0000000074F50000 0000000075018FFF 000c9000

* MSVCP80 0000000074BD0000 0000000074CD8FFF 00109000

* sqlos 00000000745D0000 00000000745D6FFF 00007000

* Secur32 000007FEFCFE0000 000007FEFCFEAFFF 0000b000

* SSPICLI 000007FEFD260000 000007FEFD284FFF 00025000

* pdh 000007FEF6F00000 000007FEF6F4DFFF 0004e000

* SHLWAPI 000007FEFF680000 000007FEFF6F0FFF 00071000

* GDI32 000007FEFEC20000 000007FEFEC86FFF 00067000

* USER32 0000000077410000 0000000077509FFF 000fa000

* LPK 000007FEFF810000 000007FEFF81DFFF 0000e000

* USP10 000007FEFF190000 000007FEFF258FFF 000c9000

* USERENV 000007FEFC740000 000007FEFC75DFFF 0001e000

* profapi 000007FEFD3D0000 000007FEFD3DEFFF 0000f000

* WINMM 000007FEF6EC0000 000007FEF6EFAFFF 0003b000

* IPHLPAPI 000007FEFB670000 000007FEFB696FFF 00027000

* NSI 000007FEFE620000 000007FEFE627FFF 00008000

* WINNSI 000007FEFB660000 000007FEFB66AFFF 0000b000

* opends60 00000000745E0000 00000000745E7FFF 00008000

* NETAPI32 000007FEFAF00000 000007FEFAF15FFF 00016000

* netutils 000007FEFC970000 000007FEFC97BFFF 0000c000

* srvcli 000007FEFCF20000 000007FEFCF42FFF 00023000

* wkscli 000007FEFAEE0000 000007FEFAEF4FFF 00015000

* LOGONCLI 000007FEFCC80000 000007FEFCCAFFFF 00030000

* SAMCLI 000007FEFA310000 000007FEFA323FFF 00014000

* BatchParser 00000000743D0000 00000000743FCFFF 0002d000

* IMM32 000007FEFF700000 000007FEFF72DFFF 0002e000

* MSCTF 000007FEFEE10000 000007FEFEF18FFF 00109000

* psapi 00000000776D0000 00000000776D6FFF 00007000

* instapi10 0000000074AA0000 0000000074AACFFF 0000d000

* cscapi 000007FEF6EB0000 000007FEF6EBEFFF 0000f000

* sqlevn70 0000000071050000 0000000071250FFF 00201000

* ntmarta 000007FEFC530000 000007FEFC55CFFF 0002d000

* WLDAP32 000007FEFF130000 000007FEFF181FFF 00052000

* CRYPTSP 000007FEFCBE0000 000007FEFCBF7FFF 00018000

* rsaenh 000007FEFC9E0000 000007FEFCA26FFF 00047000

* CRYPTBASE 000007FEFD300000 000007FEFD30EFFF 0000f000

* BROWCLI 000007FEF6DE0000 000007FEF6DF1FFF 00012000

* AUTHZ 000007FEFCE70000 000007FEFCE9EFFF 0002f000

* MSCOREE 000007FEF9F90000 000007FEF9FFEFFF 0006f000

* mscoreei 000007FEF9EE0000 000007FEF9F77FFF 00098000

* ole32 000007FEFEF20000 000007FEFF122FFF 00203000

* credssp 000007FEFC900000 000007FEFC909FFF 0000a000

* msv1_0 000007FEFCB80000 000007FEFCBD1FFF 00052000

* cryptdll 000007FEFCF50000 000007FEFCF63FFF 00014000

* kerberos 000007FEFCCB0000 000007FEFCD67FFF 000b8000

* MSASN1 000007FEFD4B0000 000007FEFD4BEFFF 0000f000

* schannel 000007FEFC980000 000007FEFC9D6FFF 00057000

* CRYPT32 000007FEFD630000 000007FEFD79CFFF 0016d000

* security 0000000074210000 0000000074212FFF 00003000

* WS2_32 000007FEFE5D0000 000007FEFE61CFFF 0004d000

* SHELL32 000007FEFD820000 000007FEFE5A8FFF 00d89000

* OLEAUT32 000007FEFEC90000 000007FEFED66FFF 000d7000

* ftimport 0000000060000000 0000000060024FFF 00025000

* MSFTE 0000000049980000 0000000049D2DFFF 003ae000

* VERSION 000007FEFC520000 000007FEFC52BFFF 0000c000

* dbghelp 000000006F0C0000 000000006F21DFFF 0015e000

* WINTRUST 000007FEFD7A0000 000007FEFD7DAFFF 0003b000

* ncrypt 000007FEFCE20000 000007FEFCE6FFFF 00050000

* bcrypt 000007FEFCDF0000 000007FEFCE11FFF 00022000

* mswsock 000007FEFCC20000 000007FEFCC74FFF 00055000

* wship6 000007FEFCDE0000 000007FEFCDE6FFF 00007000

* wshtcpip 000007FEFC620000 000007FEFC626FFF 00007000

* ntdsapi 000007FEFB0A0000 000007FEFB0C6FFF 00027000

* DNSAPI 000007FEFCA30000 000007FEFCA8AFFF 0005b000

* rasadhlp 000007FEFAA70000 000007FEFAA77FFF 00008000

* fwpuclnt 000007FEFB560000 000007FEFB5B2FFF 00053000

* bcryptprimitives 000007FEFC7A0000 000007FEFC7EBFFF 0004c000

* CLBCatQ 000007FEFF560000 000007FEFF5F8FFF 00099000

* sqlncli10 0000000072690000 0000000072947FFF 002b8000

* COMCTL32 000007FEF6F50000 000007FEF6FEFFFF 000a0000

* COMDLG32 000007FEFF4C0000 000007FEFF556FFF 00097000

* SQLNCLIR10 0000000071260000 0000000071296FFF 00037000

* netbios 000007FEF4040000 000007FEF4049FFF 0000a000

* xpsqlbot 0000000074270000 0000000074277FFF 00008000

* xpstar 0000000065380000 0000000065407FFF 00088000

* SQLSCM 0000000074590000 000000007459DFFF 0000e000

* ODBC32 000007FEF2BA0000 000007FEF2C50FFF 000b1000

* ATL80 0000000074F00000 0000000074F1FFFF 00020000

* odbcint 00000000733D0000 0000000073407FFF 00038000

* clusapi 000007FEF6760000 000007FEF67AFFFF 00050000

* resutils 000007FEF6740000 000007FEF6758FFF 00019000

* xpstar 0000000074240000 0000000074264FFF 00025000

* xplog70 0000000074280000 000000007428FFFF 00010000

* xplog70 0000000074220000 0000000074221FFF 00002000

* dsrole 000007FEFBCD0000 000007FEFBCDBFFF 0000c000

* RpcRtRemote 000007FEFD3B0000 000007FEFD3C3FFF 00014000

* ssdebugps 000007FEEAE20000 000007FEEAE31FFF 00012000

* apphelp 000007FEFD2A0000 000007FEFD2F6FFF 00057000

* msxmlsql 00000000632C0000 0000000063410FFF 00151000

* dbghelp 000000006F8D0000 000000006FA2DFFF 0015e000

* P1Home: 0044004C00510053:

* P2Home: 00000000004A87B4: 0122002501210023 0124002900230027 0126002D0025002B 012800310127002F 012A003501290033 002C0039002B0037

* P3Home: 00000000004C8310: 000000002A010FF0 000000002A010FF0 0000000017407610 000000001AA86DD0 000000001736E3F0 000000002A027FD0

* P4Home: 0000000000000000:

* P5Home: 0052004500AE00AC:

* P6Home: 000000001742C658: 0050005C003A0046 00720067006F0072 00460020006D0061 00730065006C0069 00630069004D005C 006F0073006F0072

* ContextFlags: 000000000010000F: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000

* MxCsr: 0000000000001FA0:

* SegCs: 0000000000000033:

* SegDs: 000000000000002B:

* SegEs: 000000000000002B:

* SegFs: 0000000000000053:

* SegGs: 000000000000002B:

* SegSs: 000000000000002B:

* EFlags: 0000000000000202:

* Rax: 0000000000991AD3: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000

* Rcx: 000000002C568260: 0044004C00510053 00000000004A87B4 00000000004C8310 0000000000000000 0052004500AE00AC 000000001742C658

* Rdx: 0000000000000000:

* Rbx: 0000000000000000:

* Rsp: 000000002C568870: 0000000000000000 00000002E61A20F0 000000002C568AA0 000007FFFFEDC000 00000000000042AC 0000000000000000

* Rbp: 0000000000000000:

* Rsi: 00000002E61A20F0: 00000002E61A20F0 000000063039CEA0 0000000000000000 0000000000000004 00000002E61A2110 00000002E61A2110

* Rdi: 000000002C568AA0: 000000001742C650 FFFFFFFFFFFFFFFE 000007FFFFEDB4A8 0000000000000000 00000006699981A0 0000000000A426C3

* R8: 0000000000000000:

* R9: 0000000000000000:

* R10: 0000000000000000:

* R11: 000000002C569080: 0044004C00510053 00000000004A87B4 00000000004C8310 0000000000000000 0052004500AE00AC 000000001742C658

* R12: 0000000000000001:

* R13: 000000063039CEA0: 000000063039C080 000000063039C3B0 000000063039CAD0 00000002E61A2080 000000063039CF10 000000008089CE80

* R14: 000000002C569F60: 0064006E00610048 006E00410065006C 0074006F004E0064 0045006F00540065 0072006F00720072 003A0067006F006C

* R15: 000000000000003F:

* Rip: 000007FEFD56B3DD: C3000000C8C48148 003B830874F68548 0001BD1689660376 02D6840FFD3B0000 F3840F02FF830000 870F07FF83000002

* *******************************************************************************

* -------------------------------------------------------------------------------

* Short Stack Dump

000007FEFD56B3DD Module(KERNELBASE+000000000000B3DD)

0000000002AE981D Module(sqlservr+00000000020A981D)

0000000002AEDD3A Module(sqlservr+00000000020ADD3A)

0000000002B210C3 Module(sqlservr+00000000020E10C3)

0000000002B20F5D Module(sqlservr+00000000020E0F5D)

0000000000E3F5D8 Module(sqlservr+00000000003FF5D8)

0000000000E3F2A6 Module(sqlservr+00000000003FF2A6)

0000000002A47DBB Module(sqlservr+0000000002007DBB)

0000000002B24941 Module(sqlservr+00000000020E4941)

0000000002D14BFB Module(sqlservr+00000000022D4BFB)

0000000002CC0A44 Module(sqlservr+0000000002280A44)

0000000002CC0719 Module(sqlservr+0000000002280719)

0000000002CBFDC5 Module(sqlservr+000000000227FDC5)

0000000002D4C1B4 Module(sqlservr+000000000230C1B4)

0000000002D4B845 Module(sqlservr+000000000230B845)

0000000002CA321E Module(sqlservr+000000000226321E)

0000000002D4DD33 Module(sqlservr+000000000230DD33)

0000000002D52D0F Module(sqlservr+0000000002312D0F)

0000000002CC8CE6 Module(sqlservr+0000000002288CE6)

0000000001F288A1 Module(sqlservr+00000000014E88A1)

0000000000A99D59 Module(sqlservr+0000000000059D59)

0000000000A9A9B8 Module(sqlservr+000000000005A9B8)

0000000000A9B30C Module(sqlservr+000000000005B30C)

0000000000A9C1A6 Module(sqlservr+000000000005C1A6)

0000000000AE5342 Module(sqlservr+00000000000A5342)

0000000000A4BBD8 Module(sqlservr+000000000000BBD8)

0000000000A4B8BA Module(sqlservr+000000000000B8BA)

0000000000A4B6FF Module(sqlservr+000000000000B6FF)

0000000000F68FB6 Module(sqlservr+0000000000528FB6)

0000000000F69175 Module(sqlservr+0000000000529175)

0000000000F69839 Module(sqlservr+0000000000529839)

0000000000F69502 Module(sqlservr+0000000000529502)

0000000074F537D7 Module(MSVCR80+00000000000037D7)

0000000074F53894 Module(MSVCR80+0000000000003894)

00000000773059CD Module(kernel32+00000000000159CD)

000000007753B981 Module(ntdll+000000000002B981)

Stack Signature for the dump is 0x000000012832F01D

External dump process return code 0x20000001. External dump process returned no errors.

... followed by:

Error: 9003, Severity: 20, State: 6.

The log scan number (31185729:63630:107) passed to log scan in database 'Database99' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.

https://thecatholicgeeks.files.wordpress.com/2015/05/beafraid.jpg

Pretty scary looking errors…I checked the database on OldServer01, especially since the CheckDB job hadn't run in several days (back to before the migration attempt), and thankfully it was clear:

Date and time: 2017-10-23 10:49:35

Server: OldServer01

Version: 10.50.4305.0

Edition: Enterprise Edition (64-bit)

Procedure: [master].[dbo].[DatabaseIntegrityCheck]

Parameters: @Databases = 'Database99', @CheckCommands = 'CHECKDB', @PhysicalOnly = 'N', @NoIndex = 'N', @ExtendedLogicalChecks = 'N', @TabLock = 'N'

, @FileGroups = NULL, @Objects = NULL, @LockTimeout = NULL, @LogToTable = 'N', @Execute = 'Y'

Source: https://ola.hallengren.com

Date and time: 2017-10-23 10:49:35

Database: [Database99]

Status: ONLINE

Standby: No

Updateability: READ_WRITE

User access: MULTI_USER

Is accessible: Yes

Recovery model: FULL

Date and time: 2017-10-23 10:49:35

Command: DBCC CHECKDB ([Database99]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY

Outcome: Succeeded

Duration: 00:35:44

Date and time: 2017-10-23 11:25:19

After some poking around and several rebuilds of log shipping (deleting the secondary, taking a new FULL on OldServer01, restoring it to LSSecondary, and then recreating the process) I figured out the issue was related to the first part of the first error:

“Could not redo log record (31185747:40079:1), for transaction ID (0:1129634848), on page (1:10522412), database 'Database99'”

The redo reader process was looking in the log shipping folder \\LSSecondary\Database99 and trying to restore the oldest TRN log backup files, which were related to the log shipping configuration from before the migration/fail-back over the weekend. Those older TRN files had log sequence numbers (LSNs) that were from before the FULL backup used to initialize the re-created log shipping, which was making the overall process error out and blowing up the process.

A colleague suggested that the log shipping restore should be skipping TRN files that are "too old" via LSN (which matched my understanding of how it "should" work) and that maybe this reflected one of the old TRN files having an invalid file header. After some consideration we realized that probably was *not* the case since the error message doesn't reflect not being able to read the backup file header, and if it couldn't read the header how would it have a log record number (31185747:40079:1) to include in the "could not redo" portion of the error?

I deleted the old backups from \\LSSecondary\Database99 and tried to run the process again, and received the same errors. When I looked in \\LSSecondary\Database99 I saw all of the old TRN files were back!

I had failed to consider (sigh) that the LSCopy job would bring all of the "old" files from the primary folder at \\OldServer01\Database99 back to \\LSSecondary\Database99, putting me back in the same situation.

Take 2 - I deleted all of the log shipping backups from *both* \\OldServer01\Database99 and \\LSSecondary\Database99 and then re-created the log shipping one last time…

https://memegenerator.net/img/instances/500x/27017972/success.jpg

This time the LS_Restore job functioned as expected, Database99 was in Standby/Read-Only on LSSecondary, and instead of my 100+ line stack dump I received some very respectable information messages in the SQL Error Log on LSSecondary:

Setting database option SINGLE_USER to ON for database Database99.

Starting up database 'Database99'.

The database 'Database99' is marked RESTORING and is in a state that does not allow recovery to be run.

Recovery is writing a checkpoint in database 'Database99' (5). This is an informational message only. No user action is required.

Recovery completed for database Database99 (database ID 5) in 2 second(s) (analysis 1080 ms, redo 0 ms, undo 466 ms.) This is an informational message only. No user action is required.

Starting up database 'Database99'.

CHECKDB for database 'Database99' finished without errors on 2017-10-23 10:49:44.400 (local time). This is an informational message only; no user action is required.

Log was restored. Database: Database99, creation date(time): 2017/10/22(02:20:41), first LSN: 31185760:34887:1, last LSN: 31185760:44042:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'\\LSSecondary\Database99\Database99_20171023204500.trn'}). This is an informational message. No user action is required.

Setting database option MULTI_USER to ON for database Database99.

--

The SINGLE_USER and MULTI_USER messages are artifacts of telling the log shipping to disconnect users when restoring backups.

--

Hope this helps!

Wednesday, October 18, 2017

And YOU Get a Deadlock and YOU Get a Deadlock and EVERYBODY GETS A DEADLOCK!

https://memegenerator.net/img/instances/400x/59507872/you-get-a-deadlock-everybody-gets-a-deadlock.jpg

We all get them at one point or another - deadlocks.

I have Object 1 and need Object 2, you have Object 2 and need Object 1:

https://i-technet.sec.s-msft.com/dynimg/IC4289.gif

This isn't the same as simple blocking, where I have Object 1 and you want it (but you can't have it because it's MINE!) In that case you just need to wait your turn, and eventually you will get sick of waiting and walk away (timeout) or I will get done with it and surrender it to you.

In the deadlock case, there truly is no way out other than terminating one of the two requests - in the example above transaction 1 needs access to the Part table to proceed with its work on the Supplier table, and transaction 2 needs access to the Supplier table to work with the Part table. This means everyone is stuck.

A great (yet sad) non-techie way of describing a deadlock is this:

https://cdn.meme.am/instances/27106434/no-job-because-no-experience-no-experience-because-no-job.jpg

SQL Server has a process called the Deadlock Monitor that periodically watches for deadlock situations and handles them by terminating one or more processes to clear the logjam. As described in Technet:

By default, the Database Engine chooses as the deadlock victim the session running the transaction that is least expensive to roll back. Alternatively, a user can specify the priority of sessions in a deadlock situation using the SET DEADLOCK_PRIORITY statement. DEADLOCK_PRIORITY can be set to LOW, NORMAL, or HIGH, or alternatively can be set to any integer value in the range (-10 to 10). The deadlock priority defaults to NORMAL. If two sessions have different deadlock priorities, the session with the lower priority is chosen as the deadlock victim. If both sessions have the same deadlock priority, the session with the transaction that is least expensive to roll back is chosen. If sessions involved in the deadlock cycle have the same deadlock priority and the same cost, a victim is chosen randomly.

"Least Expensive to Roll Back" can sometimes be difficult to determine although a general guideline is that SELECT statements are the least expensive (with no LOG to roll back) and are frequent deadlock victims.

With all of that in hand, how do you gather information about deadlocks? Classical deadlock troubleshooting relies on trace flags 1204 and 1222 or a SQL Server trace on the Deadlock Chain and Deadlock Graph events (as described in this MSSQLTip).

https://www.toonpool.com/user/589/files/living_in_the_past_333905.jpg

We all know at this point (whether some people want to admit it or not) that Extended Events are the tool of the future for this kind of work. They are more lightweight than SQL Server Trace and are continuing to be improved (unlike Trace, which has been stagnant since SQL Server 2005!) Any time you find yourself using Trace or Profiler you should pause and reflect on how to accomplish the same task with XEvents.

In this case, there is deadlock information in the System_Health Session, the basic session deployed by default on all SQL Server instances since SQL Server 2008. (Think the better, stronger, faster version of the Default Trace.) Patrick Keisler (blog/@PatrickKeisler) has a great write-up about the System_Health session on his blog here.

By default deadlock information is one giant character field that is formatted for XML. The XML is readable (sort of) if that's your thing:

<deadlock>

<victim-list>

<victimProcess id="process5776a5498" />

</victim-list>

<process-list>

<process id="process5776a5498" taskpriority="0" logused="0" waitresource="KEY: 19:72057594040745984 (9ba325989765)" waittime="4168" ownerId="327954909" transactionname="SELECT" lasttranstarted="2017-10-17T17:21:04.417" XDES="0x50560f9d0" lockMode="S" schedulerid="1" kpid="3756" status="suspended" spid="64" sbid="0" ecid="0" priority="0" trancount="0" lastbatchstarted="2017-10-17T17:21:04.417" lastbatchcompleted="2017-10-17T17:21:03.293" lastattention="1900-01-01T00:00:00.293" hostpid="4012" loginname="Login1" isolationlevel="read committed (2)" xactid="327954909" currentdb="19" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">

<executionStack>

<frame procname="adhoc" line="1" sqlhandle="0x0200000096663f042d323a9d19c06335abb6318e18e6f8750000000000000000000000000000000000000000">

SELECT dbo.Fred_Table7.ObjectID FROM dbo.Fred_Table7 WHERE (dbo.Fred_Table7.ObjectID not in (23) AND dbo.Fred_Table7.ObjectID <= 4294966615 AND (dbo.Fred_Table7.LastModifyTime >CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) OR dbo.Fred_Table7.LastModifyTime =CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) AND dbo.Fred_Table7.ObjectID > 2404307) AND (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 0 AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0 OR (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 1 OR dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 1) AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0 AND dbo.Fred_Table7.ScheduleStatus = 1) AND dbo.Fred_Table7.TypeID in (2, 8, 21, 67, 68, 260, 261, 262, 266, 267, 270, 271, 272, 277, 279, 281, 282, 289, 295, 301, 337, 338, 339, 340, 343, 349, 351, 355, 358, 361, 365, 368, 372, 373, 379, 385, 387, 389, 391, 394, 396, 407, 408, 414, 415, 430, 431) AND (SI_PLUGIN_OBJECT = 0) AND dbo.Fred_Table7.SI_HIDDEN_OBJECT = 0) ORDER BY dbo </frame>

</executionStack>

<inputbuf>

SELECT dbo.Fred_Table7.ObjectID FROM dbo.Fred_Table7 WHERE (dbo.Fred_Table7.ObjectID not in (23) AND dbo.Fred_Table7.ObjectID <= 4294966615 AND (dbo.Fred_Table7.LastModifyTime >CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) OR dbo.Fred_Table7.LastModifyTime =CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) AND dbo.Fred_Table7.ObjectID > 2404307) AND (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 0 AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0 OR (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 1 OR dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 1) AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0 AND dbo.Fred_Table7.ScheduleStatus = 1) AND dbo.Fred_Table7.TypeID in (2, 8, 21, 67, 68, 260, 261, 262, 266, 267, 270, 271, 272, 277, 279, 281, 282, 289, 295, 301, 337, 338, 339, 340, 343, 349, 351, 355, 358, 361, 365, 368, 372, 373, 379, 385, 387, 389, 391, 394, 396, 407, 408, 414, 415, 430, 431) AND (SI_PLUGIN_OBJECT = 0) AND dbo.Fred_Table7.SI_HIDDEN_OBJECT = 0) ORDER BY db </inputbuf>

</process>

<process id="process1b9313868" taskpriority="0" logused="284" waitresource="KEY: 19:72057594041008128 (1679744d2988)" waittime="4168" ownerId="327954910" transactionname="implicit_transaction" lasttranstarted="2017-10-17T17:21:04.417" XDES="0x57788c3a8" lockMode="X" schedulerid="2" kpid="11128" status="suspended" spid="60" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2017-10-17T17:21:04.417" lastbatchcompleted="2017-10-17T17:21:04.417" lastattention="1900-01-01T00:00:00.417" hostpid="4012" loginname="Login1" isolationlevel="read committed (2)" xactid="327954910" currentdb="19" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">

<executionStack>

<frame procname="adhoc" line="1" stmtstart="86" sqlhandle="0x0200000011d6ca3a9ba3506e8d74888b6554043825c5204e0000000000000000000000000000000000000000">

UPDATE dbo.Fred_Table7 SET Version = @P1, LastModifyTime = @P2 WHERE ObjectID = @P3 AND Version = @P4 </frame>

<frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">

unknown </frame>

</executionStack>

<inputbuf>

(@P1 int,@P2 varbinary(32),@P3 int,@P4 int)UPDATE dbo.Fred_Table7 SET Version = @P1, LastModifyTime = @P2 WHERE ObjectID = @P3 AND Version = @P4 </inputbuf>

</process>

</process-list>

<resource-list>

<keylock hobtid="72057594040745984" dbid="19" objectname="Database12.dbo.Fred_Table7" indexname="ObjectID_I7" id="lock52e379a80" mode="X" associatedObjectId="72057594040745984">

<owner-list>

<owner id="process1b9313868" mode="X" />

</owner-list>

<waiter-list>

<waiter id="process5776a5498" mode="S" requestType="wait" />

</waiter-list>

</keylock>

<keylock hobtid="72057594041008128" dbid="19" objectname="Database12.dbo.Fred_Table7" indexname="LastModifyTime_I7" id="lock35b41fa80" mode="S" associatedObjectId="72057594041008128">

<owner-list>

<owner id="process5776a5498" mode="S" />

</owner-list>

<waiter-list>

<waiter id="process1b9313868" mode="X" requestType="wait" />

</waiter-list>

</keylock>

</resource-list>

</deadlock>

You can piece through it to find the <inputbuf> tag to find the queries on the Fred_Table7 that are involved, and find the loginname of Login1, but is this really how you want to consume this data?

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgK4Lx8DWDZxjWZ9Jbqsovb7iMmL5LpfNuCcOoqdqXDHVHCAZ8vwQPRVTzjnobfpakkhY1nFdXNFysU9LTd_LZWfRAacYVSRw7jIQ3rwWh1If2FOfNGZGtbjTs2ZTO4QlxIhZ-xQE0mKcOJ/s1600/HaHaNo.jpg

A much easier way to deal with this information is in the Deadlock Graph.

The Deadlock Graph turns the XML information show above into, well, a graph. You can see the two transactions as the circles on the outside and the two objects involved in the rectangles in the center of the diagram. Sometimes there are more than two transactions involved in a deadlock (beyond the scope of this post) but if there are more than two they will show up in the graph as well. When you open the graph in SQL Server Management Studio you can hover over the transaction circle and see the underlying query and some other transaction data.

As mentioned above there is a trace event to return the Deadlock Graph, and you can draw it via XEvents as well with a query.

NOTE - there was a change in the way deadlock information is processed in XEvents between SQL Server 2008R2 and SQL Server 2012, so there are different scripts depending on version:

/*

SQL Server Deadlock Graph Extraction from Ring Buffer

SQL Server 2008/2008R2

Modified from https://www.sqlservercentral.com/Forums/1399315/System-health-extended-event-session-does-not-capture-latest-deadlocks?PageIndex=1

*/

;WITH SystemHealth

AS (

SELECT

CAST ( target_data AS xml ) AS SessionXML

FROM

sys.dm_xe_session_targets st

INNER JOIN

sys.dm_xe_sessions s

ON

s.[address] = st.event_session_address

WHERE

name = 'system_health'

)

SELECT

Deadlock.value ( '@timestamp', 'datetime' ) AS DeadlockDateTime

, CAST ( Deadlock.value ( '(data/value)[1]', 'Nvarchar(max)' ) AS XML ) AS DeadlockGraph

FROM

SystemHealth s

CROSS APPLY

SessionXML.nodes ( '//RingBufferTarget/event' ) AS t (Deadlock)

WHERE

Deadlock.value ( '@name', 'nvarchar(128)' ) = N'xml_deadlock_report'

ORDER BY

Deadlock.value ( '@timestamp', 'datetime' );

/*

SQL Server Deadlock Graph Extraction from Ring Buffer

SQL Server 2012+

Modified from https://www.red-gate.com/simple-talk/sql/database-administration/handling-deadlocks-in-sql-server/

*/

SELECT XEvent.query('(event/data/value/deadlock)[1]') AS DeadlockGraph

FROM ( SELECT XEvent.query('.') AS XEvent

FROM ( SELECT CAST(target_data AS XML) AS TargetData

FROM sys.dm_xe_session_targets st

JOIN sys.dm_xe_sessions s

ON s.address = st.event_session_address

WHERE s.name = 'system_health'

AND st.target_name = 'ring_buffer'

) AS Data

CROSS APPLY TargetData.nodes

('RingBufferTarget/event[@name="xml_deadlock_report"]')

AS XEventData ( XEvent )

) AS src;

Another awesome change in SQL 2012+ is that the System_Health session now writes to a file target as well as the ring buffer memory target. This is cool because the files persist more data than can be maintained in the ring buffer, giving you more history. Retrieving data from the XEL files requires a different query, which is set up to dynamically read the XEL files from your default LOG path, wherever it may be:

/*

SQL Server Deadlock Graph Extraction from XEL file target

SQL Server 2012+

http://www.sqlservercentral.com/blogs/sql-geek/2017/10/07/extracting-deadlock-information-using-system_health-extended-events/

*/

CREATE TABLE #errorlog (LogDate DATETIME , ProcessInfo VARCHAR(100), [Text] VARCHAR(MAX))

DECLARE @tag VARCHAR (MAX) , @path VARCHAR(MAX);

INSERT INTO #errorlog EXEC sp_readerrorlog;

SELECT @tag = text

FROM #errorlog

WHERE [Text] LIKE 'Logging%MSSQL\Log%';

DROP TABLE #errorlog;

SET @path = SUBSTRING(@tag, 38, CHARINDEX('MSSQL\Log', @tag) - 29);

SELECT

CONVERT(xml, event_data).query('/event/data/value/child::*') AS DeadlockReport,

CONVERT(xml, event_data).value('(event[@name="xml_deadlock_report"]/@timestamp)[1]', 'datetime')

AS Execution_Time

FROM sys.fn_xe_file_target_read_file(@path + '\system_health*.xel', NULL, NULL, NULL)

WHERE OBJECT_NAME like 'xml_deadlock_report';

As an example, here is the output of the ring buffer query versus the file target (XEL) query on a given server:

The file target has six times as much history as the ring buffer (in this case) - you should "always" see the same or more data in the XEL files than the ring buffer.

All of these scripts return an XML output like the XML shown above - it doesn't return a graph directly.

https://memegenerator.net/img/instances/71836370/say-whaaat.jpg

To get the graph is pretty straightforward - once you know how. Open the XML document, either by clicking the link in the resultset or by copy-pasting the XML into the text editor of your choice:

Once the XML document is open, either in Management Studio or in your text editor, simply save the document as extension .XDL rather than .XML The new file will have an icon like this:

If you open this file in Management Studio (via double-click or Open>File in Management Studio) you will be presented with the actual Deadlock Graph as seen above, complete with ability to hover to see transaction information, etc.

...unless you are using SQL Server 2008/2008R2 Management Studio, in which case you will see this:

Argghhhhh!

As described by Extended Events guru Jonathan Kehayias (blog/@sqlpoolboy) in a blog post here, this is an artifact of SQL 2008 Management Studio not being ready to render the deadlock output from XEvents (as opposed to the deadlock output from Trace). As Jonathan notes there are two ways to handle this - the first way is to use an up-level Management Studio of at least version 2012, either by upgrading the local copy of Management Studio (you can download the current Management Studio that is still compatible back to 2008 from Microsoft here) or by copying the XDL file to another server with a more recent Management Studio. The other way to open the file under SQL 2008/2008R2 is to use SentryOne's free tool Plan Explorer Pro. I usually don't pitch company products but this is a really useful 100% free tool.

The most common answer will be the first - copy the XDL to a server with 2012+ Management Studio, and all will be well.

--

UPDATE - a friend pointed out that I had forgotten a script that does a great job shredding the Deadlock information from files, the ring buffer, or SentryOne's data collection into a useful format. It was created by MCM/MVP Wayne Sheffield (blog/@DBAWayne) and is located here.

Hope this helps!

Friday, December 8, 2017

Come to SQL Saturday Providence!

Does it Hurt When I Do This? Performing a SQL Server Health Check

Wednesday, November 1, 2017

PASS Summit 2017 - Live and in Living Color!

Monday, October 23, 2017

But I Don't *Want* To "Restore from a backup of the database or repair the database!"

Wednesday, October 18, 2017

And YOU Get a Deadlock and YOU Get a Deadlock and EVERYBODY GETS A DEADLOCK!

But I Don't Want To "Restore from a backup of the database or repair the database!"