One DBA's Ongoing Search for Clarity in the Middle of Nowhere


*or*

Yet Another Andy Writing About SQL Server

Friday, December 8, 2017

Come to SQL Saturday Providence!

Tomorrow 12/09/2017 is SQL Saturday #694 in Providence, RI.

I will be presenting by SQL Server Health Check session first thing in the morning:

--

Does it Hurt When I Do This? Performing a SQL Server Health Check

Speaker: Andy Galbraith
Duration: 60 minutes
Track: Enterprise Database Administration & Deployment
How often do you review your SQL Servers for basic security, maintenance, and performance issues?  Many of the servers I "inherit" as a managed services provider have quite a few gaping holes. It is not unusual to find databases that are never backed up, servers with constant login failures (is it an attack or a bad connection string?), and servers that need more RAM/CPU/etc. (or sometimes that even have too much!) 

Come learn how to use freely available tools from multiple layers of the SQL Server stack to check your servers for basic issues like missing backups and CheckDB as well as for more advanced issues like page life expectancy problems and improper indexing. If you are responsible in any way for a Microsoft SQL Server (DBA, Windows Admin, even a Developer) you will see value in this session!
--

In total there are six tracks with five individual sessions each, so a full day of content!

Registration is still open at https://www.sqlsaturday.com/694/RegisterNow.aspx - come to Bryant University tomorrow and check it out!





Wednesday, November 1, 2017

PASS Summit 2017 - Live and in Living Color!

UPDATE - the feed and schedule are live at http://www.pass.org/summit/2017/PASStv.aspx

http://rochistory.com/blog/wp-content/uploads/2014/09/NBC-logo.png
Not in Seattle for some reason this week?  Like maybe you stayed home because it was Halloween week, or you don't have the funds, or you are preparing for the zombie apocalypse?

http://www.demotivation.us/media/demotivators/demotivation.us_Zombie-Apocalypse-Preparation-level-genius_133338427933.jpg

PASS is ready to help you out - starting today (Wednesday 11/01/2017) at 8:10 PDT (15:10 UTC) PASS will stream the keynotes, some sessions, and certain other round tables and events LIVE online at http://www.pass.org/summit/2017/Home.aspx

The schedule for the next few days isn't up yet but will be soon - tune in!

https://radiosoundsfamiliar.com/resources/Rebrand/kep%20calm%20and%20tune%20in%20logo.jpg.opt197x337o0%2C0s197x337.jpg

Monday, October 23, 2017

But I Don't *Want* To "Restore from a backup of the database or repair the database!"

Do you like Log Shipping?

Before my time at Ntirety I had done quite a bit of work with failover clustering and mirroring (trending into some Availability Groups) but I had never really done that much replication or log shipping.  Even at my previous managed services remote DBA job I simply hadn't run across it...

I have said in this blog more than once that education (training and real-world) is one of the most important parts of being a DBA, and that we should always be learning; my time at Ntirety has definitely educated me about replication and log shipping!

https://memegenerator.net/img/instances/500x/42421555/be-careful-what-you-wish-for-punk.jpg
This Log Shipping story (maybe its a cautionary tale - so many of these stories are) begins with a failed server migration.  We were trying to replace a server with a new VM and due to some issues completely unrelated to log shipping, we ended up reverting to the original server (maybe once we figure that out I will write about it too).  

The problem was that I had already dropped the log shipping from OldServer01.Database99 to LSSecondary.Database99 and created new log shipping from NewServer01.Database99 to LSSecondary.Database99.  Part of the backout plan was dropping *that* log shipping and recreating the original relationship between OldServer01 and LSSecondary.

I had the creation process scripted out, so I ran the scripts, made sure they succeeded, checked that LSSecondary.Database99 reflected the current data (we had it in Standyby/Read-Only) and then walked away from log shipping to continue the rest of the revert process.

http://vignette1.wikia.nocookie.net/cardfight/images/e/e1/Funniest_Memes_you-my-friend-just-made-a-big-mistake_10940.jpeg/revision/latest?cb=20150805104725

Monday morning rolled around and one of the business users reported that Database99 on LSSecondary was in Restoring rather than Standby, and wanted me to check it out.  Sure enough, Database99 was in Restoring even though the LSRestore job wasn't running and there was no active RESTORE process running.

OK....

I check the properties of the log shipping secondary in the Management Studio GUI and they looked as expected:




You can see these same settings via T-SQL with the following query:
SELECT secondary_database
, restore_mode
, disconnect_users
, last_restored_file
FROM msdb.dbo.log_shipping_secondary_databases 

Since the Database shows restore_mode 1 (Standby) and disconnect_users 1 (true) there is no reason by the configuration that the database should be in RESTORING.

The next stop was the SQL Server Error Log.  This probably should have been my first stop, but since I had just set this log shipping up I thought of the configuration first.  The Error Log was a horror show:

--

Error: 3456, Severity: 16, State: 1.
Could not redo log record (31185729:63924:22), for transaction ID (0:0), on page (1:511238), database 'Database99' (database ID 5). Page: LSN = (31185729:63857:14), type = 16. Log: OpCode = 7, context 24, PrevPageLSN: (31185729:63804:511). Restore from a backup of the database, or repair the database

--

…followed by 100+ lines of Error Stack Dump <cue scary music>:

--

Using 'dbghelp.dll' version '4.0.5'
**Dump thread - spid = 0, EC = 0x00000002E61A20F0
***Stack Dump being sent to F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\LOG\SQLDump0001.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   10/22/17 03:00:34 spid 55
*
* HandleAndNoteToErrorlog: Exception raised, major=34, minor=56, severity=16
*
* Input Buffer 506 bytes -
*             RESTORE LOG [Database99] FROM DISK = N'\\LSSecondary\
*  Database99\Database99_20171022053000.trn' WITH FILE = 1, STANDBY =
*   N'F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DA
*  TA\Database99_20171022090018.tuf'
*  
*
*  MODULE                          BASE      END       SIZE
* sqlservr                       0000000000A40000  00000000045EDFFF  03bae000
* ntdll                          0000000077510000  00000000776B8FFF  001a9000
* kernel32                       00000000772F0000  000000007740EFFF  0011f000
* KERNELBASE                     000007FEFD560000  000007FEFD5CBFFF  0006c000
* ADVAPI32                       000007FEFF730000  000007FEFF80AFFF  000db000
* msvcrt                         000007FEFED70000  000007FEFEE0EFFF  0009f000
* sechost                        000007FEFE5B0000  000007FEFE5CEFFF  0001f000
* RPCRT4                         000007FEFE630000  000007FEFE75CFFF  0012d000
* MSVCR80                        0000000074F50000  0000000075018FFF  000c9000
* MSVCP80                        0000000074BD0000  0000000074CD8FFF  00109000
* sqlos                          00000000745D0000  00000000745D6FFF  00007000
* Secur32                        000007FEFCFE0000  000007FEFCFEAFFF  0000b000
* SSPICLI                        000007FEFD260000  000007FEFD284FFF  00025000
* pdh                            000007FEF6F00000  000007FEF6F4DFFF  0004e000
* SHLWAPI                        000007FEFF680000  000007FEFF6F0FFF  00071000
* GDI32                          000007FEFEC20000  000007FEFEC86FFF  00067000
* USER32                         0000000077410000  0000000077509FFF  000fa000
* LPK                            000007FEFF810000  000007FEFF81DFFF  0000e000
* USP10                          000007FEFF190000  000007FEFF258FFF  000c9000
* USERENV                        000007FEFC740000  000007FEFC75DFFF  0001e000
* profapi                        000007FEFD3D0000  000007FEFD3DEFFF  0000f000
* WINMM                          000007FEF6EC0000  000007FEF6EFAFFF  0003b000
* IPHLPAPI                       000007FEFB670000  000007FEFB696FFF  00027000
* NSI                            000007FEFE620000  000007FEFE627FFF  00008000
* WINNSI                         000007FEFB660000  000007FEFB66AFFF  0000b000
* opends60                       00000000745E0000  00000000745E7FFF  00008000
* NETAPI32                       000007FEFAF00000  000007FEFAF15FFF  00016000
* netutils                       000007FEFC970000  000007FEFC97BFFF  0000c000
* srvcli                         000007FEFCF20000  000007FEFCF42FFF  00023000
* wkscli                         000007FEFAEE0000  000007FEFAEF4FFF  00015000
* LOGONCLI                       000007FEFCC80000  000007FEFCCAFFFF  00030000
* SAMCLI                         000007FEFA310000  000007FEFA323FFF  00014000
* BatchParser                    00000000743D0000  00000000743FCFFF  0002d000
* IMM32                          000007FEFF700000  000007FEFF72DFFF  0002e000
* MSCTF                          000007FEFEE10000  000007FEFEF18FFF  00109000
* psapi                          00000000776D0000  00000000776D6FFF  00007000
* instapi10                      0000000074AA0000  0000000074AACFFF  0000d000
* cscapi                         000007FEF6EB0000  000007FEF6EBEFFF  0000f000
* sqlevn70                       0000000071050000  0000000071250FFF  00201000
* ntmarta                        000007FEFC530000  000007FEFC55CFFF  0002d000
* WLDAP32                        000007FEFF130000  000007FEFF181FFF  00052000
* CRYPTSP                        000007FEFCBE0000  000007FEFCBF7FFF  00018000
* rsaenh                         000007FEFC9E0000  000007FEFCA26FFF  00047000
* CRYPTBASE                      000007FEFD300000  000007FEFD30EFFF  0000f000
* BROWCLI                        000007FEF6DE0000  000007FEF6DF1FFF  00012000
* AUTHZ                          000007FEFCE70000  000007FEFCE9EFFF  0002f000
* MSCOREE                        000007FEF9F90000  000007FEF9FFEFFF  0006f000
* mscoreei                       000007FEF9EE0000  000007FEF9F77FFF  00098000
* ole32                          000007FEFEF20000  000007FEFF122FFF  00203000
* credssp                        000007FEFC900000  000007FEFC909FFF  0000a000
* msv1_0                         000007FEFCB80000  000007FEFCBD1FFF  00052000
* cryptdll                       000007FEFCF50000  000007FEFCF63FFF  00014000
* kerberos                       000007FEFCCB0000  000007FEFCD67FFF  000b8000
* MSASN1                         000007FEFD4B0000  000007FEFD4BEFFF  0000f000
* schannel                       000007FEFC980000  000007FEFC9D6FFF  00057000
* CRYPT32                        000007FEFD630000  000007FEFD79CFFF  0016d000
* security                       0000000074210000  0000000074212FFF  00003000
* WS2_32                         000007FEFE5D0000  000007FEFE61CFFF  0004d000
* SHELL32                        000007FEFD820000  000007FEFE5A8FFF  00d89000
* OLEAUT32                       000007FEFEC90000  000007FEFED66FFF  000d7000
* ftimport                       0000000060000000  0000000060024FFF  00025000
* MSFTE                          0000000049980000  0000000049D2DFFF  003ae000
* VERSION                        000007FEFC520000  000007FEFC52BFFF  0000c000
* dbghelp                        000000006F0C0000  000000006F21DFFF  0015e000
* WINTRUST                       000007FEFD7A0000  000007FEFD7DAFFF  0003b000
* ncrypt                         000007FEFCE20000  000007FEFCE6FFFF  00050000
* bcrypt                         000007FEFCDF0000  000007FEFCE11FFF  00022000
* mswsock                        000007FEFCC20000  000007FEFCC74FFF  00055000
* wship6                         000007FEFCDE0000  000007FEFCDE6FFF  00007000
* wshtcpip                       000007FEFC620000  000007FEFC626FFF  00007000
* ntdsapi                        000007FEFB0A0000  000007FEFB0C6FFF  00027000
* DNSAPI                         000007FEFCA30000  000007FEFCA8AFFF  0005b000
* rasadhlp                       000007FEFAA70000  000007FEFAA77FFF  00008000
* fwpuclnt                       000007FEFB560000  000007FEFB5B2FFF  00053000
* bcryptprimitives               000007FEFC7A0000  000007FEFC7EBFFF  0004c000
* CLBCatQ                        000007FEFF560000  000007FEFF5F8FFF  00099000
* sqlncli10                      0000000072690000  0000000072947FFF  002b8000
* COMCTL32                       000007FEF6F50000  000007FEF6FEFFFF  000a0000
* COMDLG32                       000007FEFF4C0000  000007FEFF556FFF  00097000
* SQLNCLIR10                     0000000071260000  0000000071296FFF  00037000
* netbios                        000007FEF4040000  000007FEF4049FFF  0000a000
* xpsqlbot                       0000000074270000  0000000074277FFF  00008000
* xpstar                         0000000065380000  0000000065407FFF  00088000
* SQLSCM                         0000000074590000  000000007459DFFF  0000e000
* ODBC32                         000007FEF2BA0000  000007FEF2C50FFF  000b1000
* ATL80                          0000000074F00000  0000000074F1FFFF  00020000
* odbcint                        00000000733D0000  0000000073407FFF  00038000
* clusapi                        000007FEF6760000  000007FEF67AFFFF  00050000
* resutils                       000007FEF6740000  000007FEF6758FFF  00019000
* xpstar                         0000000074240000  0000000074264FFF  00025000
* xplog70                        0000000074280000  000000007428FFFF  00010000
* xplog70                        0000000074220000  0000000074221FFF  00002000
* dsrole                         000007FEFBCD0000  000007FEFBCDBFFF  0000c000
* RpcRtRemote                    000007FEFD3B0000  000007FEFD3C3FFF  00014000
* ssdebugps                      000007FEEAE20000  000007FEEAE31FFF  00012000
* apphelp                        000007FEFD2A0000  000007FEFD2F6FFF  00057000
* msxmlsql                       00000000632C0000  0000000063410FFF  00151000
* dbghelp                        000000006F8D0000  000000006FA2DFFF  0015e000
*
*     P1Home: 0044004C00510053:  
*     P2Home: 00000000004A87B4:  0122002501210023  0124002900230027  0126002D0025002B  012800310127002F  012A003501290033  002C0039002B0037  
*     P3Home: 00000000004C8310:  000000002A010FF0  000000002A010FF0  0000000017407610  000000001AA86DD0  000000001736E3F0  000000002A027FD0  
*     P4Home: 0000000000000000:  
*     P5Home: 0052004500AE00AC:  
*     P6Home: 000000001742C658:  0050005C003A0046  00720067006F0072  00460020006D0061  00730065006C0069  00630069004D005C  006F0073006F0072  
* ContextFlags: 000000000010000F:  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  
*      MxCsr: 0000000000001FA0:  
*      SegCs: 0000000000000033:  
*      SegDs: 000000000000002B:  
*      SegEs: 000000000000002B:  
*      SegFs: 0000000000000053:  
*      SegGs: 000000000000002B:  
*      SegSs: 000000000000002B:  
*     EFlags: 0000000000000202:  
*        Rax: 0000000000991AD3:  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  0000000000000000  
*        Rcx: 000000002C568260:  0044004C00510053  00000000004A87B4  00000000004C8310  0000000000000000  0052004500AE00AC  000000001742C658  
*        Rdx: 0000000000000000:  
*        Rbx: 0000000000000000:  
*        Rsp: 000000002C568870:  0000000000000000  00000002E61A20F0  000000002C568AA0  000007FFFFEDC000  00000000000042AC  0000000000000000  
*        Rbp: 0000000000000000:  
*        Rsi: 00000002E61A20F0:  00000002E61A20F0  000000063039CEA0  0000000000000000  0000000000000004  00000002E61A2110  00000002E61A2110  
*        Rdi: 000000002C568AA0:  000000001742C650  FFFFFFFFFFFFFFFE  000007FFFFEDB4A8  0000000000000000  00000006699981A0  0000000000A426C3  
*         R8: 0000000000000000:  
*         R9: 0000000000000000:  
*        R10: 0000000000000000:  
*        R11: 000000002C569080:  0044004C00510053  00000000004A87B4  00000000004C8310  0000000000000000  0052004500AE00AC  000000001742C658  
*        R12: 0000000000000001:  
*        R13: 000000063039CEA0:  000000063039C080  000000063039C3B0  000000063039CAD0  00000002E61A2080  000000063039CF10  000000008089CE80  
*        R14: 000000002C569F60:  0064006E00610048  006E00410065006C  0074006F004E0064  0045006F00540065  0072006F00720072  003A0067006F006C  
*        R15: 000000000000003F:  
*        Rip: 000007FEFD56B3DD:  C3000000C8C48148  003B830874F68548  0001BD1689660376  02D6840FFD3B0000  F3840F02FF830000  870F07FF83000002  
* *******************************************************************************
* -------------------------------------------------------------------------------
* Short Stack Dump
000007FEFD56B3DD Module(KERNELBASE+000000000000B3DD)
0000000002AE981D Module(sqlservr+00000000020A981D)
0000000002AEDD3A Module(sqlservr+00000000020ADD3A)
0000000002B210C3 Module(sqlservr+00000000020E10C3)
0000000002B20F5D Module(sqlservr+00000000020E0F5D)
0000000000E3F5D8 Module(sqlservr+00000000003FF5D8)
0000000000E3F2A6 Module(sqlservr+00000000003FF2A6)
0000000002A47DBB Module(sqlservr+0000000002007DBB)
0000000002B24941 Module(sqlservr+00000000020E4941)
0000000002D14BFB Module(sqlservr+00000000022D4BFB)
0000000002CC0A44 Module(sqlservr+0000000002280A44)
0000000002CC0719 Module(sqlservr+0000000002280719)
0000000002CBFDC5 Module(sqlservr+000000000227FDC5)
0000000002D4C1B4 Module(sqlservr+000000000230C1B4)
0000000002D4B845 Module(sqlservr+000000000230B845)
0000000002CA321E Module(sqlservr+000000000226321E)
0000000002D4DD33 Module(sqlservr+000000000230DD33)
0000000002D52D0F Module(sqlservr+0000000002312D0F)
0000000002CC8CE6 Module(sqlservr+0000000002288CE6)
0000000001F288A1 Module(sqlservr+00000000014E88A1)
0000000000A99D59 Module(sqlservr+0000000000059D59)
0000000000A9A9B8 Module(sqlservr+000000000005A9B8)
0000000000A9B30C Module(sqlservr+000000000005B30C)
0000000000A9C1A6 Module(sqlservr+000000000005C1A6)
0000000000AE5342 Module(sqlservr+00000000000A5342)
0000000000A4BBD8 Module(sqlservr+000000000000BBD8)
0000000000A4B8BA Module(sqlservr+000000000000B8BA)
0000000000A4B6FF Module(sqlservr+000000000000B6FF)
0000000000F68FB6 Module(sqlservr+0000000000528FB6)
0000000000F69175 Module(sqlservr+0000000000529175)
0000000000F69839 Module(sqlservr+0000000000529839)
0000000000F69502 Module(sqlservr+0000000000529502)
0000000074F537D7 Module(MSVCR80+00000000000037D7)
0000000074F53894 Module(MSVCR80+0000000000003894)
00000000773059CD Module(kernel32+00000000000159CD)
000000007753B981 Module(ntdll+000000000002B981)
Stack Signature for the dump is 0x000000012832F01D
External dump process return code 0x20000001.  External dump process returned no errors. 

--

... followed by:

--

Error: 9003, Severity: 20, State: 6.
The log scan number (31185729:63630:107) passed to log scan in database 'Database99' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup. 
--

https://thecatholicgeeks.files.wordpress.com/2015/05/beafraid.jpg

Pretty scary looking errors…I checked the database on OldServer01, especially since the CheckDB job hadn't run in several days (back to before the migration attempt), and thankfully it was clear:

--

Date and time: 2017-10-23 10:49:35
Server: OldServer01
Version: 10.50.4305.0
Edition: Enterprise Edition (64-bit)
Procedure: [master].[dbo].[DatabaseIntegrityCheck]
Parameters: @Databases = 'Database99', @CheckCommands = 'CHECKDB', @PhysicalOnly = 'N', @NoIndex = 'N', @ExtendedLogicalChecks = 'N', @TabLock = 'N'
, @FileGroups = NULL, @Objects = NULL, @LockTimeout = NULL, @LogToTable = 'N', @Execute = 'Y'
Date and time: 2017-10-23 10:49:35
Database: [Database99]
Status: ONLINE
Standby: No
Updateability: READ_WRITE
User access: MULTI_USER
Is accessible: Yes
Recovery model: FULL
Date and time: 2017-10-23 10:49:35
Command: DBCC CHECKDB ([Database99]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
Outcome: Succeeded
Duration: 00:35:44
Date and time: 2017-10-23 11:25:19


--

After some poking around and several rebuilds of log shipping (deleting the secondary, taking a new FULL on OldServer01, restoring it to LSSecondary, and then recreating the process) I figured out the issue was related to the first part of the first error: 

“Could not redo log record (31185747:40079:1), for transaction ID (0:1129634848), on page (1:10522412), database 'Database99'” 

The redo reader process was looking in the log shipping folder \\LSSecondary\Database99 and trying to restore the oldest TRN log backup files, which were related to the log shipping configuration from before the migration/fail-back over the weekend.  Those older TRN files had log sequence numbers (LSNs) that were from before the FULL backup used to initialize the re-created log shipping, which was making the overall process error out and blowing up the process.

A colleague suggested that the log shipping restore should be skipping TRN files that are "too old" via LSN (which matched my understanding of how it "should" work) and that maybe this reflected one of the old TRN files having an invalid file header.  After some consideration we realized that probably was *not* the case since the error message doesn't reflect not being able to read the backup file header, and if it couldn't read the header how would it have a log record number (31185747:40079:1) to include in the "could not redo" portion of the error?

I deleted the old backups from \\LSSecondary\Database99 and tried to run the process again, and received the same errors.  When I looked in \\LSSecondary\Database99 I saw all of the old TRN files were back!

I had failed to consider (sigh) that the LSCopy job would bring all of the "old" files from the primary folder at \\OldServer01\Database99 back to \\LSSecondary\Database99, putting me back in the same situation.

Take 2 - I deleted all of the log shipping backups from *both* \\OldServer01\Database99 and \\LSSecondary\Database99 and then re-created the log shipping one last time…

https://memegenerator.net/img/instances/500x/27017972/success.jpg
This time the LS_Restore job functioned as expected, Database99 was in Standby/Read-Only on LSSecondary, and instead of my 100+ line stack dump I received some very respectable information messages in the SQL Error Log on LSSecondary:

--
Setting database option SINGLE_USER to ON for database Database99.
Starting up database 'Database99'.
The database 'Database99' is marked RESTORING and is in a state that does not allow recovery to be run.
Recovery is writing a checkpoint in database 'Database99' (5). This is an informational message only. No user action is required.
Recovery completed for database Database99 (database ID 5) in 2 second(s) (analysis 1080 ms, redo 0 ms, undo 466 ms.) This is an informational message only. No user action is required.
Starting up database 'Database99'.
CHECKDB for database 'Database99' finished without errors on 2017-10-23 10:49:44.400 (local time). This is an informational message only; no user action is required.
Log was restored. Database: Database99, creation date(time): 2017/10/22(02:20:41), first LSN: 31185760:34887:1, last LSN: 31185760:44042:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'\\LSSecondary\Database99\Database99_20171023204500.trn'}). This is an informational message. No user action is required.
Setting database option MULTI_USER to ON for database Database99.
 --

The SINGLE_USER and MULTI_USER messages are artifacts of telling the log shipping to disconnect users when restoring backups.

--

Hope this helps!

Wednesday, October 18, 2017

And YOU Get a Deadlock and YOU Get a Deadlock and EVERYBODY GETS A DEADLOCK!

https://memegenerator.net/img/instances/400x/59507872/you-get-a-deadlock-everybody-gets-a-deadlock.jpg
We all get them at one point or another - deadlocks.

I have Object 1 and need Object 2, you have Object 2 and need Object 1:

https://i-technet.sec.s-msft.com/dynimg/IC4289.gif
This isn't the same as simple blocking, where I have Object 1 and you want it (but you can't have it because it's MINE!)  In that case you just need to wait your turn, and eventually you will get sick of waiting and walk away (timeout) or I will get done with it and surrender it to you.

In the deadlock case, there truly is no way out other than terminating one of the two requests - in the example above transaction 1 needs access to the Part table to proceed with its work on the Supplier table, and transaction 2 needs access to the Supplier table to work with the Part table.  This means everyone is stuck.

A great (yet sad) non-techie way of describing a deadlock is this:

https://cdn.meme.am/instances/27106434/no-job-because-no-experience-no-experience-because-no-job.jpg
--

SQL Server has a process called the Deadlock Monitor that periodically watches for deadlock situations and handles them by terminating one or more processes to clear the logjam.  As described in Technet:
By default, the Database Engine chooses as the deadlock victim the session running the transaction that is least expensive to roll back. Alternatively, a user can specify the priority of sessions in a deadlock situation using the SET DEADLOCK_PRIORITY statement. DEADLOCK_PRIORITY can be set to LOW, NORMAL, or HIGH, or alternatively can be set to any integer value in the range (-10 to 10). The deadlock priority defaults to NORMAL. If two sessions have different deadlock priorities, the session with the lower priority is chosen as the deadlock victim. If both sessions have the same deadlock priority, the session with the transaction that is least expensive to roll back is chosen. If sessions involved in the deadlock cycle have the same deadlock priority and the same cost, a victim is chosen randomly.
"Least Expensive to Roll Back" can sometimes be difficult to determine although a general guideline is that SELECT statements are the least expensive (with no LOG to roll back) and are frequent deadlock victims.

--

With all of that in hand, how do you gather information about deadlocks?  Classical deadlock troubleshooting relies on trace flags 1204 and 1222 or a SQL Server trace on the Deadlock Chain and Deadlock Graph events (as described in this MSSQLTip).
https://www.toonpool.com/user/589/files/living_in_the_past_333905.jpg

We all know at this point (whether some people want to admit it or not) that Extended Events are the tool of the future for this kind of work.  They are more lightweight than SQL Server Trace and are continuing to be improved (unlike Trace, which has been stagnant since SQL Server 2005!)  Any time you find yourself using Trace or Profiler you should pause and reflect on how to accomplish the same task with XEvents.

In this case, there is deadlock information in the System_Health Session, the basic session deployed by default on all SQL Server instances since SQL Server 2008.  (Think the better, stronger, faster version of the Default Trace.)  Patrick Keisler (blog/@PatrickKeisler) has a great write-up about the System_Health session on his blog here.

By default deadlock information is one giant character field that is formatted for XML.  The XML is readable (sort of) if that's your thing:

--
<deadlock>

  <victim-list>
    <victimProcess id="process5776a5498" />
  </victim-list>
  <process-list>
    <process id="process5776a5498" taskpriority="0" logused="0" waitresource="KEY: 19:72057594040745984 (9ba325989765)" waittime="4168" ownerId="327954909" transactionname="SELECT" lasttranstarted="2017-10-17T17:21:04.417" XDES="0x50560f9d0" lockMode="S" schedulerid="1" kpid="3756" status="suspended" spid="64" sbid="0" ecid="0" priority="0" trancount="0" lastbatchstarted="2017-10-17T17:21:04.417" lastbatchcompleted="2017-10-17T17:21:03.293" lastattention="1900-01-01T00:00:00.293" hostpid="4012" loginname="Login1" isolationlevel="read committed (2)" xactid="327954909" currentdb="19" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
      <executionStack>
        <frame procname="adhoc" line="1" sqlhandle="0x0200000096663f042d323a9d19c06335abb6318e18e6f8750000000000000000000000000000000000000000">
SELECT dbo.Fred_Table7.ObjectID FROM dbo.Fred_Table7 WHERE (dbo.Fred_Table7.ObjectID not  in (23) AND dbo.Fred_Table7.ObjectID &lt;= 4294966615 AND (dbo.Fred_Table7.LastModifyTime &gt;CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) OR dbo.Fred_Table7.LastModifyTime =CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) AND dbo.Fred_Table7.ObjectID &gt; 2404307) AND (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 0 AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0  OR (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 1 OR dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 1)  AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0 AND dbo.Fred_Table7.ScheduleStatus = 1) AND dbo.Fred_Table7.TypeID in (2, 8, 21, 67, 68, 260, 261, 262, 266, 267, 270, 271, 272, 277, 279, 281, 282, 289, 295, 301, 337, 338, 339, 340, 343, 349, 351, 355, 358, 361, 365, 368, 372, 373, 379, 385, 387, 389, 391, 394, 396, 407, 408, 414, 415, 430, 431) AND (SI_PLUGIN_OBJECT = 0) AND dbo.Fred_Table7.SI_HIDDEN_OBJECT = 0) ORDER BY dbo    </frame>
      </executionStack>
      <inputbuf>
SELECT dbo.Fred_Table7.ObjectID FROM dbo.Fred_Table7 WHERE (dbo.Fred_Table7.ObjectID not  in (23) AND dbo.Fred_Table7.ObjectID &lt;= 4294966615 AND (dbo.Fred_Table7.LastModifyTime &gt;CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) OR dbo.Fred_Table7.LastModifyTime =CAST('2017 10 17 20 08 18 115' AS VARBINARY(32)) AND dbo.Fred_Table7.ObjectID &gt; 2404307) AND (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 0 AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0  OR (dbo.Fred_Table7.SI_INSTANCE_OBJECT = 1 OR dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 1)  AND dbo.Fred_Table7.SI_RUNNABLE_OBJECT = 0 AND dbo.Fred_Table7.ScheduleStatus = 1) AND dbo.Fred_Table7.TypeID in (2, 8, 21, 67, 68, 260, 261, 262, 266, 267, 270, 271, 272, 277, 279, 281, 282, 289, 295, 301, 337, 338, 339, 340, 343, 349, 351, 355, 358, 361, 365, 368, 372, 373, 379, 385, 387, 389, 391, 394, 396, 407, 408, 414, 415, 430, 431) AND (SI_PLUGIN_OBJECT = 0) AND dbo.Fred_Table7.SI_HIDDEN_OBJECT = 0) ORDER BY db   </inputbuf>
    </process>
    <process id="process1b9313868" taskpriority="0" logused="284" waitresource="KEY: 19:72057594041008128 (1679744d2988)" waittime="4168" ownerId="327954910" transactionname="implicit_transaction" lasttranstarted="2017-10-17T17:21:04.417" XDES="0x57788c3a8" lockMode="X" schedulerid="2" kpid="11128" status="suspended" spid="60" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2017-10-17T17:21:04.417" lastbatchcompleted="2017-10-17T17:21:04.417" lastattention="1900-01-01T00:00:00.417" hostpid="4012" loginname="Login1" isolationlevel="read committed (2)" xactid="327954910" currentdb="19" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
      <executionStack>
        <frame procname="adhoc" line="1" stmtstart="86" sqlhandle="0x0200000011d6ca3a9ba3506e8d74888b6554043825c5204e0000000000000000000000000000000000000000">
UPDATE dbo.Fred_Table7 SET Version = @P1, LastModifyTime = @P2 WHERE ObjectID = @P3 AND Version = @P4    </frame>
        <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown    </frame>
      </executionStack>
      <inputbuf>
(@P1 int,@P2 varbinary(32),@P3 int,@P4 int)UPDATE dbo.Fred_Table7 SET Version = @P1, LastModifyTime = @P2 WHERE ObjectID = @P3 AND Version = @P4   </inputbuf>
    </process>
  </process-list>
  <resource-list>
    <keylock hobtid="72057594040745984" dbid="19" objectname="Database12.dbo.Fred_Table7" indexname="ObjectID_I7" id="lock52e379a80" mode="X" associatedObjectId="72057594040745984">
      <owner-list>
        <owner id="process1b9313868" mode="X" />
      </owner-list>
      <waiter-list>
        <waiter id="process5776a5498" mode="S" requestType="wait" />
      </waiter-list>
    </keylock>
    <keylock hobtid="72057594041008128" dbid="19" objectname="Database12.dbo.Fred_Table7" indexname="LastModifyTime_I7" id="lock35b41fa80" mode="S" associatedObjectId="72057594041008128">
      <owner-list>
        <owner id="process5776a5498" mode="S" />
      </owner-list>
      <waiter-list>
        <waiter id="process1b9313868" mode="X" requestType="wait" />
      </waiter-list>
    </keylock>
  </resource-list>
</deadlock>
--

You can piece through it to find the <inputbuf> tag to find the queries on the Fred_Table7 that are involved, and find the loginname of Login1, but is this really how you want to consume this data?

http://3.bp.blogspot.com/_KyTS_XQfpkA/TGHAMAdjUjI/AAAAAAAAEx4/xLZ0dYgTt1g/s1600/HaHaNo.jpg
A much easier way to deal with this information is in the Deadlock Graph.  


The Deadlock Graph turns the XML information show above into, well, a graph.  You can see the two transactions as the circles on the outside and the two objects involved in the rectangles in the center of the diagram.  Sometimes there are more than two transactions involved in a deadlock (beyond the scope of this post) but if there are more than two they will show up in the graph as well.  When you open the graph in SQL Server Management Studio you can hover over the transaction circle and see the underlying query and some other transaction data.

--

As mentioned above there is a trace event to return the Deadlock Graph, and you can draw it via XEvents as well with a query.

NOTE - there was a change in the way deadlock information is processed in XEvents between SQL Server 2008R2 and SQL Server 2012, so there are different scripts depending on version:

--
/*

SQL Server Deadlock Graph Extraction from Ring Buffer
SQL Server 2008/2008R2
Modified from https://www.sqlservercentral.com/Forums/1399315/System-health-extended-event-session-does-not-capture-latest-deadlocks?PageIndex=1

*/
;WITH SystemHealth

AS (
   SELECT 
      CAST ( target_data AS xml ) AS SessionXML
   FROM 
      sys.dm_xe_session_targets st
   INNER JOIN 
      sys.dm_xe_sessions s 
   ON 
      s.[address] = st.event_session_address
   WHERE 
      name = 'system_health'
)
SELECT 
   Deadlock.value ( '@timestamp', 'datetime' ) AS DeadlockDateTime
,   CAST ( Deadlock.value ( '(data/value)[1]', 'Nvarchar(max)' ) AS XML ) AS DeadlockGraph
FROM 
   SystemHealth s
CROSS APPLY 
   SessionXML.nodes ( '//RingBufferTarget/event' ) AS t (Deadlock)
WHERE 
   Deadlock.value ( '@name', 'nvarchar(128)' ) = N'xml_deadlock_report'
ORDER BY
   Deadlock.value ( '@timestamp', 'datetime' );
--
/*

SQL Server Deadlock Graph Extraction from Ring Buffer
SQL Server 2012+
Modified from https://www.red-gate.com/simple-talk/sql/database-administration/handling-deadlocks-in-sql-server/

*/
SELECT  XEvent.query('(event/data/value/deadlock)[1]') AS DeadlockGraph

FROM    ( SELECT    XEvent.query('.') AS XEvent
          FROM      ( SELECT    CAST(target_data AS XML) AS TargetData
                      FROM      sys.dm_xe_session_targets st
                                JOIN sys.dm_xe_sessions s
                                 ON s.address = st.event_session_address
                      WHERE     s.name = 'system_health'
                                AND st.target_name = 'ring_buffer'
                    ) AS Data
                    CROSS APPLY TargetData.nodes
                  ('RingBufferTarget/event[@name="xml_deadlock_report"]')
                    AS XEventData ( XEvent )
        ) AS src;   
--

Another awesome change in SQL 2012+ is that the System_Health session now writes to a file target as well as the ring buffer memory target.  This is cool because the files persist more data than can be maintained in the ring buffer, giving you more history.  Retrieving data from the XEL files requires a different query, which is set up to dynamically read the XEL files from your default LOG path, wherever it may be:

--
/*

SQL Server Deadlock Graph Extraction from XEL file target
SQL Server 2012+
http://www.sqlservercentral.com/blogs/sql-geek/2017/10/07/extracting-deadlock-information-using-system_health-extended-events/

*/

CREATE TABLE #errorlog (LogDate DATETIME , ProcessInfo VARCHAR(100), [Text] VARCHAR(MAX))
DECLARE @tag VARCHAR (MAX) , @path VARCHAR(MAX);
INSERT INTO #errorlog EXEC sp_readerrorlog;

SELECT @tag = text
FROM #errorlog 
WHERE [Text] LIKE 'Logging%MSSQL\Log%';
DROP TABLE #errorlog;
SET @path = SUBSTRING(@tag, 38, CHARINDEX('MSSQL\Log', @tag) - 29);
SELECT 

CONVERT(xml, event_data).query('/event/data/value/child::*') AS DeadlockReport,
CONVERT(xml, event_data).value('(event[@name="xml_deadlock_report"]/@timestamp)[1]', 'datetime') 
AS Execution_Time
FROM sys.fn_xe_file_target_read_file(@path + '\system_health*.xel', NULL, NULL, NULL)
WHERE OBJECT_NAME like 'xml_deadlock_report';
--

As an example, here is the output of the  ring buffer query versus the file target (XEL) query on a given server:


The file target has six times as much history as the ring buffer (in this case) - you should "always" see the same or more data in the XEL files than the ring buffer.

--

All of these scripts return an XML output like the XML shown above - it doesn't return a graph directly.

https://memegenerator.net/img/instances/71836370/say-whaaat.jpg
To get the graph is pretty straightforward - once you know how.  Open the XML document, either by clicking the link in the resultset or by copy-pasting the XML into the text editor of your choice:



Once the XML document is open, either in Management Studio or in your text editor, simply save the document as extension .XDL rather than .XML  The new file will have an icon like this: 




If you open this file in Management Studio (via double-click or Open>File in Management Studio) you will be presented with the actual Deadlock Graph as seen above, complete with ability to hover to see transaction information, etc. 

...unless you are using SQL Server 2008/2008R2 Management Studio, in which case you will see this:


Argghhhhh!

As described by Extended Events guru Jonathan Kehayias (blog/@sqlpoolboy) in a blog post here, this is an artifact of SQL 2008 Management Studio not being ready to render the deadlock output from XEvents (as opposed to the deadlock output from Trace).  As Jonathan notes there are two ways to handle this - the first way is to use an up-level Management Studio of at least version 2012, either by upgrading the local copy of Management Studio (you can download the current Management Studio that is still compatible back to 2008 from Microsoft here) or by copying the XDL file to another server with a more recent Management Studio.  The other way to open the file under SQL 2008/2008R2 is to use SentryOne's free tool Plan Explorer Pro.  I usually don't pitch company products but this is a really useful 100% free tool.

The most common answer will be the first - copy the XDL to a server with 2012+ Management Studio, and all will be well.

--

UPDATE - a friend pointed out that I had forgotten a script that does a great job shredding the Deadlock information from files, the ring buffer, or SentryOne's data collection into a useful format.  It was created by MCM/MVP Wayne Sheffield (blog/@DBAWayne) and is located here.

--

Hope this helps!