Nebraska SQL from @DBA_ANDY: Log Shipping - Exclusive access could not be obtained because the database is in use

This story began as they almost always do - with a page at 4am local time...

--

'Alert: SQL Failed Jobs on InstanceName is Failed'

Job name: LSRestore_InstanceName_DatabaseName
Date & Message: 2014-07-31 05:24:00.0 The job failed. The Job was invoked by User DOMAIN\LOGIN. The last step to run was step 1 (Log shipping restore log job step.).

--

Sigh.

I am not a fan of log shipping but I do acknowledge that it has its place. In this case the client is using it to have a read-only standby database on a Standard Edition SQL Server (hence no mirroring+snapshot or Availability Group with readable secondary). The database in question is large (>1TB) but with relatively small daily churn (~5GB of LOG backups per day). Due to the traffic on the standby database during the day they have their log shipping on this database configured to only run restores during the overnight, keeping the standby database static and available for their users during the business day.

By the time I received the page, the LSRestore job has been failing for much of the overnight, with service desk'ers trying various things to try to resolve the issue.

When I signed on to the system I drilled into the job step history and found the following:

--

Microsoft (R) SQL Server Log Shipping Agent [Assembly Version = 10.0.0.0, File Version = 10.50.1600.1 ((KJ_RTM).100402-1539 )] Microsoft Corporation. All rights reserved. 2014-07-31 03:45:00.32 ----- START OF TRANSACTION LOG RESTORE -----

2014-07-31 03:45:00.39 Starting transaction log restore. Secondary ID: '8f45136f-fb73-4815-b849-c7c1b391831b' 2014-07-31 03:45:00.39 Retrieving restore settings. Secondary ID: '8f45136f-fb73-4815-b849-c7c1b391831b' 2014-07-31 03:45:00.41 Retrieved common restore settings. Primary Server: 'InstanceName', Primary Database: 'DatabaseName', Backup Destination Directory: 'H:\LogShip\DatabaseName', File Retention Period: 1440 minute(s) 2014-07-31 03:45:00.42 Retrieved database restore settings. Secondary Database: 'DatabaseName', Restore Delay: 10, Restore All: True, Restore Mode: Standby, Disconnect Users: False, Last Restored File: H:\LogShip\DatabaseName\DatabaseName_20140730071500.trn, Block Size: Not Specified, Buffer Count: Not Specified, Max Transfer Size: Not Specified

2014-07-31 03:45:20.77 *** Error: Could not apply log backup file 'H:\LogShip\DatabaseName\DatabaseName_20140730073000.trn' to secondary database 'DatabaseName'.(Microsoft.SqlServer.Management.LogShipping) *** 2014-07-31 03:45:20.77 *** Error: Exclusive access could not be obtained because the database is in use. RESTORE LOG is terminating abnormally.(.Net SqlClient Data Provider) ***

2014-07-31 03:45:21.05 *** Error: The log backup file 'H:\LogShip\DatabaseName\DatabaseName_20140730073000.trn' was verified but could not be applied to secondary database 'DatabaseName'.(Microsoft.SqlServer.Management.LogShipping) *** 2014-07-31 03:45:21.05 Deleting old log backup files. Primary Database: 'DatabaseName' 2014-07-31 03:45:21.06 The restore operation completed with errors. Secondary ID: '8f45136f-fb73-4815-b849-c7c1b391831b' 2014-07-31 03:45:21.06 ----- END OF TRANSACTION LOG RESTORE ----- Exit Status: 1 (Error)

--

Seeing the "Exclusive access could not be obtained because the database is in use" error (buried in the middle of the messages), I went looking for something connected to the DatabaseName database. Sure enough there it was:

spid	blocked	dbid	uid	login_time	last_batch	status	hostname
69	0	10	1	7/30/2014 11:07	7/30/2014 11:07	sleeping	ServerName

program_name		hostprocess	cmd	loginame	stmt_start	stmt_end	request_id
Microsoft SQL Server Management Studio - Query		5404	AWAITING COMMAND	Login1	0	0	0

Basically someone opened a Management Studio window on InstanceName as loginname Login1 at 11:07am the previous day, ran one or two queries (last_batch only one second after the login_time), and then left the window open, maintaining a connection to the DatabaseName database.

Log Shipping requires exclusive access (as noted in the error) to apply the log backups, so this single Management Studio connection was effectively breaking log shipping.

Because the SPID was "Awaiting Command" with a Last_Batch of some time ago I went ahead and killed SPID 69 and manually started the "LSRestore_ServerName_DatabaseName" job. The job completed successfully, restoring the last day's worth of logs in about ten minutes.

How could this be prevented? What was configured incorrectly?

When setting up a log shipping secondary as a read-only standby, there is an option to disconnect any existing connections before attempting log shipping restores. This option is available whether you use the Management Studio GUI or T-SQL commands.

In Management Studio, the option is "Disconnect users in the database before restoring backups." This option is on the "Restore Transaction Log" tab on the "Secondary Database Settings" window after you select to "Add" a secondary database. The option is only exposed if you select Standby mode - there is no need for the option in No Recovery mode since users won't be able to connect to the secondary database anyway:

Via T-SQL, the option is a parameter of the "sp_add_log_shipping_secondary_database" stored procedure. Set "@disconnect_users = 1" to enable the disconnect functionality.

IMPORTANT NOTE - as seen in the screenshot above, this option is not enabled by default in Management Studio, and it is also not enabled by default by the stored procedure (default value is 0). If you wish to use this functionality you need to go "out of your way" to turn it on. Be aware of the impact of this option *before* turning it on. If use pattern/business rules of your standby database is such that queries against the standby are more important than successful restores, do *not* enable this option as it will terminate your queries with extreme prejudice.

http://sd.keepcalm-o-matic.co.uk/i/keep-calm-and-terminate-with-extreme-prejudice.png

Hope this helps!

2 comments:

AnonymousAugust 13, 2014 at 6:13 AM
As an alternative I had a job scheduled at 'after work hour' containing a step like:

declare @a_spid smallint
declare @msg varchar(255)
declare db_users insensitive cursor for
select sp.spid
from sys.dm_exec_sessions es join sys.sysprocesses sp
on es.session_id = sp.spid
where
(es.host_name like 'pc%' )
and sp.spid > 50 and sp.spid != @@SPID

open db_users
fetch next from db_users into @a_spid
while @@fetch_status = 0
begin
select @msg = 'kill '+convert(char(5),@a_spid)
--print @msg
execute (@msg)
fetch next from db_users into @a_spid
end
close db_users
deallocate db_users

This way the restores could start during work hours (if no other connection), but would certainly be started after work hours.

Worked for us.

Thursday, July 31, 2014

Log Shipping - Exclusive access could not be obtained because the database is in use

2 comments: