Wednesday, March 21, 2012

Log shipping on Secondary cannot load large log file

We have log shipping from a primary to a secondary for warm standby and
reporting.
After a reindexing job that caused a particularly large log backup to be
created (9,141,927,424 bytes), the secondary appears to have pre-allocated
the disk space required to copy the file over (the file has a time stamp for
the time at which the Log-ship copy was scheduled) but after 5+ hours the
time stamp on the file has not changed and the log-ship copy/restore history
reports errors like:
[Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 3201: [Microsoft][ODBC SQL
Server Driver][SQL Server]Cannot open backup device
'H:\SQLData\MSSQL$NODEB\BACKUP\LogShip\OASIS_TTC_tlog_200408311200.TRN'.
Device error or device off-line. See the SQL Server error log for more
details.
[Microsoft][ODBC SQL Server Driver][SQL Server]RESTORE LOG is terminating
abnormally.
When I inspect the process in the secondary that appears to be doing the
copy I see the following:
EXECUTE master.dbo.xp_sqlmaint '-LSCopyPlanID
"87C86B58-847F-4505-AFAE-4A1F9E31211E"'
Status: runnable
CPU: 0
Physical I/O: 0
Memory Usage: 2
Login time: 31/8/2004 1:00:07 PM
Last Batch: 31/8/2004 1:00:07 PM
As you can see the process seems to have stalled.
As far as disk and network through-put is concerned I would expect a worst
case scenario of 10 Mbytes per second - more than enough to copy the file
within the time since it was created.
Is there a way to recover from this situation without having to remove log
shipping and starting from scratch?
Any tips appreciated.Hello David
I'm not clear on the problem description that you have outlined in your
message. As I understand it, you are mentioning the following sequence of
events (please correct me if I'm wrong) :
1. You had a large tran log backup that has the same timestamp in the
filename as when the copy job ran.
2. You noticed, 5 hours after the copy job ran, that the restore job failed
with the error :
[Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 3201: [Microsoft][ODBC SQL
Server Driver][SQL Server]Cannot open backup device
'H:\SQLData\MSSQL$NODEB\BACKUP\LogShip\OASIS_TTC_tlog_200408311200.TRN'.
Device error or device off-line. See the SQL Server error log for more
details.
[Microsoft][ODBC SQL Server Driver][SQL Server]RESTORE LOG is terminating
abnormally.
I wanted to point out couple of things here :
1. The filename for the transaction log backup does not change through the
entire log shipping life cycle.
2. The filename of the transaction log backup file and the time the copy
job runs has no relation
3. Copy and Restore jobs on the secondary server are 2 different jobs and
run independent of each other.
4. The failure that you see above is reported by the Restore job (hence the
message "RESTORE LOG is terminating abnormally")
The question that I wanted to ask you was - Are you absolutely positive
that the copy job has completed copying the file to the secondary server?
Please confirm this by either trying to rename the file (and then renaming
it back to its original name) or using the HANDLE utility from
SYSINTERNALS.COM
As you have mentioned later in your message, it seems like the copy job is
still running. I would advise you to wait for the job to complete.
Thank you for using Microsoft newsgroups.
Sincerely
Pankaj Agarwal
Microsoft Corporation
This posting is provided AS IS with no warranties, and confers no rights.|||Hi Pankaj,
Thanks for the reply.
To cut a long story short, I killed the copy process that appeared to have
stalled.
In due course a new process started and the file was restored into the
secondary within a reasonably short amount of time.
I realise that killing a job like that is risky and normally you should
leave it run, but in my case we have a 55 GB database that requries the
log-shipping secondary to be highly available and up-to-date for reporting.
My choices were:
1) Kill the job and hope the secondary came back into synch within a couple
of hours.
2) Remove log-shipping and wait 8+ hours for the secondary to rebuild from
scratch.
I took a gamble and chose option 1.
"Pankaj Agarwal [MSFT]" wrote:
> Hello David
> I'm not clear on the problem description that you have outlined in your
> message. As I understand it, you are mentioning the following sequence of
> events (please correct me if I'm wrong) :
> 1. You had a large tran log backup that has the same timestamp in the
> filename as when the copy job ran.
> 2. You noticed, 5 hours after the copy job ran, that the restore job failed
> with the error :
> [Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 3201: [Microsoft][ODBC SQL
> Server Driver][SQL Server]Cannot open backup device
> 'H:\SQLData\MSSQL$NODEB\BACKUP\LogShip\OASIS_TTC_tlog_200408311200.TRN'.
> Device error or device off-line. See the SQL Server error log for more
> details.
> [Microsoft][ODBC SQL Server Driver][SQL Server]RESTORE LOG is terminating
> abnormally.
> I wanted to point out couple of things here :
> 1. The filename for the transaction log backup does not change through the
> entire log shipping life cycle.
> 2. The filename of the transaction log backup file and the time the copy
> job runs has no relation
> 3. Copy and Restore jobs on the secondary server are 2 different jobs and
> run independent of each other.
> 4. The failure that you see above is reported by the Restore job (hence the
> message "RESTORE LOG is terminating abnormally")
> The question that I wanted to ask you was - Are you absolutely positive
> that the copy job has completed copying the file to the secondary server?
> Please confirm this by either trying to rename the file (and then renaming
> it back to its original name) or using the HANDLE utility from
> SYSINTERNALS.COM
> As you have mentioned later in your message, it seems like the copy job is
> still running. I would advise you to wait for the job to complete.
> Thank you for using Microsoft newsgroups.
> Sincerely
> Pankaj Agarwal
> Microsoft Corporation
> This posting is provided AS IS with no warranties, and confers no rights.
>

No comments:

Post a Comment