Hi,
I've got a production server log shipping a 280GB database to a remote DR
site server. It has been running without incident for months now, but last
week it stopped restoring logs (the copy process was still running) with an
error 3456:
[Microsoft SQL-DMO (ODBC SQLState: HY000)] Error 3456: [Microsoft][ODBC SQL
Server Driver][SQL Server]Could not redo log record (500478:68225:4), for
transaction ID (3:1085081790), on page (3:2789016), database 'JDE_Prod' (5).
Page: LSN = (500478:57038:10), type = 2. Log: OpCode = 2, context 3,
PrevPageLSN: (500478:68221:4).
I restored the remote database from a full tape backup shipped to the site
via courier (because we weren't sure if the link was to blame for the problem
restoring). Log shipping worked well again for a few days, but this morning
it has again stopped restoring with the exact same error!
I've read the article "http://support.microsoft.com/kb/831950", and although
it describes the same error, it doesn't seem to apply to us as we weren't
doing any role changing, and weren't backing up the database manually with
the NORECOVERY switch (i.e. the backups were being done as usual by the log
shipping maint. plan).
Local server build is 8.00.997, and remote (DR) server build is 8.00.818.
Could the fact that they are slightly different versions have anything to do
with this problem?
Could this be corruption introduced by the network link? If so, is the
only way to fix this to fully restore the database again? Or is there some
way to get good copies of the log it failed on and restore those manually?
I don't have a huge amount of experience with log shipping, so any help would
be greatly appreciated - especially this being a 24/7 mission critical DR
server, and in the middle of the holidays! Murphy's Law!
Thanks,
david

Hi David
The SQL builds may cause a problem, but I am not sure about it.
Regarding the file restore: it isn't necessary to restore the entire
db. I would suggest the following steps:
1) Check the table msdb..log_shipping_plan_history for the last loaded file:
select * from msdb..log_shipping_plan_history order by end_time desc
2) Try restoring that file manually from Query Analyzer:
restore log <db_name>
from disk = 'file_path'
with standby = 'undo.txt'
3) If the above step succeeds, keep restoring the successive trn files until
a restore fails. The file at which the restore fails is the corrupt file.
4) Copy only that file from the primary server again and retry step 2 with it.
5) Re-run the LS jobs and they should succeed.
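Steps 2 and 3 above can be sketched as follows. This is only an illustration: the database name `MyDb`, the file names and the `D:\LogsIn` folder are hypothetical placeholders, not values from this thread. Checking the header first shows the first and last LSN of each log backup, which helps confirm the files are being applied in chain order.

```sql
-- Hypothetical database name and paths; replace with your own.

-- Inspect the backup header. The first and last LSN values show
-- where this log backup sits in the log chain.
RESTORE HEADERONLY
FROM DISK = 'D:\LogsIn\MyDb_tlog_1.TRN'

-- Restore the file, leaving the database in read-only standby mode
-- so that later log backups can still be applied.
RESTORE LOG MyDb
FROM DISK = 'D:\LogsIn\MyDb_tlog_1.TRN'
WITH STANDBY = 'D:\LogsIn\MyDb_undo.tuf'

-- Repeat for the next file in sequence. The first file whose
-- restore fails is the one to re-copy from the primary server.
RESTORE LOG MyDb
FROM DISK = 'D:\LogsIn\MyDb_tlog_2.TRN'
WITH STANDBY = 'D:\LogsIn\MyDb_undo.tuf'
```

Running each statement on its own, rather than as one batch, makes it obvious which file failed, since a severe restore error can drop the connection.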
Hope this helps.
I shall get back with more information about the effect of SQL builds on LS.
Thanks
Amer M J
MCP

Thanks for the quick response. The log files are copied and restored every
15 minutes. The problem occurred this morning around 12:15am. I've tried
your suggestion about restoring manually with the standby undo file.
I ran the following command (using the 12:00am file) successfully, but the
12:15am file produces the following output:
---
restore log JDE_Prod
from disk = 'K:\Backups\DRLogsIn\JDE_Prod_tlog_200512300015.TRN'
with standby = 'K:\Backups\DRLogsIn\LogUndo.tuf'
---
Deleting database file 'K:\Backups\DRLogsIn\LogUndo.tuf'.
Processed 34415 pages for database 'JDE_Prod', file 'JDE_PRODUCTION_log' on
file 1.
Server: Msg 3456, Level 21, State 1, Line 1
Could not redo log record (500478:68225:4), for transaction ID
(3:1085081790), on page (3:2789016), database 'JDE_Prod' (5). Page: LSN =
(500478:57038:10), type = 2. Log: OpCode = 2, context 3, PrevPageLSN:
(500478:68221:4).
Connection Broken
---
This 12:15 file has already been re-copied, but I will try again.
Would be interesting to see if you find any issues with different builds in
log shipping. The patches were applied to the local server a few months ago
(3 or 4 months), and log shipping has been running without incident this
whole time.
Thanks again,
Dave

Hi Dave
I am curious here. Was the '.tuf' file deleted as part of the manual
restore process?
Also the builds do play a major role here. From what I can see as per your
information, the primary server is of a higher build than the secondary
server. So I was wondering how a log file of a db from a higher build was
getting restored onto a lower build server.
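To compare the builds, one option is to run the same query on both the primary and the DR server and check the results side by side. A minimal sketch; the column aliases are just illustrative names:

```sql
-- Run on both the primary and the DR server, then compare the output.
SELECT SERVERPROPERTY('ProductVersion') AS product_version, -- e.g. 8.00.818
       SERVERPROPERTY('ProductLevel')   AS product_level,   -- service pack level
       @@VERSION                        AS version_string
```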
Also, I would suggest checking the integrity of the trn files on the
primary server. Try the restore verifyonly command to check the backup
set's integrity.
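A minimal sketch of that check, assuming a hypothetical path (`D:\LogsOut\MyDb_tlog_1.TRN` is a placeholder, not a file from this thread). It can be run against the original file on the primary and again against the copied file on the secondary:

```sql
-- Hypothetical path; point this at the log backup you want to check.
-- Nothing is restored; the command only reads the backup file.
RESTORE VERIFYONLY
FROM DISK = 'D:\LogsOut\MyDb_tlog_1.TRN'
```

Keep in mind that verifyonly checks that the backup set is complete and readable. It does not verify the structure of the data inside the backup, so a clean result is not a guarantee that the restore will succeed.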
Please also check whether any other process is accessing the db on the
secondary server, as this may disrupt the LS process.
Also check this link.
http://support.microsoft.com/kb/329487/en-us
Thanks
Amer M J
MCP

Hi again,
Yes, the standby file (whatever it has been called) is automatically deleted
by the restore process.
Good news though, I seem to have log shipping going again! :-)
I re-copied the 12:15am log file (yet again, 3rd time) and restored it with
the same syntax as in my previous post, and it worked. So the problem must
lie with our link to the remote DR server. It's now been logged with the
telecom company who provide the WAN pipe.
As a precautionary measure I will schedule the remote server to be patched
to the same build level as our local server (will be next year though as we
are in a "holiday change freeze" now).
Funnily enough, the restore headeronly, verifyonly and filelistonly all
seemed to work fine with the corrupt file. Is it possible that the header of
the file was okay, while the actual data was bad?
Thanks very much for the help though, and I will update with anything new we
find.
Dave