Restoring from Oracle RMAN backup fails about 60% of the time (noarchivelog mode)

Question

I have a strange case where restoring a recently created Oracle RMAN backup sometimes works and at other times does not.

We use Oracle 12c. The database is running in noarchivelog mode.

Full story: We have a large CI setup, where a node can pick up a job, build itself from the repo and run tests. For some of the tests a database build is required. To speed the tests up we

First attempt to restore the database from the backup.
If that fails we then drop the database rebuild the shell and the data. Re-create the backup and then start the tests.

The idea of course is that most of the time restore will work, in reality restore works about 40% of the time, and at other times it fails and the database is rebuild. There does not seem to be any correlation between this happening and individual nodes, it works one time and then does not work.

We use the following script to backup

rman target=/ << EOF
RUN {
    CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
    SHUTDOWN IMMEDIATE;
    STARTUP MOUNT;
    BACKUP DATABASE;
}
EXIT;
EOF

and the following to restore

rman target=/ << EOF
RUN {
    SHUTDOWN IMMEDIATE;
    STARTUP MOUNT;
    RESTORE DATABASE;
    RECOVER DATABASE;
    ALTER DATABASE OPEN RESETLOGS;
}
EXIT;
EOF

When things go badly it happens on the RECOVER step.

The errors are not always the same. Below are a few that I've seen

This one seems to be the most frequent recently

04:23:10 channel ORA_DISK_1: restore complete, elapsed time: 00:00:15
04:23:10 Finished restore at 06-02-2017 04:23:10
04:23:10 
04:23:10 Starting recover at 06-02-2017 04:23:10
04:23:10 using channel ORA_DISK_1
04:23:11 
04:23:11 starting media recovery
04:23:11 
04:23:11 archived log for thread 1 with sequence 52 is already on disk as file /oracle/oradata/DB1/redoDB11.log
04:23:11 archived log for thread 1 with sequence 53 is already on disk as file /oracle/oradata/DB1/redoDB12.log
04:23:11 archived log for thread 1 with sequence 54 is already on disk as file /oracle/oradata/DB1/redoDB13.log
04:23:11 RMAN-08187: WARNING: media recovery until SCN 1662458 complete
04:23:11 Finished recover at 06-02-2017 04:23:11
04:23:11 
04:23:15 RMAN-00571: ===========================================================
04:23:15 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
04:23:15 RMAN-00571: ===========================================================
04:23:15 RMAN-03002: failure of sql statement command at 02/06/2017 04:23:14
04:23:15 ORA-01147: SYSTEM tablespace file 1 is offline
04:23:15 ORA-01110: data file 1: '/oracle/oradata/DB1/DB1_system.dbf'
04:23:15 
04:23:15 RMAN> 
04:23:15 
04:23:15 Recovery Manager complete.

Another error

09:49:17 Finished restore at 27-01-2017 09:49:14
09:49:17 
09:49:17 Starting recover at 27-01-2017 09:49:14
09:49:17 using channel ORA_DISK_1
09:49:17 
09:49:17 starting media recovery
09:49:17 
09:49:17 archived log for thread 1 with sequence 52 is already on disk as file /oracle/oradata/DB1/redoDB11.log
09:49:17 archived log for thread 1 with sequence 53 is already on disk as file /oracle/oradata/DB1/redoDB12.log
09:49:17 archived log for thread 1 with sequence 54 is already on disk as file /oracle/oradata/DB1/redoDB13.log
09:49:17 RMAN-08187: WARNING: media recovery until SCN 1755105 complete
09:49:17 Finished recover at 27-01-2017 09:49:15
09:49:17 
09:49:17 RMAN-00571: ===========================================================
09:49:17 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
09:49:17 RMAN-00571: ===========================================================
09:49:17 RMAN-03002: failure of sql statement command at 01/27/2017 09:49:16
09:49:17 ORA-01147: SYSTEM tablespace file 1 is offline
09:49:17 ORA-01110: data file 1: '/oracle/oradata/DB1/DB1_system.dbf'
09:49:17 
09:49:17 RMAN>

And one more

11:17:55 starting media recovery
11:17:55 media recovery failed
11:17:55 RMAN-00571: ===========================================================
11:17:55 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
11:17:55 RMAN-00571: ===========================================================
11:17:55 RMAN-03002: failure of recover command at 01/27/2017 11:17:55
11:17:55 ORA-00283: recovery session canceled due to errors
11:17:55 RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
11:17:55  start until cancel
11:17:55 ORA-00283: recovery session canceled due to errors
11:17:55 ORA-16433: The database or pluggable database must be opened in read/write mode.
11:17:55 
11:17:55 RMAN>

And another which is different from the above three, this one fails on RESTORE

09:14:11 Starting restore at 06-02-2017 09:14:11
09:14:11 allocated channel: ORA_DISK_1
09:14:11 channel ORA_DISK_1: SID=12 device type=DISK
09:14:12 
09:14:12 creating datafile file number=1 name=/oracle/oradata/DB1/DB1_system.dbf
09:14:12 RMAN-00571: ===========================================================
09:14:12 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
09:14:12 RMAN-00571: ===========================================================
09:14:12 RMAN-03002: failure of restore command at 02/06/2017 09:14:12
09:14:12 ORA-01180: can not create datafile 1
09:14:12 ORA-01110: data file 1: '/oracle/oradata/DB1/DB1_system.dbf'
09:14:12 
09:14:12 RMAN>

Any help would be much appreciated. Thank you!

As you have said that your database is operating in noarchivelog mode, you don't need to recover it. — atokpas, Feb 06 '17 at 14:38
Also you need to restore spfile and control file before you restore the database. — atokpas, Feb 06 '17 at 14:57
Thank you! The reason RECOVER is there is that unless I run it and the following command, the database does not come back online. Is the way I restore the database incorrect and should be done differently for noarchivelog? — rednax, Feb 06 '17 at 15:01
Well, you still need "the following command"... just don't run the recover command as JSapkota said. When in noarchivelog mode, just run: restore database; alter database open resetlogs; — Kris Johnston, Feb 06 '17 at 17:33

score 1 · Answer 1 · answered Feb 06 '17 at 17:32

In order to understand why you are getting errors, it is important to understand what each one of those pieces mean from your script:

RESTORE DATABASE;
RECOVER DATABASE;
ALTER DATABASE OPEN RESETLOGS;

RESTORE DATABASE; copies the datafiles from your backup set back out to your database location. If the database was shutdown and consistent (was shutdown cleanly) when the backup was taken, then your backup is consistent.

RECOVER DATABASE; first attempts to make your backup consistent by applying backup archived redo logs, archived redo logs, and/or redo logs. Then, it will attempt to roll your database forward from the backup point in time until the most recent redo log. Since you are running in NOARCHIVELOG mode, this step will only produce errors like the ones you are seeing.

ALTER DATABASE OPEN RESETLOGS; brings your database on-line. RESETLOGS clears the redo logs and it is needed after an incomplete recovery (if the database couldn't roll forward and apply all of the redo from the backup archived logs, archived logs, and redo logs).

With this in mind, your script should look like the following:

RUN {
    SHUTDOWN IMMEDIATE;
    STARTUP MOUNT;
    RESTORE DATABASE;
    ALTER DATABASE OPEN RESETLOGS;
}

As a side note, you should also configure an spfile+controlfile autobackup from RMAN. This backup is automatically taken whenever you backup your database.

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F';

As JSapkoka said, you should also restore the spfile and controlfile first as well, which, the full restore script would look something like:

RUN {
    SHUTDOWN IMMEDIATE;
    STARTUP NOMOUNT;
    SET CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO 'autobackup_format';
    RESTORE SPFILE FROM AUTOBACKUP;
    SHUTDOWN IMMEDIATE;
    STARTUP NOMOUNT;
    RESTORE CONTROLFILE FROM AUTOBACKUP;
    ALTER DATABASE MOUNT;
    RESTORE DATABASE;
    ALTER DATABASE OPEN RESETLOGS;
}

Thank you for a very detailed answer. I could not get it to work without RESTORE as the database is throwing the message below, this was why I added RESTORE in the first place: RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of sql statement command at 02/06/2017 14:51:09 ORA-01139: RESETLOGS option only valid after an incomplete database recovery — rednax, Feb 06 '17 at 19:52
@rednax If you didn't open your database between the backup and the restore attempt (as in, you're just testing your backups/restores), the database won't need to wipe the redo logs since it will still be considered a complete recovery. Is that what you are doing? Regardless, try: "alter database open;" instead of "alter database open resetlogs;". — Kris Johnston, Feb 06 '17 at 19:58
no luck yet. I make a few changes to the database to make sure there is something to restore before re-testing restore. Trying ALTER DATABASE OPEN RESETLOGS; gives the same error. Without RESETLOGS get RMAN-03002: failure of sql statement command at 02/06/2017 15:37:09 ORA-01190: control file or data file 1 is from before the last RESETLOGS ORA-01110: data file 1: — rednax, Feb 06 '17 at 20:39
@rednax Check your incarnations and you'll probably have to reset the incarnation: RMAN> list incarnation; RMAN> reset database to incarnation #; Reference: https://docs.oracle.com/database/121/RCMRF/rcmsynta2007.htm#RCMRF148 — Kris Johnston, Feb 06 '17 at 20:56

Restoring from Oracle RMAN backup fails about 60% of the time (noarchivelog mode)

1 Answers1