Error message stack follows


Contents

  1. 23 Troubleshooting RMAN Operations
  2. Interpreting RMAN Message Output
  3. Identifying Types of Message Output
  4. Recognizing RMAN Error Message Stacks
  5. Identifying Error Codes
  6. RMAN Error Message Numbers
  7. ORA-19511: Media Manager Errors
  8. Interpreting RMAN Error Stacks
  9. Interpreting RMAN Errors: Example
  10. Interpreting Server Errors: Example
  11. Interpreting SBT 2.0 Media Management Errors: Example
  12. Interpreting SBT 1.1 Media Management Errors: Example
  13. Identifying RMAN Return Codes
  14. Using V$ Views for RMAN Troubleshooting
  15. Monitoring RMAN Interaction with the Media Manager
  16. Correlating Server Sessions with RMAN Channels
  17. Matching Server Sessions with Channels When One RMAN Session Is Active
  18. Matching Server Sessions with Channels in Multiple RMAN Sessions
  19. Testing the Media Management API
  20. Obtaining the sbttest Utility
  21. Obtaining Online Documentation for the sbttest Utility
  22. Using the sbttest Utility
  23. Terminating an RMAN Command
  24. Terminating the Session with ALTER SYSTEM KILL SESSION
  25. Terminating the Session at the Operating System Level
  26. Terminating an RMAN Session That Is Not Responding in the Media Manager
  27. Components of an RMAN Session
  28. Process Behavior During a Suspended Job
  29. Terminating an RMAN Session: Basic Steps

23 Troubleshooting RMAN Operations

This chapter describes how to troubleshoot Recovery Manager (RMAN).

Interpreting RMAN Message Output

Recovery Manager provides detailed error messages that can aid in troubleshooting problems. Also, Oracle Database and the third-party media vendors generate useful debugging output of their own. The following discussion explains how to identify and interpret the different errors that you may encounter.

Identifying Types of Message Output

Output that is useful for troubleshooting failed or unresponsive RMAN jobs is located in several different places, as explained in Table 23-1.

Table 23-1 Types of Message Output

Type of Output: RMAN messages
Produced By: RMAN
Location: Completed job information is in V$RMAN_STATUS and RC_RMAN_STATUS. Current job information is in V$RMAN_OUTPUT. When running RMAN from the command line, you can direct output to a log file specified by LOG on the command line or the SPOOL LOG command, or to a file created by redirecting RMAN output (for example, in UNIX, using the > operator).
Description: Contains actions relevant to the RMAN job and error messages generated by RMAN, the database server, and the media vendor. RMAN error messages have an RMAN- prefix. Normal action descriptions do not have a prefix. You can execute PL/SQL to remove all entries from V$RMAN_STATUS. Doing so removes all job-related entries, and no rows are visible until new backup jobs are shown in V$RMAN_BACKUP_JOB_DETAILS.

Type of Output: Alert log
Produced By: Oracle Database
Location: The alert subdirectory of the Automatic Diagnostic Repository (ADR) home
Description: Contains a chronological log of errors, initialization parameter settings, and administration operations. Records values for overwritten control file records.

Type of Output: Oracle trace file
Produced By: Oracle Database
Location: The trace subdirectory of the ADR home
Description: Contains detailed output generated by Oracle Database processes. This file is created when an ORA-600 or ORA-3113 error message occurs, whenever RMAN cannot allocate a channel, and when the database fails to load the media management library.

Type of Output: sbtio.log
Produced By: Third-party media management software
Location: The trace subdirectory of the ADR home
Description: Contains vendor-specific information written by the media management software. This log does not contain Oracle Database or RMAN errors.

Type of Output: Media manager log file
Produced By: Third-party media management software
Location: The file names for any media manager logs other than sbtio.log are determined by the media management software.
Description: Contains information about the functioning of the media management device

Recognizing RMAN Error Message Stacks

RMAN reports errors as they occur. If an error is not retryable, that is, if RMAN cannot perform failover to another channel to complete a particular job step, then RMAN also reports a summary of the errors after all job sets complete. This feature is known as deferred error reporting.

One way to determine whether RMAN encountered an error is to examine its return code, as described in "Identifying RMAN Return Codes". A second way is to search the RMAN output for the string RMAN-00569, which is the message number for the error stack banner. All RMAN errors are preceded by this error message. If you do not see an RMAN-00569 message in the output, then there are no errors. Example 23-1 shows a syntax error.
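The banner search described above can be automated when RMAN output is captured to a log. The following is a minimal Python sketch, not part of RMAN itself; the function name has_rman_errors and the sample log text are assumptions made for illustration.

```python
# RMAN-00569 is the message number of the error stack banner.
BANNER_CODE = "RMAN-00569"

def has_rman_errors(output: str) -> bool:
    """Return True if the captured RMAN output contains the error stack banner."""
    return BANNER_CODE in output

# Illustrative log text (hypothetical job output).
sample_log = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
"""

print(has_rman_errors(sample_log))
print(has_rman_errors("Finished backup at 08-NOV-11"))
```

A script that scans spooled logs this way can flag failed jobs even when the return code of the RMAN client was not recorded.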

Example 23-1 RMAN Syntax Error

Identifying Error Codes

Typically, you find the following types of error codes in RMAN message stacks:

Errors prefixed with RMAN-

Errors prefixed with ORA-

Errors preceded by the line Additional information:

See also: Oracle Database Error Messages for explanations of RMAN and ORA error codes

RMAN Error Message Numbers

Table 23-2 indicates the error ranges for common RMAN error messages, all of which are described in Oracle Database Error Messages.

Table 23-2 RMAN Error Message Ranges

Error Range Cause

Compilation of RESTORE or RECOVER command

Compilation of DUPLICATE command

Low-level keyword analyzer

Interphase errors between PL/SQL and RMAN

Recovery catalog packages

ORA-19511: Media Manager Errors

If a media manager error occurs, ORA-19511 is signaled, and the media manager is expected to provide RMAN a descriptive error. RMAN displays the error passed back to it by the media manager. For example, you might see this:

The message from the media manager should provide you with enough information to let you fix the root problem. If it does not, you should refer to the documentation for your media manager or contact your media management vendor support representative for further information. ORA-19511 errors originate with the media manager, not with Oracle Database. The database just passes the message on from the media manager. The cause can be addressed only by the media management vendor.

If you are still using an SBT 1.1-compliant media management layer, you may see some additional error message text. Output from an SBT 1.1-compliant media management layer is similar to the following:

The "Additional information" lines use error codes specific to SBT 1.1. The values displayed correspond to the media manager message numbers and error text listed in Table 23-3. RMAN then signals the error as ORA-19511 (Error received from media manager layer), followed by a general error message that relates to the error code returned from the media manager and includes the SBT 1.1 error number.

The SBT 1.1 error messages are listed here for your reference. Table 23-3 lists media manager message numbers and their corresponding error text. In the error codes, O/S stands for operating system. The errors marked with an asterisk (*) are internal and should not typically be seen during normal operation.

Table 23-3 Media Manager Error Message Ranges

Cause No. Message

Backup file not found (only returned for read)

File exists (only returned for write)

Bad mode specified

Invalid block size specified

No tape device found

Device found, but busy; try again later

Tape volume not found

Tape volume is in-use

Can’t connect with Media Manager

O/S error (for example, malloc or fork error)

Invalid argument(s) to sbtopen

Invalid file handle or file not open

Invalid flags to sbtclose

Invalid argument(s) to sbtclose

Can’t connect with Media Manager

Invalid file handle or file not open

End of volume reached

Invalid argument(s) to sbtwrite

Invalid file handle or file not open

End of volume reached

Invalid argument(s) to sbtread

Backup file not found

Backup file in use

Can’t connect with Media Manager

Invalid argument(s) to sbtremove

Backup file not found

Can’t connect with Media Manager

Invalid argument(s) to sbtinfo

Invalid argument(s) to sbtinit

Interpreting RMAN Error Stacks

Sometimes you may find it difficult to identify the useful messages in the RMAN error stack. Note the following tips and suggestions:

Read the messages from the bottom up, because this is the order in which RMAN issues the messages. The last one or two errors displayed in the stack are often the most informative.

When you are using an SBT 1.1 media management layer and see SBT 1.1 style error messages containing "Additional information:" numeric error codes, look for the ORA-19511 message that follows for the text of error messages passed back to RMAN by the media manager. These should identify the real failure in the media management layer.

Look for the RMAN-03002 or RMAN-03009 message (RMAN-03009 is the same as RMAN-03002 but includes the channel ID), immediately following the error banner. These messages indicate which command failed. Syntax errors generate RMAN-00558.

Identify the basic type of error according to the error range chart in Table 23-2 and then refer to Oracle Database Error Messages for information about the most important messages.
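The tips above can be sketched as a small Python helper that collects the messages after the error banner, lists them bottom-up, and picks out the RMAN-03002 or RMAN-03009 message naming the failed command. This is an illustrative sketch only: the function names and the sample stack (including its timestamp) are assumptions, not output taken from this chapter.

```python
import re

BANNER = "RMAN-00569"
# Matches lines such as "RMAN-03002: failure of backup command ..." or "ORA-19511: ...".
MESSAGE = re.compile(r"^(RMAN|ORA)-(\d+): (.*)$")

def error_stack_bottom_up(output):
    """Return (prefix, number, text) tuples found after the banner, last message first."""
    lines = [line.strip() for line in output.splitlines()]
    banner_at = next((i for i, line in enumerate(lines) if line.startswith(BANNER)), None)
    if banner_at is None:
        return []
    stack = []
    for line in lines[banner_at + 1:]:
        match = MESSAGE.match(line)
        # Skip the RMAN-00571 rule lines, which consist only of "=" characters.
        if match and not match.group(3).startswith("="):
            stack.append((match.group(1), int(match.group(2)), match.group(3)))
    stack.reverse()
    return stack

def failed_command(stack):
    """Return the text of the RMAN-03002 or RMAN-03009 message, if present."""
    for prefix, number, text in stack:
        if prefix == "RMAN" and number in (3002, 3009):
            return text
    return None

# Hypothetical error stack used only to exercise the helpers.
sample = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/29/2013 14:00:00
RMAN-20202: tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "USER"
"""

stack = error_stack_bottom_up(sample)
for prefix, number, text in stack:
    print(f"{prefix}-{number:05d}: {text}")
print("Failed command message:", failed_command(stack))
```

Printing the stack in reverse puts the most specific message first, which matches the bottom-up reading order recommended above.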

Interpreting RMAN Errors: Example

You attempt a backup of tablespace users and receive the following message:

The RMAN-03002 error indicates that the BACKUP command failed. You read the last two messages in the stack first and immediately see the problem: no tablespace user appears in the recovery catalog because you mistyped the name.

Interpreting Server Errors: Example

Assume that you attempt to recover a tablespace and receive the following errors:

As suggested, you start reading from the bottom up. The ORA-01110 message explains there was a problem with the recovery of data file users01.dbf. The second error indicates that the database cannot recover the data file because it is in use or already being recovered. The remaining RMAN errors indicate that the recovery session was canceled due to the server errors. Hence, you conclude that because you were not already recovering this data file, the problem must be that the data file is online and you must take it offline and restore a backup.

Interpreting SBT 2.0 Media Management Errors: Example

Assume that you use a tape drive and see the following output during a backup job:

The error text displayed following the ORA-19511 error is generated by the media manager and describes the real source of the failure. See the media manager documentation to interpret this error.

Interpreting SBT 1.1 Media Management Errors: Example

Assume that you use a tape drive and see the following output during a backup job:

The main information of interest returned by SBT 1.1 media managers is the error code in the "Additional information" line:

Referring to Table 23-3, you discover that error 7005 means that the media management device is busy. So, the media management software is not able to write to the device because it is in use or there is a problem with it.
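As a rough illustration of the lookup described above, the following Python sketch extracts the numeric codes from "Additional information:" lines and maps them to message text. The dictionary deliberately contains only the 7005 entry discussed in the text; the function name, the dictionary name, and the sample output are assumptions for illustration.

```python
import re

# Only the code discussed in the text is listed here; other entries would
# have to be filled in from Table 23-3.
SBT11_MESSAGES = {7005: "Device found, but busy; try again later"}

ADDITIONAL = re.compile(r"^\s*Additional information:\s*(\d+)\s*$")

def additional_codes(output):
    """Return the numeric codes from 'Additional information:' lines, in order."""
    codes = []
    for line in output.splitlines():
        match = ADDITIONAL.match(line)
        if match:
            codes.append(int(match.group(1)))
    return codes

# Hypothetical SBT 1.1 style output.
sample = """\
ORA-27007: failed to open file
Additional information: 7005
Additional information: 1
ORA-19511: Error received from media manager layer, error text:
"""

for code in additional_codes(sample):
    print(code, SBT11_MESSAGES.get(code, "(not in the lookup table)"))
```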

The sbtio.log contains information written by the media management software, not Oracle Database. Thus, you must consult your media vendor documentation to interpret the error codes and messages. If no information is written to the sbtio.log, then contact your media manager support to ask whether they are writing error messages in some other location, or whether there are steps you must take to have the media manager errors appear in sbtio.log.

Identifying RMAN Return Codes

One way to determine whether RMAN encountered an error is to examine its return code or exit status. The RMAN client returns 0 to the shell from which it was invoked if no errors occurred, and a nonzero error value otherwise.

How you access this return code depends upon the environment from which you invoked the RMAN client. For example, if you run UNIX with the C shell, then, when RMAN completes, the return code is placed in a shell variable called $status. The method of returning exit status is a detail specific to the host operating system rather than the RMAN client.
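When RMAN is launched from a script rather than an interactive shell, the return code can be read from the child process. The following Python sketch shows one way to do this; the helper name run_and_report is an assumption, and a stand-in command is used so that the sketch runs without RMAN installed.

```python
import subprocess
import sys

def run_and_report(command):
    """Run a command and return (succeeded, return_code); 0 means no errors."""
    completed = subprocess.run(command)
    return completed.returncode == 0, completed.returncode

# Stand-in command that exits with status 3, so the sketch runs without RMAN.
ok, code = run_and_report([sys.executable, "-c", "raise SystemExit(3)"])
print(ok, code)  # prints: False 3

# With RMAN installed, the call might look like this (file names are hypothetical):
# run_and_report(["rman", "TARGET", "/", "@backup.rman", "LOG", "backup.log"])
```

Because only zero versus nonzero is meaningful here, the helper reports success as a boolean and passes the raw code through for logging.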

Using V$ Views for RMAN Troubleshooting

When LIST, REPORT, and SHOW do not provide all the information that you need for RMAN operations, some V$ views can provide more details.

Sometimes it is useful to identify exactly what a server session performing a backup and recovery job is doing. The views described in Table 23-4 are useful for obtaining information about RMAN jobs.

Table 23-4 Useful V$ Views for Troubleshooting

View Description

V$PROCESS: Identifies currently active processes

V$SESSION: Identifies currently active sessions. Use this view to determine which database server sessions correspond to which RMAN allocated channels.

V$SESSION_WAIT: Lists the events or resources for which sessions are waiting

You can use the preceding views to perform the following tasks:

Monitoring RMAN Interaction with the Media Manager

You can use the event names in the dynamic performance event views to monitor RMAN calls to the media management API. The event names have one-to-one correspondence with SBT functions, as shown in the following examples:

To obtain the complete list of SBT events, you can use the following query:

Before making a call to any of the functions in the media management API, the server adds a row in V$SESSION_WAIT, with the STATE column including the string WAITING. The V$SESSION_WAIT.SECONDS_IN_WAIT column shows the number of seconds that the server has been waiting for this call to return. After an SBT function returns from the media manager, this row disappears.

A row in V$SESSION_WAIT corresponding to an SBT event name does not indicate a problem, because the server updates these rows at run time. The rows appear and disappear as calls are made and returned. However, if the SECONDS_IN_WAIT column is high, then the media manager may be suspended.

To monitor the SBT events, you can run the following SQL query:

Examine the SQL output to determine which SBT functions are waiting. For example, the following output indicates that RMAN has been waiting for the sbtbackup function to return for 10 minutes:

The V$SESSION_WAIT view shows only database events, not media manager events.
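The monitoring approach in this section can be sketched in Python as a query string plus a filter over the fetched rows. This is a sketch under stated assumptions: it assumes that SBT wait event names contain the string MML, the 600-second threshold is arbitrary, the WaitRow type and long_waits function are hypothetical names, and the sample rows are invented. Executing the query requires a database client of your choice, which is not shown.

```python
from typing import NamedTuple

# Query sketch joining the three views; the '%MML%' filter assumes that
# SBT wait event names contain the string MML.
SBT_WAIT_QUERY = """
SELECT p.SPID, sw.EVENT, sw.SECONDS_IN_WAIT, sw.STATE, s.CLIENT_INFO
FROM   V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
WHERE  sw.EVENT LIKE '%MML%'
AND    s.SID = sw.SID
AND    s.PADDR = p.ADDR
"""

class WaitRow(NamedTuple):
    spid: str
    event: str
    seconds_in_wait: int
    state: str
    client_info: str

def long_waits(rows, threshold_seconds=600):
    """Return rows still waiting on an SBT call for at least threshold_seconds."""
    return [
        row for row in rows
        if "WAITING" in row.state and row.seconds_in_wait >= threshold_seconds
    ]

# Invented rows standing in for the query result.
rows = [
    WaitRow("8642", "Backup: MML create a backup piece", 600, "WAITING", "rman channel=ORA_SBT_TAPE_1"),
    WaitRow("8650", "Backup: MML write backup piece", 2, "WAITING", "rman channel=ORA_SBT_TAPE_2"),
]

for row in long_waits(rows):
    print(row.spid, row.event, row.seconds_in_wait)
```

A short wait is normal, so the filter reports only rows whose SECONDS_IN_WAIT value has grown large enough to suggest that the media manager is not responding.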

See also: Oracle Database Reference for descriptions of the V$SESSION_WAIT view.

Correlating Server Sessions with RMAN Channels

To identify which server sessions correspond to which RMAN channels, you can query V$SESSION and V$PROCESS . The SPID column of V$PROCESS identifies the operating system ID number for the process or thread. For example, on UNIX the SPID column shows the process ID, whereas on Windows the SPID column shows the thread ID. You have two basic methods for obtaining this information, depending on whether you have multiple RMAN sessions active concurrently.

Matching Server Sessions with Channels When One RMAN Session Is Active

When only one RMAN session is active, the easiest method for determining the server session ID for an RMAN channel is to execute the following query on the target database while the RMAN job is executing:

The following shows sample output:

If you set an ID using the RMAN SET COMMAND ID command instead of using the system-generated default ID, then search for that value in the CLIENT_INFO column instead of 'rman%'.

Matching Server Sessions with Channels in Multiple RMAN Sessions

If more than one RMAN session is active, then it is possible for the V$SESSION.CLIENT_INFO column to yield the same information for a channel in each session. For example:

In this case, you have the following methods for determining which channel corresponds to which SID value.

Obtaining the Channel ID from the RMAN Output

In this method, you must first obtain the sid values from the RMAN output and then use these values in your SQL query.

To correlate a process with a channel during a backup:

In an active session, run the RMAN job as usual and examine the output to get the sid for the channel. For example, the output may show:

Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS views while the RMAN job is executing. For example, enter:

Use the sid value obtained from the first step to determine which channel corresponds to which server session:
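The first step above, reading the sid for each channel from the RMAN output, can be sketched in Python as follows. The sketch assumes channel lines of the form "channel NAME: sid=N devtype=TYPE"; other RMAN releases may print these lines differently, and the function name channel_sessions is hypothetical.

```python
import re

# Assumed line format: "channel ORA_DISK_1: sid=156 devtype=DISK".
CHANNEL = re.compile(r"^channel (\S+): sid=(\d+) devtype=(\S+)$")

def channel_sessions(output):
    """Map each channel name to its (sid, device type) as reported by RMAN."""
    sessions = {}
    for line in output.splitlines():
        match = CHANNEL.match(line.strip())
        if match:
            sessions[match.group(1)] = (int(match.group(2)), match.group(3))
    return sessions

sample = """\
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=155 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: WARNING: Oracle Test Disk API
"""

for name, (sid, devtype) in channel_sessions(sample).items():
    print(name, sid, devtype)
```

The resulting sid values are the ones to match against the SID column when querying V$SESSION and V$PROCESS.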

Correlating Server Sessions with Channels by Using SET COMMAND ID

In this method, you specify a command ID string in the RMAN backup script. You can then query V$SESSION.CLIENT_INFO for this string.

To correlate a process with a channel during a backup:

In each session, set the COMMAND ID to a different value after allocating the channels and then back up the desired object. For example, enter the following in session 1:

Set the command ID to a string such as sess2 in the job running in session 2:

Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS views while the RMAN job is executing. For example, enter:

If you run the SET COMMAND ID command in the RMAN job, then the CLIENT_INFO column displays in the following format:

For example, the following shows sample output:

The rows that contain the string rman channel show the channel performing the backup. The remaining rows are for the connections to the target database.

See also: Oracle Database Backup and Recovery Reference for SET COMMAND ID syntax, and Oracle Database Reference for more information about V$SESSION and V$PROCESS

Testing the Media Management API

On some platforms, Oracle provides a diagnostic tool called sbttest. This utility performs a simple test of the media management software by acting as the Oracle database server and attempting to communicate with the media manager.

Obtaining the sbttest Utility

On UNIX, the sbttest utility is typically located in $ORACLE_HOME/bin. If for some reason the utility is not included with your platform, then contact Oracle Support Services to obtain the C version of the program. You can compile this version of the program on all UNIX platforms.

On platforms such as Solaris, you do not have to relink when using sbttest. On other platforms, relinking may be necessary.

Obtaining Online Documentation for the sbttest Utility

For online documentation of sbttest, issue the following on the command line:

The program displays the list of possible arguments for the program:

The display also indicates the meaning of each argument. For example, following is the description for two optional parameters:

Using the sbttest Utility

Use sbttest to perform a quick test of the media manager.

If sbttest returns 0, then the test ran without error, which means that the media manager is correctly installed and can accept a data stream and return the same data when requested. If sbttest returns a nonzero value, then either the media manager is not installed or it is not configured correctly.

Confirm that the program is installed and included in the system path by typing sbttest at the command line:

If the program is operational, then you should see a display of the online documentation.

Execute the program, specifying any of the arguments described in the online documentation. For example, enter the following to create test file some_file.f and write the output to sbtio.log:

You can also test a backup of an existing data file. For example, this command tests data file tbs_33.f of database prod:

Examine the output. If the program encounters an error, then it provides messages describing the failure. For example, if the database cannot find the library, you see:

In some cases, sbttest can work but an RMAN backup does not. The reasons can be the following:

The user who starts sbttest is not the owner of the Oracle Database processes.

If the database server is not linked with the media management library or cannot load it dynamically when needed, then RMAN backups to the media manager fail, but sbttest may still work.

The sbttest program passes all environment parameters from the shell but RMAN does not.

Terminating an RMAN Command

There are several ways to terminate an RMAN command in the middle of execution:

The preferred method is to press Control+C (or the equivalent "attention" key combination for your system) in the RMAN interface. This also terminates allocated channels, unless they are suspended in the media management code, as happens when, for example, they are waiting for a tape to be mounted.

You can end the server session corresponding to the RMAN channel by running the SQL ALTER SYSTEM KILL SESSION statement.

You can terminate the server session corresponding to the RMAN channel on the operating system.

Terminating the Session with ALTER SYSTEM KILL SESSION

You can identify the Oracle session ID for an RMAN channel by looking in the RMAN log for messages with the format shown in the following example:

The sid and devtype are displayed for each allocated channel. The Oracle Database sid is different from the operating system process ID. You can end the session using a SQL ALTER SYSTEM KILL SESSION statement.

ALTER SYSTEM KILL SESSION takes two arguments, the sid printed in the RMAN message and a serial number, both of which can be obtained by querying V$SESSION. For example, run the following statement, where sid_in_rman_output is the number from the RMAN message:

Then, run the following statement, substituting the sid_in_rman_output and serial number obtained from the query:

This statement has no effect on the session if the session stopped in media manager code.
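The two statements described above can be assembled programmatically once the sid is known. The following Python sketch only builds the SQL text and does not connect to a database; the function names and the example sid and serial number are assumptions for illustration.

```python
def serial_lookup_sql(sid):
    """Build a query that returns the serial number for a given session ID."""
    return f"SELECT SID, SERIAL# FROM V$SESSION WHERE SID = {int(sid)};"

def kill_session_sql(sid, serial):
    """Build the ALTER SYSTEM KILL SESSION statement for a sid and serial number."""
    return f"ALTER SYSTEM KILL SESSION '{int(sid)},{int(serial)}';"

# Hypothetical values: sid 8 from the RMAN log, serial number 21 from V$SESSION.
print(serial_lookup_sql(8))
print(kill_session_sql(8, 21))
```

Converting both arguments with int() rejects anything that is not a plain number before it is placed in the statement text.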

Terminating the Session at the Operating System Level

Finding and terminating the processes that are associated with the server sessions is operating system-specific. On some platforms, the server sessions are not associated with any processes at all. See your operating system-specific documentation for more information.

Terminating an RMAN Session That Is Not Responding in the Media Manager

You may sometimes need to terminate an RMAN job that is not responding in the media manager. The best way to terminate RMAN when the channel connections are not responding in the media manager is to terminate the session in the media manager. If this action does not solve the problem, then on some platforms, such as Linux, you may be able to terminate the Oracle Database processes of the connections. (Terminating the Oracle processes may cause problems with the media manager. See your media manager documentation for details.)

Components of an RMAN Session

The nature of an RMAN session depends on the operating system. In UNIX, an RMAN session has the following processes associated with it:

The RMAN client process itself

The default channel, the initial connection to the target database

One target connection to the target database corresponding to each allocated channel

The catalog connection to the recovery catalog database, if you use a recovery catalog

An auxiliary connection to an auxiliary instance, during DUPLICATE or TSPITR operations

A polling connection to the target database, used for monitoring RMAN command execution on the various allocated channels. By default, RMAN makes one polling connection. RMAN makes additional polling connections if you use different connect strings in the ALLOCATE CHANNEL or CONFIGURE CHANNEL commands. One polling connection exists for each distinct connect string used in the ALLOCATE CHANNEL or CONFIGURE CHANNEL command.

Process Behavior During a Suspended Job

RMAN usually stops responding because a channel connection is waiting in the media manager code for a tape resource. The catalog connection and the default channel appear to suspend, because they are waiting for RMAN to tell them what to do. Polling connections seem to be in an infinite loop while polling the RPC under the control of the RMAN process.

If you terminate the RMAN process itself, then you also terminate the catalog connection, the auxiliary connection, the default channel, and the polling connections. If target and auxiliary connections are suspended but not while executing media manager code, they also terminate. If either the target connection or any of the auxiliary connections are executing in the media management layer, then they do not terminate until the processes are manually terminated at the operating system level.

Not all media managers can detect the termination of the Oracle Database process. Those which cannot may keep resources busy or continue processing. Consult your media manager documentation for details.

Terminating the catalog connection does not cause the RMAN process to terminate because RMAN is not performing catalog operations while the backup or restore is in progress. Removing the default channel and polling connections causes the RMAN process to detect that a channel is no longer present and then exit. In this case, the connections to the unresponsive channels remain active as described previously.

Terminating an RMAN Session: Basic Steps

After the unresponsive channels in the media manager code are terminated, the RMAN process detects this termination and exits, removing all connections except target connections that are still operative in the media management layer. The warning about the media manager resources still applies in this case.

To terminate an Oracle Database process that is not responding in the media manager:

Query V$SESSION and V$SESSION_WAIT as described in "Using V$ Views for RMAN Troubleshooting". For example, execute the following query:

Examine the SQL output to determine which SBT functions are waiting. For example, the output may be as follows:

Using operating system-level tools appropriate to your platform, end the unresponsive sessions. For example, on Linux execute a kill -9 command:

Some platforms include a command-line utility called orakill that enables you to terminate a specific thread. From a command prompt, run the following command, where sid identifies the database instance to target, and the thread_id is the SPID value from the query in Step 1:

Check that the media manager also clears its processes. If any remain, the next backup or restore operation may freeze again, due to the previous problems in the backup or restore operation. In some media managers, the only solution is to shut down and restart the media manager. If the documentation from the media manager does not provide the needed information, contact technical support for the media manager.

See also: Your operating system-specific documentation for the relevant commands


3-11. Starting and Stopping the Database in RMAN

During backup and recovery tasks, you can start up and shut down an Oracle database without leaving the RMAN client. RMAN provides counterparts of the SQL*Plus STARTUP and SHUTDOWN commands for this purpose.

Starting the Database

To start the database, use the RMAN startup command. Like its SQL*Plus counterpart, it accepts various options. If no options are specified, the command automatically mounts and opens the database:

RMAN> startup;
connected to target database (not started)
Oracle instance started
database mounted
database opened
Total System Global Area     285212672 bytes
Fixed Size                     1261372 bytes
Variable Size                268435652 bytes
Database Buffers              12582912 bytes
Redo Buffers                   2932736 bytes

Using options with the STARTUP command in the RMAN client makes database recovery more flexible. The following example shows a step-by-step recovery in RMAN, from starting the database in NOMOUNT mode to opening it with the RESETLOGS option:

[oracle@alfa ~]$ rman target /
Recovery Manager: Release 10.2.0.3.0 - Production on Tue Nov 8 14:59:16 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database (not started) 
RMAN> startup nomount;
Oracle instance started
Total System Global Area     285212672 bytes
Fixed Size                     1261372 bytes
Variable Size                268435652 bytes
Database Buffers              12582912 bytes
Redo Buffers                   2932736 bytes
RMAN> restore controlfile from autobackup;
Starting restore at 08-NOV-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=155 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: WARNING: Oracle Test Disk API
channel ORA_SBT_TAPE_1: looking for autobackup on day: 20111108
channel ORA_SBT_TAPE_1: looking for autobackup on day: 20111107
channel ORA_SBT_TAPE_1: looking for autobackup on day: 20111106
channel ORA_SBT_TAPE_1: looking for autobackup on day: 20111105
channel ORA_SBT_TAPE_1: looking for autobackup on day: 20111104
channel ORA_SBT_TAPE_1: looking for autobackup on day: 20111103
channel ORA_SBT_TAPE_1: looking for autobackup on day: 20111102
channel ORA_SBT_TAPE_1: no autobackup in 7 days found
recovery area destination: /u01/app/oracle/flash_recovery_area/
database name (or database unique name) used for search: ORCL
channel ORA_DISK_1: autobackup found in the recovery area
channel ORA_DISK_1: autobackup found: /u01/app/oracle/flash_recovery_area/ORCL/autobackup/2011_11_08/o1_mf_s_766683874_7cl91lo5_.bkp
channel ORA_DISK_1: control file restore from autobackup complete
output filename=/u02/oradata/orcl/control01.ctl
output filename=/u02/oradata/orcl/control02.ctl
output filename=/u02/oradata/orcl/control03.ctl
Finished restore at 08-NOV-11
RMAN> alter database mount;
database mounted
released channel: ORA_DISK_1
released channel: ORA_SBT_TAPE_1
RMAN> recover database;
Starting recover at 08-NOV-11
Starting implicit crosscheck backup at 08-NOV-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=155 devtype=DISK
Crosschecked 3 objects
Finished implicit crosscheck backup at 08-NOV-11
Starting implicit crosscheck copy at 08-NOV-11
using channel ORA_DISK_1
Finished implicit crosscheck copy at 08-NOV-11
searching for all files in the recovery area
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /u01/app/oracle/flash_recovery_area/ORCL/autobackup/2011_11_08/o1_mf_s_766683874_7cl91lo5_.bkp
using channel ORA_DISK_1
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=156 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: WARNING: Oracle Test Disk API
starting media recovery
archive log thread 1 sequence 45 is already on disk as file /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_45_759tw74z_.arc
archive log thread 1 sequence 46 is already on disk as file /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_46_759tw8sj_.arc
archive log thread 1 sequence 47 is already on disk as file /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_47_759twjfk_.arc
archive log thread 1 sequence 48 is already on disk as file /u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_48_759twkrz_.arc
archive log thread 1 sequence 49 is already on disk as file /u02/oradata/orcl/redo03.log
archive log filename=/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_45_759tw74z_.arc thread=1 sequence=45
archive log filename=/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_46_759tw8sj_.arc thread=1 sequence=46
archive log filename=/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_47_759twjfk_.arc thread=1 sequence=47
archive log filename=/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2011_08_24/o1_mf_1_48_759twkrz_.arc thread=1 sequence=48
archive log filename=/u02/oradata/orcl/redo03.log thread=1 sequence=49
media recovery complete, elapsed time: 00:00:02
Finished recover at 08-NOV-11
RMAN> alter database open resetlogs;
database opened

For the preceding example to work, the backup from which the control file is restored must have been created with the CONTROLFILE AUTOBACKUP setting already enabled:

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
old RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP OFF;
new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters are successfully stored

The NOMOUNT option of the STARTUP command is also useful when the server parameter file (or initialization parameter file) has been lost and must be restored from a backup. The following example walks through the process step by step.

First, rename the parameter file to simulate its loss:

[oracle@alfa /]$ cd $ORACLE_HOME/dbs
[oracle@alfa dbs]$ ls spfileorcl.ora
spfileorcl.ora
[oracle@alfa dbs]$ mv spfileorcl.ora spfileorcl.old

If you now try to start the instance, it fails because it cannot open a parameter file:

SQL> startup;
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file 
'/u01/app/oracle/product/10.2.0/db_1/dbs/initorcl.ora'

The lost parameter file exists in an RMAN backup, but restoring it requires an instance started in NOMOUNT mode. The instance is therefore started with a dummy parameter file.

Before starting the instance, be sure to set the database identifier (DBID), because there is no connection to a recovery catalog:

RMAN> set DBID 1265664822;
executing command: SET DBID

Start the instance with the dummy parameter file:

RMAN> startup force nomount;
startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file 
'/u01/app/oracle/product/10.2.0/db_1/dbs/initorcl.ora'
starting Oracle instance without parameter file for retrival of spfile
Oracle instance started
Total System Global Area     159383552 bytes
Fixed Size                     1260648 bytes
Variable Size                 58721176 bytes
Database Buffers              96468992 bytes
Redo Buffers                   2932736 bytes

Now try to restore the parameter file from the backup:

RMAN> restore spfile from autobackup;
Starting restore at 11-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=36 devtype=DISK
channel ORA_DISK_1: looking for autobackup on day: 20120111
channel ORA_DISK_1: looking for autobackup on day: 20120110
channel ORA_DISK_1: looking for autobackup on day: 20120109
channel ORA_DISK_1: looking for autobackup on day: 20120108
channel ORA_DISK_1: looking for autobackup on day: 20120107
channel ORA_DISK_1: looking for autobackup on day: 20120106
channel ORA_DISK_1: looking for autobackup on day: 20120105
channel ORA_DISK_1: no autobackup in 7 days found
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/11/2012 13:39:14
RMAN-06172: no autobackup found or specified handle is not a valid copy or piece

The restore failed. Why? Recall that the instance was started with a dummy parameter file. That file does not specify the path to the flash recovery area that holds the backups, so RMAN cannot find them. The location of the flash recovery area therefore has to be supplied manually. In this case the path is built from the values of two parameters, db_recovery_file_dest and db_name, both of which can be given directly in the command that restores the parameter file:

RMAN> restore spfile from autobackup 
db_recovery_file_dest='/u01/app/oracle/flash_recovery_area' db_name='orcl';
Starting restore at 11-JAN-12
using channel ORA_DISK_1
recovery area destination: /u01/app/oracle/flash_recovery_area
database name (or database unique name) used for search: ORCL
channel ORA_DISK_1: autobackup found in the recovery area
channel ORA_DISK_1: autobackup found: 
/u01/app/oracle/flash_recovery_area/ORCL/autobackup/2012_01_10/o1_mf_s_772211704_7jrby9p9
_.bkp
channel ORA_DISK_1: SPFILE restore from autobackup complete
Finished restore at 11-JAN-12

RMAN found the backup and restored the parameter file from it. Verify that the file exists in the operating system:

[oracle@alfa dbs]$ ls spfileorcl.ora
spfileorcl.ora

The parameter file is indeed present. The database can now be started:

RMAN> startup force;
Oracle instance started
database mounted
database opened
Total System Global Area     285212672 bytes
Fixed Size                     1261372 bytes
Variable Size                268435652 bytes
Database Buffers              12582912 bytes
Redo Buffers                   2932736 bytes

The parameter file has been restored and the database is open, and all of this was done from the RMAN client alone.

Shutting Down the Database

To close the database and stop the instance, you can use the SHUTDOWN command in the RMAN client. The command accepts the same options as its SQL*Plus counterpart. The following example shows how, using only the RMAN client, you can shut down the database, start it in mounted mode, take a backup, and open the database for use:

RMAN> shutdown immediate;
database closed
database dismounted
Oracle instance shut down
RMAN> startup mount;
connected to target database (not started)
Oracle instance started
database mounted
Total System Global Area     285212672 bytes
Fixed Size                     1261372 bytes
Variable Size                264241348 bytes
Database Buffers              16777216 bytes
Redo Buffers                   2932736 bytes
RMAN> backup database;
Starting backup at 11-JAN-12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u02/oradata/orcl/system01.dbf
input datafile fno=00003 name=/u02/oradata/orcl/sysaux01.dbf
input datafile fno=00002 name=/u02/oradata/orcl/undotbs01.dbf
input datafile fno=00005 name=/u02/oradata/orcl/example01.dbf
input datafile fno=00004 name=/u02/oradata/orcl/users01.dbf
channel ORA_DISK_1: starting piece 1 at 11-JAN-12
channel ORA_DISK_1: finished piece 1 at 11-JAN-12
piece 
handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_01_11/o1_mf_nnndf_TAG20120
111T142105_7jtw51on_.bkp tag=TAG20120111T142105 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:02:14
Finished backup at 11-JAN-12
Starting Control File and SPFILE Autobackup at 11-JAN-12
piece 
handle=/u01/app/oracle/flash_recovery_area/ORCL/autobackup/2012_01_11/o1_mf_s_772294711_7
jtw98ss_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 11-JAN-12 
RMAN> alter database open;
database opened

Note that the STARTUP and SHUTDOWN commands in the RMAN client have limitations. They cannot be used to shut down or start the recovery catalog instance, because they apply only to the target database. If you do need to stop or start the catalog instance from the RMAN client, connect to it as the target database first, and only then issue STARTUP or SHUTDOWN.

3-12. Checking the Syntax of RMAN Commands

RMAN can check the syntax of the commands you enter. To use this mode, start the RMAN client with the checksyntax command-line argument. You can then check commands either by typing and running them at the client prompt or by loading a script file that contains them. The commands themselves are not executed.

The following example checks the syntax of a single command:

[oracle@alfa ~]$ rman checksyntax
Recovery Manager: Release 10.2.0.3.0 - Production on Tue Jan 24 18:33:39 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
RMAN> run {backup database;}
The command has no syntax errors

As the example shows, RMAN found no syntax errors in this command.

Besides checking individual commands, RMAN can check the syntax of an entire script. To do so, pass the script file as a command-line argument when starting the client:

[oracle@alfa ~]$ rman checksyntax @backup1.rman
Recovery Manager: Release 10.2.0.3.0 - Production on Tue Jan 24 18:43:42 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
RMAN> run
2> {
3> allocate channel dev1 device type sbt;
4> set backup copies = 2;
5> backup datafiles 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "identifier": expecting one of: "archivelog, as, backup, 
backupset, blocks, channel, check, copy, copies, controlfilecopy, cumulative, current, 
database, datafile, datafilecopy, device, diskratio, db_recovery_file_dest, 
db_file_name_convert, duration, filesperset, for, format, full, force, incremental, keep, 
(, maxsetsize, nochecksum, noexclude, nokeep, not, proxy, pool, reuse, recovery, skip, 
spfile, setsize, tablespace, tag, to, validate"
RMAN-01008: the bad identifier was: datafiles
RMAN-01007: at line 5 column 8 file: backup1.rman

The example shows a syntax error in the third command of the script: RMAN-01008 names the invalid identifier datafiles, and RMAN-01007 places it at line 5, column 8 of backup1.rman.

Using the checksyntax argument helps prevent gross errors from surfacing while an RMAN command script is running. It is advisable to run this check every time a script is modified.
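When the check runs as part of an automated job, the location of the error can be pulled out of the captured output. The following Python sketch assumes the output has already been captured as text and that the messages look like the ones shown above; the function name find_syntax_error is hypothetical and is not part of RMAN.

```python
import re

# Matches the location line RMAN prints for a parse error, for example
# "RMAN-01007: at line 5 column 8 file: backup1.rman".
LOCATION_RE = re.compile(r"RMAN-01007: at line (\d+) column (\d+) file: (.+)")


def find_syntax_error(output):
    """Return (line, column, file) for the first parse error, or None."""
    if "RMAN-00558" not in output:
        return None  # no "error encountered while parsing" message
    match = LOCATION_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3).strip()


# Abbreviated copy of the checksyntax output shown above.
SAMPLE = """\
RMAN-00558: error encountered while parsing input commands
RMAN-01008: the bad identifier was: datafiles
RMAN-01007: at line 5 column 8 file: backup1.rman
"""

location = find_syntax_error(SAMPLE)
```

A job could refuse to schedule a script for which this function returns a location.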

3-13. Hiding Passwords When Connecting to RMAN

The RMAN client is often started with connection details on the command line, such as the database, the user name, and the password. As a result, various operating system logs may capture the RMAN command line with the password in clear text. There are two ways to avoid this. The first is to give only the user name on the command line. The RMAN client then prompts for the password, and the characters you type are not displayed:

[oracle@alfa ~]$ rman target <user>@<service>
Recovery Manager: Release 10.2.0.3.0 - Production on Tue Jan 24 18:50:07 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
target database Password: 
connected to target database: ORCL (DBID=1265664822)

If you need to run an RMAN command script without putting the password on the command line, place a CONNECT command with the user name and password at the very beginning of the script file. When such a script runs, the RMAN client connects and replaces the connect string in its output with an asterisk:

[oracle@alfa ~]$ rman @backup.rman;
Recovery Manager: Release 10.2.0.3.0 - Production on Tue Jan 24 14:04:56 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
RMAN> connect target *
2> backup database;
connected to target database: ORCL (DBID=1265664822)
Starting backup at 24-JAN-12
…

In this second case, restrict other users' access to the script file at the operating system level, because the file contains the connection details in clear text.
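One way to apply that restriction from a deployment script is sketched below in Python. It assumes a POSIX file system; the helper name restrict_to_owner is hypothetical, and a temporary file stands in for the real script.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Allow only the file owner to read and write the file (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstration on a throwaway file standing in for the RMAN script.
with tempfile.NamedTemporaryFile(suffix=".rman", delete=False) as handle:
    demo_path = handle.name
demo_mode = restrict_to_owner(demo_path)
os.remove(demo_path)
```

On a shell prompt the same effect is usually achieved with chmod 600 on the script file.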

3-14. Identifying RMAN Server Sessions

To perform backup and recovery tasks, RMAN establishes sessions with the target database or the recovery catalog. Occasionally one of these sessions has to be interrupted. The examples below show how to identify such sessions in the system so that they can later be terminated.

Determining the Number of Sessions

The number of RMAN sessions is given by the following formula:

Number of sessions = C + N + 2

where:

  • C is the number of allocated channels;
  • N is the number of CONNECT options used in the channel allocation commands (1 if none is specified).

If a recovery catalog is used, there are always at least two sessions: one to the recovery catalog and a default session to the target database. The default session is needed for tasks such as applying archived logs during recovery.
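The formula can be written as a small Python helper for planning purposes. This is only a sketch of the arithmetic stated above; the function name and its parameters are hypothetical.

```python
def rman_session_count(channels, connect_options=0):
    """Apply the formula: number of sessions = C + N + 2.

    channels        -- C, the number of allocated channels
    connect_options -- number of CONNECT options in the channel allocation
                       commands; when none is given, N counts as 1
    """
    if channels < 0 or connect_options < 0:
        raise ValueError("counts must not be negative")
    n = connect_options if connect_options > 0 else 1
    return channels + n + 2


# One channel and no CONNECT option: 1 + 1 + 2
single_channel = rman_session_count(1)
# Three channels allocated with two CONNECT options: 3 + 2 + 2
three_channels = rman_session_count(3, connect_options=2)
```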

Identifying the Sessions

If the RMAN client runs on the database server (UNIX), the server processes that belong to RMAN sessions can be found with operating system commands. First locate the RMAN client process:

[oracle@alfa ~]$ ps -ef | grep rman
oracle    3743  2887  4 10:20 pts/1    00:00:00 rman target /

The output shows an RMAN client running with process ID (PID) 3743. Now look for processes whose parent process ID (PPID) equals that value:

[oracle@alfa ~]$ ps -ef | grep 3743
oracle    3748  3743  0 10:20 ?        00:00:00 oracleorcl 
(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3749  3743  0 10:20 ?        00:00:00 oracleorcl 
(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

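These processes belong to the sessions of the running RMAN client. Their PIDs make it easy to look up the session identifiers (SID, SERIAL#), for example by matching the PID against V$PROCESS.SPID and joining V$PROCESS.ADDR to V$SESSION.PADDR. The sessions can then be terminated with ALTER SYSTEM … KILL SESSION.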
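Filtering by parent PID is more precise than grepping for the number, which can also match unrelated columns. The Python sketch below does this on captured ps -ef output. It assumes the usual column order (UID, PID, PPID, ...); the function name child_pids is hypothetical.

```python
def child_pids(ps_output, parent_pid):
    """Return the PIDs of all processes whose PPID equals parent_pid."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: UID PID PPID C STIME TTY TIME CMD.
        # Wrapped command lines and headers do not have numeric PID/PPID.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        if int(fields[2]) == parent_pid:
            pids.append(int(fields[1]))
    return pids


# The ps listing from the example above, including wrapped lines.
SAMPLE = """\
oracle    3743  2887  4 10:20 pts/1    00:00:00 rman target /
oracle    3748  3743  0 10:20 ?        00:00:00 oracleorcl
(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3749  3743  0 10:20 ?        00:00:00 oracleorcl
(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
"""

server_pids = child_pids(SAMPLE, 3743)
```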

The session identifier (SID) also appears in the RMAN output:

RMAN> backup database;
Starting backup at 25-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK

In this example the database backup uses channel ORA_DISK_1, which runs in the session with SID = 145.

If the RMAN client connects to the database remotely, identifying its sessions with operating system commands is difficult. In that case, query the V$SESSION view and identify the sessions either by the client program name or by the terminal. Apart from these attributes, RMAN sessions carry nothing that distinguishes them from other sessions:

SQL> SELECT sid, serial#, username, terminal FROM v$session WHERE program = 'rman.exe';
 
SID SERIAL# USERNAME TERMINAL
--- ------- -------- --------
146 31      SYS      0500-ASU
158 57      SYS      0500-ASU
 
2 rows selected.

3-15. Dropping a Database from the RMAN Client

A database can be dropped with the SQL*Plus utility. If SQL*Plus cannot be used, the database can be dropped from the RMAN client instead.

To drop the database:

  1. Start the database mounted in exclusive, restricted mode:

    SQL> startup restrict mount exclusive;
    ORACLE instance started.
    Total System Global Area  285212672 bytes
    Fixed Size                  1261372 bytes
    Variable Size             268435652 bytes
    Database Buffers           12582912 bytes
    Redo Buffers                2932736 bytes
    Database mounted.
    SQL> exit 
    
  2. Use the DROP DATABASE command to drop the database:

    RMAN> drop database;
    database name is "ORCL" and DBID is 1290912723
    
  3. RMAN asks you to confirm the operation. Enter YES:

    Do you really want to drop the database (enter YES or NO)? YES
    database dropped
    

You can add the noprompt keyword to the DROP DATABASE command to suppress the confirmation prompt. This is useful when writing an RMAN command script.

The standard DROP DATABASE command removes only the datafiles, the online redo log files, and the control files. In RMAN, however, the command has an extended syntax. With additional keywords it also removes all backups previously made by RMAN:

RMAN> drop database including backups;
database name is "ORCL" and DBID is 1301354837
Do you really want to drop all backups and the database (enter YES or NO)? YES
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK
List of Backup Pieces
BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
1       1       1   1   AVAILABLE   DISK        
/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_01_26/o1_mf_nnndf_TAG20120126T094
438_7l1xlq7k_.bkp
2       2       1   1   AVAILABLE   DISK        
/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_01_26/o1_mf_ncnnf_TAG20120126T094
438_7l1xnkg7_.bkp
deleted backup piece
backup piece 
handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_01_26/o1_mf_nnndf_TAG20120
126T094438_7l1xlq7k_.bkp recid=1 stamp=773574279
deleted backup piece
backup piece 
handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_01_26/o1_mf_ncnnf_TAG20120
126T094438_7l1xnkg7_.bkp recid=2 stamp=773574337
Deleted 2 objects
released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK
specification does not match any archive log in the recovery catalog
database name is "ORCL" and DBID is 1301354837
database dropped

This chapter describes how to troubleshoot Recovery Manager.

Interpreting RMAN Message Output

Recovery Manager provides detailed error messages that can aid in troubleshooting problems. Also, the Oracle database server and third-party media vendors generate useful debugging output of their own. The discussion that follows explains how to identify and interpret the different errors you may encounter.

Identifying Types of Message Output

Output that is useful for troubleshooting failed or hung RMAN jobs is located in several different places, as explained in the following table.

Type of Output: RMAN messages
Produced By: RMAN
Location: Completed job information is in V$RMAN_STATUS and RC_RMAN_STATUS. Current job information is in V$RMAN_OUTPUT. When running RMAN from the command line, you can direct output to the following places:

  • Standard output

  • A log file specified by LOG on the command line or the SPOOL LOG command

  • A file created by redirecting RMAN output (for example, UNIX > operator)

Description: Contains actions relevant to the RMAN job as well as error messages generated by RMAN, the database server, and the media vendor. RMAN error messages have an RMAN-xxxxx prefix. Normal action descriptions do not have a prefix.

Type of Output: alert_SID.log
Produced By: Oracle database server
Location: The directory named in the BACKGROUND_DUMP_DEST initialization parameter.
Description: Contains a chronological log of errors, initialization parameter settings, and administration operations. Records values for overwritten control file records (refer to Oracle Data Guard Concepts and Administration).

Type of Output: Oracle trace file
Produced By: Oracle database server
Location: The directory specified in the USER_DUMP_DEST initialization parameter.
Description: Contains detailed output generated by Oracle server processes. This file is created when an ORA-600 or ORA-3113 error message occurs, whenever RMAN cannot allocate a channel, and when the database fails to load the media management library.

Type of Output: sbtio.log
Produced By: Third-party media management software
Location: The directory specified in the USER_DUMP_DEST initialization parameter.
Description: Contains vendor-specific information written by the media management software. This log does not contain Oracle server or RMAN errors.

Type of Output: Media manager log file
Produced By: Third-party media management software
Location: The filenames for any media manager logs other than sbtio.log are determined by the media management software.
Description: Contains information on the functioning of the media management device.

Recognizing RMAN Error Message Stacks

RMAN reports errors as they occur. If an error is not retrievable, that is, RMAN cannot perform failover to another channel to complete a particular job step, then RMAN also reports a summary of the errors after all job sets complete. This feature is known as deferred error reporting.

One way to determine whether RMAN encountered an error is to examine its return code, as described in "Identifying RMAN Return Codes". A second way is to search the RMAN output for the string RMAN-00569, which is the message number for the error stack banner. All RMAN errors are preceded by this error message. If you do not see an RMAN-00569 message in the output, then there are no errors. Following is sample output for a syntax error:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01005: syntax error: found ")": expecting one of: "archivelog, backup, backupset, controlfilecopy, current, database, datafile, datafilecopy, (, plus, ;, tablespace"
RMAN-01007: at line 1 column 18 file: standard input

Identifying Error Codes

Typically, you find the following types of error codes in RMAN message stacks:

  • Errors prefixed with RMAN-

  • Errors prefixed with ORA-

  • Errors preceded by the line Additional information:

RMAN Error Message Numbers

Table 12-1 indicates the error ranges for common RMAN error messages, all of which are described in Oracle Database Error Messages.

Table 12-1 RMAN Error Message Ranges

Error Range    Cause
0550-0999      Command-line interpreter
1000-1999      Keyword analyzer
2000-2999      Syntax analyzer
3000-3999      Main layer
4000-4999      Services layer
5000-5499      Compilation of RESTORE or RECOVER command
5500-5999      Compilation of DUPLICATE command
6000-6999      General compilation
7000-7999      General execution
8000-8999      PL/SQL programs
9000-9999      Low-level keyword analyzer
10000-10999    Server-side execution
11000-11999    Interphase errors between PL/SQL and RMAN
12000-12999    Recovery catalog packages
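The ranges in Table 12-1 lend themselves to a simple lookup when triaging log files. The following Python sketch encodes only the ranges listed in the table; the function name classify_rman_error is hypothetical, and message numbers outside the table (such as RMAN-20202) yield None.

```python
import re

# (low, high, cause) triples copied from Table 12-1.
RMAN_ERROR_RANGES = [
    (550, 999, "Command-line interpreter"),
    (1000, 1999, "Keyword analyzer"),
    (2000, 2999, "Syntax analyzer"),
    (3000, 3999, "Main layer"),
    (4000, 4999, "Services layer"),
    (5000, 5499, "Compilation of RESTORE or RECOVER command"),
    (5500, 5999, "Compilation of DUPLICATE command"),
    (6000, 6999, "General compilation"),
    (7000, 7999, "General execution"),
    (8000, 8999, "PL/SQL programs"),
    (9000, 9999, "Low-level keyword analyzer"),
    (10000, 10999, "Server-side execution"),
    (11000, 11999, "Interphase errors between PL/SQL and RMAN"),
    (12000, 12999, "Recovery catalog packages"),
]


def classify_rman_error(code):
    """Map a message number such as 'RMAN-03002' to its cause, or None."""
    match = re.fullmatch(r"RMAN-(\d+)", code.strip())
    if match is None:
        raise ValueError(f"not an RMAN message number: {code!r}")
    number = int(match.group(1))
    for low, high, cause in RMAN_ERROR_RANGES:
        if low <= number <= high:
            return cause
    return None  # outside the ranges listed in the table
```

For example, classify_rman_error("RMAN-03002") returns "Main layer".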

ORA-19511: Media Manager Errors

In the event of a media manager error, ORA-19511 is signalled, and the media manager is expected to provide RMAN a descriptive error. RMAN will display the error passed back to it by the media manager. For example, you might see this:

ORA-19511: Error received from media manager layer, error text:
   sbtpvt_open_input: file .* does not exist or cannot be accessed, errno = 2

The message from the media manager should provide you with enough information to let you fix the root problem. If it does not, you should refer to the documentation for your media manager or contact your media management vendor support representative for further information. ORA-19511 errors originate with the media manager, not the Oracle database. The database merely passes the message on from the media manager. The cause can only be addressed by the media management vendor.

Note that if you are still using an SBT 1.1-compliant media management layer, you may see some additional error message text. Output from an SBT 1.1-compliant media management layer is similar to the following:

ORA-19507: failed to retrieve sequential file, handle="c-140148591-20031014-06", parms=""
ORA-27007: failed to open file
Additional information: 7000
Additional information: 2
ORA-19511: Error received from media manager layer, error text:
   SBT error = 7000, errno = 0, sbtopen: backup file not found

The "Additional information" lines use error codes specific to SBT 1.1. The values displayed correspond to the media manager message numbers and error text listed in Table 12-2. RMAN then re-signals the error as ORA-19511 ("Error received from media manager layer"), followed by a general message for the error code returned by the media manager, including the SBT 1.1 error number.

The SBT 1.1 error messages are listed here for your reference. Table 12-2 lists media manager message numbers and their corresponding error text. In the error codes, O/S stands for operating system. The errors marked with an asterisk are internal and should not typically be seen during normal operation.

Table 12-2 Media Manager Error Message Ranges

Cause        No.      Message
sbtopen      7000     Backup file not found (only returned for read)
             7001     File exists (only returned for write)
             7002*    Bad mode specified
             7003     Invalid block size specified
             7004     No tape device found
             7005     Device found, but busy; try again later
             7006     Tape volume not found
             7007     Tape volume is in-use
             7008     I/O Error
             7009     Can’t connect with Media Manager
             7010     Permission denied
             7011     O/S error for example malloc, fork error
             7012*    Invalid argument(s) to sbtopen
sbtclose     7020*    Invalid file handle or file not open
             7021*    Invalid flags to sbtclose
             7022     I/O error
             7023     O/S error
             7024*    Invalid argument(s) to sbtclose
             7025     Can’t connect with Media Manager
sbtwrite     7040*    Invalid file handle or file not open
             7041     End of volume reached
             7042     I/O error
             7043     O/S error
             7044*    Invalid argument(s) to sbtwrite
sbtread      7060*    Invalid file handle or file not open
             7061     EOF encountered
             7062     End of volume reached
             7063     I/O error
             7064     O/S error
             7065*    Invalid argument(s) to sbtread
sbtremove    7080     Backup file not found
             7081     Backup file in use
             7082     I/O Error
             7083     Can’t connect with Media Manager
             7084     Permission denied
             7085     O/S error
             7086*    Invalid argument(s) to sbtremove
sbtinfo      7090     Backup file not found
             7091     I/O Error
             7092     Can’t connect with Media Manager
             7093     Permission denied
             7094     O/S error
             7095*    Invalid argument(s) to sbtinfo
sbtinit      7110*    Invalid argument(s) to sbtinit
             7111     O/S error

Interpreting RMAN Error Stacks

Sometimes you may find it difficult to identify the useful messages in the RMAN error stack. Note the following tips and suggestions:

  • Read the messages from the bottom up, because this is the order in which RMAN issues the messages. The last one or two errors displayed in the stack are often the most informative.

  • When using an SBT 1.1 media management layer and presented with SBT 1.1 style error messages containing the "Additional information:" numeric error codes, look for the ORA-19511 message that follows for the text of error messages passed back to RMAN by the media manager. These should identify the real failure in the media management layer.

  • Look for the RMAN-03002 or RMAN-03009 message (RMAN-03009 is the same as RMAN-03002 but includes the channel ID), immediately following the error banner. These messages indicate which command failed. Syntax errors generate RMAN-00558.

  • Identify the basic type of error according to the error range chart in Table 12-1 and then refer to Oracle Database Error Messages for information on the most important messages.
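These tips can be applied mechanically to captured output. The Python sketch below locates the RMAN-00569 banner, collects the RMAN- and ORA- messages that follow, and returns them bottom-up together with the RMAN-03002 or RMAN-03009 line. It assumes each message sits on a single line; the function name parse_error_stack and the shape of its result are illustrative choices, not an Oracle interface.

```python
import re

# Matches one message line of a stack, e.g. "ORA-01110: data file 8: ...".
MESSAGE_RE = re.compile(r"^(RMAN|ORA)-(\d+): (.*)$")


def parse_error_stack(output):
    """Return the failed command and the stack messages in bottom-up order.

    Returns None when the output contains no RMAN-00569 banner.
    """
    lines = [line.strip() for line in output.splitlines()]
    banner = next((i for i, line in enumerate(lines)
                   if line.startswith("RMAN-00569")), None)
    if banner is None:
        return None
    messages = []
    for line in lines[banner + 1:]:
        match = MESSAGE_RE.match(line)
        if match is None:
            continue  # continuation text or other non-message output
        code = match.group(1) + "-" + match.group(2)
        if code == "RMAN-00571":
            continue  # closing line of the banner
        messages.append((code, match.group(3)))
    failed = next((text for code, text in messages
                   if code in ("RMAN-03002", "RMAN-03009")), None)
    return {"failed_command": failed, "bottom_up": messages[::-1]}


# Error stack from a failed RECOVER TABLESPACE command.
SAMPLE = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/29/2001 15:18:43
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed tablespace USERS
ORA-00283: recovery session canceled due to errors
ORA-01124: cannot recover data file 8 - file is in use or recovery
ORA-01110: data file 8: '/oracle/oradata/trgt/users01.dbf'
"""

result = parse_error_stack(SAMPLE)
```

The first entries of result["bottom_up"] are the messages to read first.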

Interpreting RMAN Errors: Example

You attempt a backup of tablespace users and receive the following message:

Starting backup at 29-AUG-02
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/29/2002 15:14:03
RMAN-20202: tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "USESR"

The RMAN-03002 error indicates that the BACKUP command failed. You read the last two messages in the stack first and immediately see the problem: no tablespace usesr appears in the recovery catalog because you mistyped the name.

Interpreting Server Errors: Example

Assume that you attempt to recover a tablespace and receive the following errors:

RMAN> RECOVER TABLESPACE users;

Starting recover at 29-AUG-01
using channel ORA_DISK_1

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/29/2001 15:18:43
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed tablespace USERS
ORA-00283: recovery session canceled due to errors
ORA-01124: cannot recover data file 8 - file is in use or recovery
ORA-01110: data file 8: '/oracle/oradata/trgt/users01.dbf'

As suggested, you start reading from the bottom up. The ORA-01110 message explains there was a problem with the recovery of datafile users01.dbf. The second error indicates that the database cannot recover the datafile because it is in use or already being recovered. The remaining RMAN errors indicate that the recovery session was cancelled due to the server errors. Hence, you conclude that because you were not already recovering this datafile, the problem must be that the datafile is online and you need to take it offline and restore a backup.

Interpreting SBT 2.0 Media Management Errors: Example

Assume that you use a tape drive and receive the following output during a backup job:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-19624: operation failed, retry possible
ORA-19507: failed to retrieve sequential file, handle="/tmp/foo", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text:
  sbtpvt_open_input:file /tmp/foo does not exist or cannot be accessed, errno=2

The error text displayed following the ORA-19511 error is generated by the media manager and describes the real source of the failure. Refer to the media manager documentation to interpret this error.

Interpreting SBT 1.1 Media Management Errors: Example

Assume that you use a tape drive and receive the following output during a backup job:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c1 channel at 09/04/2001 13:18:19
ORA-19506: failed to create sequential file, name="07d36ecp_1_1", parms=""
ORA-27007: failed to open file
SVR4 Error: 2: No such file or directory
Additional information: 7005
Additional information: 1
ORA-19511: Error received from media manager layer, error text:
   SBT error = 7005, errno = 2, sbtopen: system error

The main information of interest returned by SBT 1.1 media managers is the error code in the "Additional information" line:

Additional information: 7005

Referring to Table 12-2, "Media Manager Error Message Ranges", you discover that error 7005 means that the media management device is busy. So, the media management software is not able to write to the device because it is in use or there is a problem with it.
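The lookup just described can be sketched in Python as follows. The dictionary holds only the sbtopen rows of Table 12-2, so codes from other functions are not recognized; the function name explain_sbt_error is hypothetical.

```python
import re

# sbtopen rows of Table 12-2 only; extend with the other rows as needed.
SBTOPEN_ERRORS = {
    7000: "Backup file not found (only returned for read)",
    7001: "File exists (only returned for write)",
    7002: "Bad mode specified",
    7003: "Invalid block size specified",
    7004: "No tape device found",
    7005: "Device found, but busy; try again later",
    7006: "Tape volume not found",
    7007: "Tape volume is in-use",
    7008: "I/O Error",
    7009: "Can't connect with Media Manager",
    7010: "Permission denied",
    7011: "O/S error for example malloc, fork error",
    7012: "Invalid argument(s) to sbtopen",
}


def explain_sbt_error(stack):
    """Return (code, text) for the first known 'Additional information' code."""
    for match in re.finditer(r"Additional information: (\d+)", stack):
        code = int(match.group(1))
        if code in SBTOPEN_ERRORS:
            return code, SBTOPEN_ERRORS[code]
    return None


# Tail of the SBT 1.1 error stack from this example.
SAMPLE = """\
ORA-27007: failed to open file
SVR4 Error: 2: No such file or directory
Additional information: 7005
Additional information: 1
"""

explanation = explain_sbt_error(SAMPLE)
```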

Note:

The sbtio.log contains information written by the media management software, not the Oracle database server. Hence, you must consult your media vendor documentation to interpret the error codes and messages. If no information is written to the sbtio.log, contact your media manager support to ask whether they are writing error messages in some other location, or whether there are steps you need to take to have the media manager errors appear in sbtio.log.

Identifying RMAN Return Codes

One way to determine whether RMAN encountered an error is to examine its return code or exit status. The RMAN client returns 0 to the shell from which it was invoked if no errors occurred, and a nonzero error value otherwise.

How you access this return code depends upon the environment from which you invoked the RMAN client. For example, if you are running UNIX with the C shell, then, when RMAN completes, the return code is placed in a shell variable called $status. The method of returning exit status is a detail specific to the host operating system rather than the RMAN client.
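From a scripting language the exit status is available directly from the process object. The Python sketch below wraps that check. The function name run_and_report is hypothetical, and the demonstration uses the Python interpreter as a stand-in so that it runs without an Oracle installation.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (succeeded, exit_status)."""
    completed = subprocess.run(command)
    # Exit status 0 means no errors; any nonzero value signals a failure.
    return completed.returncode == 0, completed.returncode


# Stand-in commands that exit with a known status. A real call would
# pass the RMAN command line instead, e.g. ["rman", "@backup.rman"].
ok_result = run_and_report([sys.executable, "-c", "raise SystemExit(0)"])
failed_result = run_and_report([sys.executable, "-c", "raise SystemExit(3)"])
```

A scheduler can use the first element of the result to decide whether to raise an alert, and then search the log for RMAN-00569 to find the error stack.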

Testing the Media Management API

On some platforms, Oracle provides a diagnostic tool called sbttest. This utility performs a simple test of the media management software by attempting to communicate with the media manager as the Oracle database server would.

Obtaining the sbttest Utility

On UNIX, the sbttest utility is typically located in $ORACLE_HOME/bin. If for some reason the utility is not included with your platform, then contact Oracle Support to obtain the C version of the program. You can compile this version of the program on all UNIX platforms.

Note that on platforms such as Solaris, you do not have to relink when using sbttest. On other platforms, relinking may be necessary.

Obtaining Online Documentation for the sbttest Utility

For online documentation of sbttest, issue the following on the command line:

% sbttest

The program displays the list of possible arguments for the program:

Error: backup file name must be specified
Usage: sbttest backup_file_name # this is the only required parameter
               <-dbname database_name>
               <-trace trace_file_name>
               <-remove_before>
               <-no_remove_after> 
               <-read_only>
               <-no_regular_backup_restore>
               <-no_proxy_backup>
               <-no_proxy_restore>
               <-file_type n>
               <-copy_number n>
               <-media_pool n>
               <-os_res_size n>
               <-pl_res_size n>
               <-block_size block_size> 
               <-block_count block_count>
               <-proxy_file os_file_name bk_file_name 
                           [os_res_size pl_res_size block_size block_count]>
               <-libname sbt_library_name>

The display also indicates the meaning of each argument. For example, following is the description for two optional parameters:

Optional parameters:
  -dbname  specifies the database name which will be used by SBT
           to identify the backup file. The default is "sbtdb"
  -trace   specifies the name of a file where the Media Management 
           software will write diagnostic messages.

Using the sbttest Utility

Use sbttest to perform a quick test of the media manager.

If sbttest returns 0, then the test ran without error, which means that the media manager is correctly installed and can accept a data stream and return the same data when requested. If sbttest returns a non-zero value, then either the media manager is not installed or it is not configured correctly.

To use sbttest:

Make sure the program is installed and included in the system path by typing sbttest at the command line:

% sbttest

If the program is operational, then you should see a display of the online documentation.

Execute the program, specifying any of the arguments described in the online documentation. For example, enter the following to create test file some_file.f and write the output to sbtio.log:

% sbttest some_file.f -trace sbtio.log

You can also test a backup of an existing datafile. For example, this command tests datafile tbs_33.f of database prod:

% sbttest tbs_33.f -dbname prod

Examine the output. If the program encounters an error, then it provides messages describing the failure. For example, if the database cannot find the library, you see:

libobk.so could not be loaded. Check that it is installed properly, and that LD_LIBRARY_PATH environment variable (or its equivalent on your platform) includes the directory where this file can be found. Here is some additional information on the cause of this error:
ld.so.1: sbttest: fatal: libobk.so: open failed: No such file or directory

Note that in some cases sbttest can work but an RMAN backup does not. The reasons can be the following:

  • The user who starts sbttest is not the owner of the Oracle processes.

  • If the database server is not linked with the media management library or cannot load it dynamically when needed, then RMAN backups to the media manager fail, but sbttest may still work.

  • The sbttest program passes all environment parameters from the shell but RMAN does not.

Terminating an RMAN Command

There are several ways to terminate an RMAN command in the middle of execution:

  • The preferred method is to press CTRL+C (or the equivalent "attention" key combination for your system) in the RMAN interface. This also terminates allocated channels, unless they are hung in the media management code, as happens, for example, when they are waiting for a tape to be mounted.

  • You can kill the server session corresponding to the RMAN channel by running the SQL ALTER SYSTEM KILL SESSION statement.

  • You can terminate the server session corresponding to the RMAN channel on the operating system.

Terminating the Session with ALTER SYSTEM KILL SESSION

You can identify the Oracle session ID for an RMAN channel by looking in the RMAN log for messages with the format shown in the following example:

channel ch1: sid=15 devtype=SBT_TAPE

The sid and devtype are displayed for each allocated channel. Note that the Oracle sid is different from the operating system process ID. You can kill the session using a SQL ALTER SYSTEM KILL SESSION statement.

ALTER SYSTEM KILL SESSION takes two arguments, the sid printed in the RMAN message and a serial number, both of which can be obtained by querying V$SESSION. For example, run the following statement, where sid_in_rman_output is the number from the RMAN message:

SELECT SERIAL# FROM V$SESSION WHERE SID=sid_in_rman_output;

Then, run the following statement, substituting the sid_in_rman_output and serial number obtained from the query:

ALTER SYSTEM KILL SESSION 'sid_in_rman_output,serial#';

Note that this statement does not release the session if the session is hung in media manager code.
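If the RMAN log is large, the channel messages can be filtered out of it before you query V$SESSION. The following is a minimal sketch, assuming a Bourne-style shell and log lines in exactly the "channel name: sid=n devtype=type" format shown above. The function name channel_sids is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read an RMAN log on standard input and print one line
# per allocated channel in the form "<channel name> <sid>".
# Other channel messages (for example "starting datafile backupset") are skipped.
channel_sids() {
    sed -n 's/^channel \([^:]*\): sid=\([0-9][0-9]*\) devtype=.*$/\1 \2/p'
}

# Example use (the log file name is an assumption):
#   channel_sids < rman_backup.log
```

The second column of the output is the value to substitute for sid_in_rman_output in the V$SESSION query.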

Terminating the Session at the Operating System Level

Finding and killing the processes that are associated with the server sessions is operating system specific. On some platforms the server sessions are not associated with any processes at all. Refer to your operating system specific documentation for more information.

Terminating an RMAN Session That Is Hung in the Media Manager

You may sometimes need to kill an RMAN job that is hung in the media manager. The best way to terminate RMAN when the channel connections are hung in the media manager is to kill the session in the media manager. If this action does not solve the problem, then on some platforms, such as UNIX, you may be able to kill the Oracle processes of the connections. (Note that killing the Oracle processes may cause problems for the media manager. See your media manager documentation for details.)

Components of an RMAN Session

The nature of an RMAN session depends on the operating system. In UNIX, an RMAN session has the following processes associated with it:

  • The RMAN client process itself

  • The default channel, the initial connection to the target database

  • One target connection to the target database corresponding to each allocated channel

  • The catalog connection to the recovery catalog database, if you use a recovery catalog

  • An auxiliary connection to an auxiliary instance, during DUPLICATE or TSPITR operations

  • A polling connection to the target database, used for monitoring RMAN command execution on the various allocated channels. By default, RMAN makes one polling connection. RMAN makes additional polling connections if you use different connect strings in the ALLOCATE CHANNEL or CONFIGURE CHANNEL commands. One polling connection exists for each distinct connect string used in the ALLOCATE CHANNEL or CONFIGURE CHANNEL command.

Process Behavior During a Hung Job

RMAN usually hangs because one of the channel connections is waiting in the media manager code for a tape resource. The catalog connection and the default channel appear to hang, because they are waiting for RMAN to tell them what to do. Polling connections seem to be in an infinite loop while polling the RPC under the control of the RMAN process.

If you kill the RMAN process itself, then you also kill the catalog connection, the auxiliary connection, the default channel, and the polling connections. If target and auxiliary connections are not hung in the media manager code, they also terminate. If either the target connection or any of the auxiliary connections are executing in the media management layer, they will not terminate until the processes are manually killed at the operating system level.

Not all media managers can detect the termination of the Oracle process. Those which cannot may keep resources busy or continue processing. Consult your media manager documentation for details.

Terminating the catalog connection does not cause the RMAN process to terminate, because RMAN is not performing catalog operations while the backup or restore is in progress. Removing the default channel and the polling connections causes the RMAN process to detect that one of the channels has died and then proceed to exit. In this case, the connections to the hung channels remain active as described previously.

Terminating an RMAN Session: Basic Steps

Once the hung channels in the media manager code are killed, the RMAN process detects this termination and proceeds to exit, removing all connections except target connections that are still operative in the media management layer. The warning about the media manager resources still applies in this case.

To terminate an Oracle process that is hung in the media manager:

Query V$SESSION and V$SESSION_WAIT as described in "Monitoring RMAN Through V$ Views". For example, execute the following query:

COLUMN EVENT FORMAT a10
COLUMN SECONDS_IN_WAIT FORMAT 999
COLUMN STATE FORMAT a20
COLUMN CLIENT_INFO FORMAT a30

SELECT p.SPID, EVENT, SECONDS_IN_WAIT AS SEC_WAIT, 
       sw.STATE, CLIENT_INFO
FROM V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
WHERE sw.EVENT LIKE 'sbt%'
       AND s.SID=sw.SID
       AND s.PADDR=p.ADDR
;

Examine the SQL output to determine which sbt functions are waiting. For example, the output may be as follows:

SPID EVENT        SEC_WAIT STATE                CLIENT_INFO
---- ---------- ---------- -------------------- -------------
8642 sbtwrite2         600 WAITING              rman channel=ORA_SBT_TAPE_1
8374 sbtwrite2         600 WAITING              rman channel=ORA_SBT_TAPE_2

Using operating system-level tools appropriate to your platform, kill the hung sessions. For example, on Solaris execute a kill -9 command:

% kill -9 8642 8374

On Windows, there is a command-line utility called ORAKILL which lets you kill a specific thread in this situation. From a command prompt, run the following command:

orakill sid thread_id

where sid identifies the database instance to target, and thread_id is the SPID value returned by the query of V$SESSION_WAIT, V$SESSION, and V$PROCESS earlier in this procedure.

Check that the media manager also clears its processes. If any remain, the next backup or restore operation may hang again, due to the previous hang. In some media managers, the only solution is to shut down and restart the media manager. If the documentation from the media manager does not provide the needed information, contact technical support for the media manager.

See Also:

Your operating system specific documentation for the relevant commands

RMAN Troubleshooting Scenarios

This section contains these topics:

  • After Installation of Media Manager, RMAN Channel Allocation Fails: Scenario

  • Backup Job Is Hanging: Scenario

  • RMAN Fails to Start RPC Call: Scenario

  • Backup Fails with Invalid RECID Error: Scenario

  • Backup Fails Because of Control File Enqueue: Scenario

  • RMAN Fails to Delete All Archived Logs: Scenario

  • Backup Fails Because RMAN Cannot Locate an Archived Log: Scenario

  • RMAN Does Not Recognize Character Set Name: Scenario

  • RMAN Denies Logon to Target Database: Scenario

  • Database Duplication Fails Because of Missing Log: Scenario

  • Duplication Fails with Multiple RMAN-06023 Errors: Scenario

  • UNKNOWN Database Name Appears in Recovery Catalog: Scenario

After Installation of Media Manager, RMAN Channel Allocation Fails: Scenario

In this scenario, you install and test the media manager as explained in "Configuring RMAN to Make Backups to a Media Manager", but you still cannot make RMAN back up to tape. For example, after allocating the sbt channel, you receive an error stack similar to the following:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of allocate command on c1 channel at 08/29/2001 17:16:54
ORA-19554: error allocating device, device type: SBT_TAPE, device name: 
ORA-27211: Failed to load Media Management Library
Additional information: 25

The most important line of the error output is the ORA-27211 error. It indicates the basic problem, that the media management library could not be loaded. Typically, there is no need to refer to the trace file or sbtio.log in such a case.

After Installation of Media Manager, RMAN Channel Allocation Fails: Diagnosis

The ORA-27211 error indicates that the channel allocation is failing because the database is not loading the media management library. If the channel allocation fails, then the database generates a trace file in the USER_DUMP_DEST location that contains the error that caused the channel allocation to fail. The trace file should have the complete path name of the media management library loaded by the database as well as any other media manager errors or operating system errors. For example, the trace file on UNIX may be called something like /oracle/rdbms/log/prod1_ora_16226.trc, and may contain information such as the following:

*** 2001-08-29 17:16:54.385
SKGFQ OSD: Error in function sbtinit on line 2396
SKGFQ OSD: Look for SBT Trace messages in file /oracle/rdbms/log/sbtio.log
SBT Initialize failed for oracle.static 

The last line of this output indicates that Oracle is loading the default static library instead of the media management library that you installed.

You may find more detailed information in the file sbtio.log, as described in the error message. Note, however, that writing SBT trace messages is the responsibility of the media management software, not the Oracle database or RMAN. The media management vendor may not have implemented the writing of trace messages in a particular situation. Contact the media management vendor for details about the trace messages written to sbtio.log.

To test the loading of the media management library, try allocating a channel by using the PARMS parameter SBT_LIBRARY to force the loading of the media management library. For example, if your library is called /vendor/lib/some_mm_lib.so, then run a command such as the following, making sure to specify whatever PARMS settings are required by your media manager:

RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt 
    PARMS='SBT_LIBRARY=/vendor/lib/some_mm_lib.so, ENV=(NSR_SERVER=tape_svr,NSR_CLIENT=oracleclnt,NSR_GROUP=oracle_tapes)';
}

If the channel allocation fails, then check the trace file again to see whether you can learn anything new. If the channel allocation with SBT_LIBRARY succeeds, but an ordinary sbt channel allocation fails, then the database is probably trying to load a library other than the one you installed. By default, the database expects to find the media management library at $ORACLE_HOME/lib/libobk.so on UNIX, or %ORACLE_HOME%/bin/orasbt.dll on NT. You may have more than one library in the operating system path, and the database is loading the wrong one.

After Installation of Media Manager, RMAN Channel Allocation Fails: Solution

If the problem is that the database is not loading the correct library, then make sure that the library is named correctly in the SBT_LIBRARY parameter.

Backup Job Is Hanging: Scenario

In this scenario, an RMAN backup job starts as normal and then pauses inexplicably:

Recovery Manager: Release 10.1.0.2.0 - Production

Copyright (c) 1995, 2003, Oracle.  All rights reserved.

connected to target database: TRGT
connected to recovery catalog database

RMAN> BACKUP TABLESPACE SYSTEM, tools;

allocated channel: t1
channel t1: sid=16 devtype=SBT_TAPE

channel t1: starting datafile backupset
set_count=15 set_stamp=338309600
channel t1: including datafile 2 in backupset
channel t1: including datafile 1 in backupset
channel t1: including current control file in backupset
# Hanging here for 30 minutes now

Backup Job Is Hanging: Diagnosis

If a backup job is hanging, that is, not proceeding, then several scenarios are possible:

  • A server-side or media management error occurred.

  • RMAN is waiting for an event such as the insertion of a new cassette into the tape device.

Query sbt wait events to gain more information. For example, run the following query on the target instance:

COLUMN EVENT FORMAT a10
COLUMN SECONDS_IN_WAIT FORMAT 999
COLUMN STATE FORMAT a20
COLUMN CLIENT_INFO FORMAT a30

SELECT p.SPID, EVENT, SECONDS_IN_WAIT AS SEC_WAIT, 
       sw.STATE, CLIENT_INFO
FROM V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
WHERE sw.EVENT LIKE 'sbt%'
      AND s.SID=sw.SID
      AND s.PADDR=p.ADDR
;

Examine the SQL output to determine which sbt functions are waiting. For example, the output may be as follows:

SPID EVENT        SEC_WAIT STATE                CLIENT_INFO
---- ---------- ---------- -------------------- ------------------------------
8642 sbtbackup        1500 WAITING              rman channel=ORA_SBT_TAPE_1

Backup Job Is Hanging: Solution

Because the causes of a hung backup job can be varied, so are the solutions. For example, backup jobs often hang simply because the tape device has completely filled the current cassette and is waiting for a new tape to be inserted. Ideally, the query of the sbt wait events should indicate the problem.

In this example, a single sbtbackup has taken 1500 seconds, so RMAN is waiting on the media manager to finish its write operation. Check that the media manager is functioning normally, and contact the media management vendor’s technical support for assistance.
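If you save the query output to a file, long sbt waits can be picked out mechanically. The following is a minimal sketch, assuming a Bourne-style shell and output laid out as in the example, with SPID, EVENT, and SEC_WAIT as the first three whitespace-separated columns. The function name long_sbt_waits and the threshold are assumptions; what counts as "too long" depends on your media manager and tape hardware.

```sh
#!/bin/sh
# Hypothetical helper: read the saved query output on standard input and print
# the SPID of every row whose event name starts with "sbt" and whose SEC_WAIT
# value is at least the limit (in seconds) given as the first argument.
# Header and separator rows are skipped because their first column is not numeric.
long_sbt_waits() {
    awk -v limit="$1" '
        $1 ~ /^[0-9]+$/ && $2 ~ /^sbt/ && ($3 + 0) >= (limit + 0) { print $1 }
    '
}

# Example use (file name and 900-second limit are assumptions):
#   long_sbt_waits 900 < sbt_waits.txt
```

The sketch only lists candidate processes. Whether to wait, load a tape, or terminate a session remains a judgment call based on the media manager's state.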

If the sbt wait event query is unhelpful, then examine media manager process, log, and trace files for signs of abnormal termination or other errors (refer to the description of message files in "Identifying Types of Message Output").

See Also:

"Terminating an RMAN Session: Basic Steps" to learn how to kill an RMAN session that is hanging

RMAN Fails to Start RPC Call: Scenario

In this scenario, you run a backup job and receive message output similar to the following:

channel c8: including datafile number 47 in backupset
RPC call appears to have failed to start on channel c9
RPC call ok on channel c9
channel c3: including datafile number 18 in backupset

RMAN Fails to Start RPC Call: Diagnosis

The RPC call appears to have failed message does not usually indicate a problem. The message indicates one of the following:

  • The target database instance is slow.

  • A timing problem occurred.

Timing problems occur in this way. When RMAN begins an RPC, it checks the V$SESSION performance view. The RPC updates the information in the view to indicate when it starts and finishes. Sometimes RMAN checks V$SESSION before the RPC has indicated it has started, which in turn generates the following message:

RPC call appears to have failed

If a message stating "RPC call ok" does not appear in the output immediately following the message stating "RPC call appears to have failed", then the backup job encountered an internal problem. Contact Oracle Support for further assistance.
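This check can be sketched as a log filter. The following is a minimal sketch, assuming a Bourne-style shell and messages worded exactly as in the sample output, with the channel name as the last word on the line. The function name unconfirmed_rpc_channels is hypothetical. Note that the sketch is looser than the rule stated above: it accepts an "RPC call ok" message for the same channel anywhere later in the log, not only on the next line.

```sh
#!/bin/sh
# Hypothetical helper: read RMAN output on standard input and print the name
# of every channel that reported "RPC call appears to have failed to start"
# without a later "RPC call ok" message for the same channel.
unconfirmed_rpc_channels() {
    awk '
        /RPC call appears to have failed to start on channel/ { pending[$NF] = 1 }
        /RPC call ok on channel/                              { delete pending[$NF] }
        END { for (ch in pending) print ch }
    ' | sort
}

# Example use (the log file name is an assumption):
#   unconfirmed_rpc_channels < rman_backup.log
```

Empty output means every reported failure was followed by a confirmation for that channel.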

Backup Fails with Invalid RECID Error: Scenario

In this scenario, you attempt a backup and receive the following error messages:

RMAN-3014: Implicit resync of recovery catalog failed
RMAN-6038: Recovery catalog package detected an error
RMAN-20035: Invalid high RECID error

Backup Fails with Invalid RECID Error: Diagnosis

In one common scenario, you restore a backup control file created through a non-Oracle mechanism, and then open the database without the RESETLOGS option. If you had created the backup control file through the RMAN BACKUP command or the SQL ALTER DATABASE BACKUP CONTROLFILE statement, then the database would have required you to reset the online logs.

The control file and the recovery catalog are now not synchronized. The database control file is older than the recovery catalog, because at one time the recovery catalog resynchronized with the old current control file, and now the database is using a backup control file. RMAN detects that the control file currently in use is older than the control file previously used to resynchronize.

Another common scenario occurs when you attempt to copy the target database to a new machine as follows:

On machine 1, you shut down the database and make a copy of the control file with an operating system utility. You do not use CATALOG to add this control file copy to the repository.

You transfer the control file copy to machine 2.

On machine 2, you create a new initialization parameter file and new database instance.

You mount the control file copy on machine 2. The database does not recognize the control file as a backup control file: to the database it looks like the current control file.

You start RMAN and connect to the new target database and the recovery catalog on machine 2. Because the control file was not created with RMAN and was not cataloged as a control file copy, RMAN sees the database on machine 2 as the database on machine 1.

You restore and recover the new database on machine 2 and then open it. As a consequence, various records are added to the recovery catalog during the restore and recovery. For example, the highest RECID in the recovery catalog moves from 90 to 100.

On machine 1, you start RMAN and connect to the original target database and recovery catalog. The recovery catalog indicates that the highest RECID is 100, but the control file indicates that the highest RECID is 90. The control file RECID should always be greater than or equal to the recovery catalog RECID, so RMAN issues RMAN-20035.

Backup Fails with Invalid RECID Error: Solution 1

This solution is safest and is strongly recommended. It preserves the control file, so that the historical information about the database stored in the control file continues to be available after the procedure.

To reset the database with RMAN:

Connect to the target database with SQL*Plus. For example, enter:

% sqlplus '/ AS SYSDBA'

Mount the database if it is not already mounted. For example, enter:

ALTER DATABASE MOUNT;

Start cancel-based recovery by using the backup control file, then cancel it. The reason for canceling is that the USING BACKUP CONTROLFILE clause stamps the control file as a backup, which then permits OPEN RESETLOGS. For example, enter:

ALTER DATABASE RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE;
ALTER DATABASE RECOVER CANCEL;

Use RMAN to connect to the target database and recovery catalog. For example, enter:

% rman TARGET SYS/oracle@trgt CATALOG rman/cat@catdb

Open the database with the RESETLOGS option. For example, enter:

RMAN> ALTER DATABASE OPEN RESETLOGS;

Take new backups so that you can recover the database if necessary. For example, enter:

BACKUP DATABASE PLUS ARCHIVELOG;

Backup Fails with Invalid RECID Error: Solution 2

This solution is similar to the previous one, but does require that you re-create your control file. It is better-suited for the case in which you are copying your database to a second system, where you may not want to keep the history from the control file for the copy of the database on the second system, or where you might drop a few datafiles or change the online logs by editing your control file.

To create the control file with SQL*Plus:

Connect to the target database with SQL*Plus. For example, enter:

% sqlplus 'SYS/oracle@trgt AS SYSDBA'

Mount the database if it is not already mounted:

SQL> ALTER DATABASE MOUNT;

Back up the control file to a trace file:

SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE;

Edit the trace file as necessary. The relevant section of the trace file looks something like the following:

# The following commands will create a new control file and use it
# to open the database.
# Data used by the recovery manager will be lost. Additional logs may
# be required for media recovery of offline data files. Use this
# only if the current version of all online logs are available.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "TRGT" NORESETLOGS  ARCHIVELOG
--  STANDBY DATABASE CLUSTER CONSISTENT AND UNPROTECTED
    MAXLOGFILES 32
    MAXLOGMEMBERS 2
    MAXDATAFILES 32
    MAXINSTANCES 1
    MAXLOGHISTORY 226
LOGFILE
  GROUP 1 '/oracle/oradata/trgt/redo01.log'  SIZE 25M,
  GROUP 2 '/oracle/oradata/trgt/redo02.log'  SIZE 25M,
  GROUP 3 '/oracle/oradata/trgt/redo03.log'  SIZE 500K
-- STANDBY LOGFILE
DATAFILE
  '/oracle/oradata/trgt/system01.dbf',
  '/oracle/oradata/trgt/undotbs01.dbf',
  '/oracle/oradata/trgt/cwmlite01.dbf',
  '/oracle/oradata/trgt/drsys01.dbf',
  '/oracle/oradata/trgt/example01.dbf',
  '/oracle/oradata/trgt/indx01.dbf',
  '/oracle/oradata/trgt/tools01.dbf',
  '/oracle/oradata/trgt/users01.dbf'
CHARACTER SET WE8DEC
;
# Take files offline to match current control file.
ALTER DATABASE DATAFILE '/oracle/oradata/trgt/tools01.dbf' OFFLINE;
ALTER DATABASE DATAFILE '/oracle/oradata/trgt/users01.dbf' OFFLINE;
# Configure RMAN configuration record 1
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('CHANNEL','DEVICE TYPE DISK DEBUG 255');
# Recovery is required if any of the datafiles are restored backups,
# or if the last shutdown was not normal or immediate.
RECOVER DATABASE
# All logs need archiving and a log switch is needed.
ALTER SYSTEM ARCHIVE LOG ALL;
# Database can now be opened normally.
ALTER DATABASE OPEN;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE '/oracle/oradata/trgt/temp01.dbf' REUSE;
# End of tempfile additions.

Shut down the database:

SHUTDOWN IMMEDIATE

Execute the script to create the control file, recover (if necessary), archive the logs, and open the database:

STARTUP NOMOUNT
CREATE CONTROLFILE ...;
EXECUTE ...;
RECOVER DATABASE
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER DATABASE OPEN ...;

If you intend to keep and continue using this copy of the database, use the DBNEWID utility to change the name and DBID of the new database as needed.

Caution:

If you do not open with the RESETLOGS option, then two copies of an archived redo log for a given log sequence number may exist—even though these two copies have completely different contents. For example, one log may have been created on the original host and the other on the new host. If you accidentally confuse the logs during a media recovery, then the database will be corrupted but Oracle and RMAN cannot detect the problem.

Backup Fails Because of Control File Enqueue: Scenario

In this scenario, a backup job fails because RMAN cannot make a snapshot control file. The message stack is as follows:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/30/2001 22:48:44
ORA-00230: operation disallowed: snapshot control file enqueue unavailable

Backup Fails Because of Control File Enqueue: Diagnosis

When RMAN needs to back up or resynchronize from the control file, it first creates a snapshot or consistent image of the control file. If one RMAN job is already backing up the control file while another needs to create a new snapshot control file, then you may see the following message:

waiting for snapshot control file enqueue

Under normal circumstances, a job that must wait for the control file enqueue waits for a brief interval and then successfully obtains the enqueue. RMAN makes up to five attempts to get the enqueue and then fails the job. The conflict is usually caused when two jobs are both backing up the control file, and the job that first starts backing up the control file waits for service from the media manager.

To determine which job is holding the conflicting enqueue:

After you see the first message stating "RMAN-08512: waiting for snapshot control file enqueue", start a new SQL*Plus session on the target database:

% sqlplus 'SYS/oracle@trgt AS SYSDBA'

Execute the following query to determine which job is causing the wait:

SELECT s.SID, USERNAME AS "User", PROGRAM, MODULE, 
       ACTION, LOGON_TIME "Logon", l.* 
FROM V$SESSION s, V$ENQUEUE_LOCK l
WHERE l.SID = s.SID
AND l.TYPE = 'CF'
AND l.ID1 = 0
AND l.ID2 = 2;

You should see output similar to the following (the output in this example has been truncated):

SID User Program              Module                    Action           Logon
--- ---- -------------------- ------------------- ---------------- ---------
9 SYS rman@h13 (TNS V1-V3) backup full datafile: c10000210 STARTED 21-JUN-01

Backup Fails Because of Control File Enqueue: Solution

Commonly, enqueue situations occur when a job is writing to a tape drive, but the tape drive is waiting for new tape to be inserted. If you start a new job in this situation, then you will probably receive the enqueue message because the first job cannot complete until the new tape is loaded.

After you have determined which job is creating the enqueue, you can do one of the following:

  • Wait until the job holding the enqueue completes

  • Cancel the current job and restart it after the job holding the enqueue completes

  • Cancel the job creating the enqueue

RMAN Fails to Delete All Archived Logs: Scenario

In this scenario, the database archives automatically to two directories: ORACLE_HOME/oradata/trgt/arch and ORACLE_HOME/oradata/trgt/arch2. You tell RMAN to perform a backup and delete the input archived redo logs afterward in the following script:

BACKUP ARCHIVELOG ALL DELETE INPUT;

You then run a crosscheck to make sure the logs are gone and find the following:

CROSSCHECK ARCHIVELOG ALL;

validation succeeded for archived log
archivelog filename=/oracle/oradata/trgt/arch2/archive1_964.arc recid=19 stamp=368726072

RMAN deleted one set of logs but not the other.

RMAN Fails to Delete All Archived Logs: Diagnosis

This problem is not an error. When you specify DELETE INPUT without the ALL keyword, RMAN deletes only one copy of each input log. Even if you archive to five destinations, RMAN deletes logs from only one directory.

RMAN Fails to Delete All Archived Logs: Solution

To force RMAN to delete all existing archived redo logs, use the DELETE ALL INPUT clause of the BACKUP command. For example, enter:

BACKUP ARCHIVELOG ALL DELETE ALL INPUT;

Backup Fails Because RMAN Cannot Locate an Archived Log: Scenario

In this scenario, you schedule regular backups of the archived redo logs. The next time you make a backup, you receive this error:

RMAN-6089:  archive log NAME not found or out of sync with catalog

Backup Fails Because RMAN Cannot Locate an Archived Log: Diagnosis

This problem occurs when the archived log that RMAN is looking for cannot be accessed by RMAN, or the recovery catalog needs to be resynchronized. Often, this error occurs when you delete archived logs with an operating system command, which means that RMAN is unaware of the deletion. The RMAN-6089 error occurs because RMAN attempts to back up a log that the repository indicates still exists.

Backup Fails Because RMAN Cannot Locate an Archived Log: Solution

Make sure that the archived logs exist in the specified directory and that the RMAN catalog is synchronized. Check the following:

Make sure the archived log file that is specified by the RMAN-6089 error exists in the correct directory.

Check that the operating system permissions are correct for the archived log (owner = oracle, group = DBA) to make sure that RMAN can access the file.

If the file appears to be correct, then try synchronizing the catalog by running the following command from the RMAN prompt:

RESYNC CATALOG;

If you know that the logs are unavailable because you deleted them by using an operating system utility, then run the following command at the RMAN prompt to update RMAN metadata:

CROSSCHECK ARCHIVELOG ALL;

It is always better to use RMAN to delete logs than to use an operating system utility. The easiest method to remove unwanted logs is to specify the DELETE INPUT or DELETE ALL INPUT option when backing up archived logs. For example, enter:

BACKUP DEVICE TYPE sbt 
  ARCHIVELOG ALL 
  DELETE ALL INPUT;

RMAN Does Not Recognize Character Set Name: Scenario

In this scenario, you are connected to the target database while it is not open, and you attempt to perform an RMAN operation. You receive the following error:

PLS-00553: character set name is not recognized

RMAN Does Not Recognize Character Set Name: Diagnosis

Typically, this message means that the character set in the client environment, that is, the environment in which you are running the RMAN client, is different from the character set in the target database environment.

RMAN Does Not Recognize Character Set Name: Solution

Query the target database to determine the value of the NLS_CHARACTERSET parameter. For example, run this query:

SQL> SELECT VALUE FROM V$NLS_PARAMETERS WHERE PARAMETER='NLS_CHARACTERSET';

Set the character set environment variable in the client to the same value as the variable in the server. For example, you can set the NLS_LANG environment variable on a UNIX system as follows:

% setenv NLS_LANG american_america.we8dec
% setenv NLS_DATE_FORMAT "MON DD YYYY HH24:MI:SS"

If the connection is made through a listener, then the listener must be started with the correct Globalization Support settings. Otherwise, the spawned connections inherit the incorrect Globalization Support settings from the listener.
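
The comparison between the client and server character sets can be sketched in Python. This is only an illustration: it assumes the usual NLS_LANG layout of language_territory.charset, and the function names client_charset and charsets_match are hypothetical helpers, not part of any Oracle tool.

```python
def client_charset(nls_lang):
    """Return the character set part of an NLS_LANG value, or None if absent."""
    # NLS_LANG is assumed to look like language_territory.charset
    _, dot, charset = nls_lang.partition(".")
    return charset.upper() if dot and charset else None


def charsets_match(nls_lang, db_charset):
    """Compare the client charset with the NLS_CHARACTERSET value from the database."""
    cs = client_charset(nls_lang)
    return cs is not None and cs == db_charset.strip().upper()
```

For example, charsets_match("american_america.we8dec", "WE8DEC") reports a match, while a value with no character set part does not.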

RMAN Denies Logon to Target Database: Scenario

RMAN fails with ORA-01031 (insufficient privileges) or ORA-01017 (invalid username/password) errors when trying to connect to the target database:

% rman
Recovery Manager: Release 10.1.0.2.0 - Production

Copyright (c) 1995, 2003, Oracle.  All rights reserved.

RMAN> CONNECT TARGET sys/mypass@inst1

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-01031: insufficient privileges

RMAN Denies Logon to Target Database: Diagnosis

RMAN automatically requests a connection to the target database as SYSDBA. In order to connect to the target as SYSDBA, you must do one of the following:

  • Be part of the operating system DBA group with respect to the target database (that is, have the ability to connect with SYSDBA privileges to the target database without a password).

  • Create a password file with the orapwd command and set the initialization parameter REMOTE_LOGIN_PASSWORDFILE.

  • Make sure you are connecting with the correct username and password.

If the target database does not have a password file, then the user you are logged in as must be validated with operating system authentication.

RMAN Denies Logon to Target Database: Solution

Either create a password file for the target database or add yourself to the administrator list in the operating system.

Database Duplication Fails Because of Missing Log: Scenario

In this scenario, you attempt to duplicate a database with the DUPLICATE command, but receive the following error stack:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/04/2001 12:11:29
RMAN-03015: error occurred in stored script Memory Script
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 16 scn 145858 found to restore

Database Duplication Fails Because of Missing Log: Diagnosis

The problem is that RMAN is not able to apply all the archived logs needed for complete recovery. For example, if you only backed up logs through sequence 15, but the most recent archived log is sequence 16, then DUPLICATE fails.

Database Duplication Fails Because of Missing Log: Solution

When creating the duplication script, use the SET UNTIL command to specify a log sequence number for incomplete recovery. For example, to terminate recovery after applying log sequence 15, enter:

RUN
{
  SET UNTIL SEQUENCE 16 THREAD 1;  # recovers up to but not including log 16
  DUPLICATE TARGET DATABASE TO 'dupdb';
}

Duplication Fails with Multiple RMAN-06023 Errors: Scenario

In this scenario, you back up the database, then run the DUPLICATE command. You receive the following error stack:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/04/2001 13:55:11
RMAN-03015: error occurred in stored script Memory Script
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 8 found to restore
RMAN-06023: no backup or copy of datafile 7 found to restore
RMAN-06023: no backup or copy of datafile 6 found to restore
RMAN-06023: no backup or copy of datafile 5 found to restore
RMAN-06023: no backup or copy of datafile 4 found to restore
RMAN-06023: no backup or copy of datafile 3 found to restore
RMAN-06023: no backup or copy of datafile 2 found to restore
RMAN-06023: no backup or copy of datafile 1 found to restore

Duplication Fails with Multiple RMAN-06023 Errors: Diagnosis

The DUPLICATE command recovers to archived redo logs, but cannot recover into online redo logs. Thus, if the restored backup cannot be made consistent without applying the online redo logs, then duplication fails with RMAN-06023 errors because RMAN is looking for backups created before the most recent archived log.

Duplication Fails with Multiple RMAN-06023 Errors: Solution

After backing up the source database, archive and back up the current redo log:

RMAN> SQL 'ALTER SYSTEM ARCHIVE LOG CURRENT';
RMAN> BACKUP ARCHIVELOG ALL;

This archives all records in the online redo logs so that RMAN can now recover the backup by applying the most recent archived redo log.

UNKNOWN Database Name Appears in Recovery Catalog: Scenario

In this scenario, you list the database incarnations registered in the recovery catalog and see a database with the name UNKNOWN:

LIST INCARNATION OF DATABASE;  
 
RMAN-03022: compiling command: list  
List of Database Incarnations  
DB Key  Inc Key   DB Name   DB ID       STATUS    Reset SCN    Reset Time
------- -------   -------   ------      ------    ----------   ----------
56      57        TRGT      4052472287  CURRENT   1            Sep 03 2001 06:45:51
1       19        UNKNOWN   4141147584  PARENT    1            Jan 08 2001 14:47:28
.
.
.

UNKNOWN Database Name Appears in Recovery Catalog: Diagnosis

One way you get the DB_NAME of UNKNOWN is when you register a database that was once opened with the RESETLOGS option. The DB_NAME can be changed during a RESETLOGS operation, so RMAN does not know what the DB_NAME was for those old incarnations of the database because it was not registered in the recovery catalog at the time. Consequently, RMAN sets the DB_NAME column to UNKNOWN when creating the DBINC record.

UNKNOWN Database Name Appears in Recovery Catalog: Solution

The UNKNOWN name entry is expected behavior after a RESETLOGS operation. You should not attempt to remove UNKNOWN entries from the recovery catalog.

24 Troubleshooting RMAN Operations

24.1 Interpreting RMAN Message Output

Recovery Manager provides detailed error messages that can aid in troubleshooting problems.

Also, Oracle Database and the third-party media vendors generate useful debugging output of their own. The following discussion explains how to identify and interpret the different errors that you may encounter.

24.1.1 Identifying Types of RMAN Message Output

Output that is useful for troubleshooting failed or unresponsive RMAN jobs is located in several different places.

The following table provides an overview of where to locate message output that can be used to troubleshoot RMAN backup problems.

Table 24-1 Types of Message Output

Type of Output: RMAN messages
Produced By: RMAN
Location: Completed job information is in V$RMAN_STATUS and RC_RMAN_STATUS. Current job information is in V$RMAN_OUTPUT.

  When running RMAN from the command line, you can direct output to the following places:

  • Standard output

  • A log file specified by LOG on the command line or the SPOOL LOG command

  • A file created by redirecting RMAN output (for example, in UNIX, using the > operator)

Description: Contains actions relevant to the RMAN job and error messages generated by RMAN, the database server, and the media vendor. RMAN error messages have an RMAN- prefix. Normal action descriptions do not have a prefix.

  You can execute the following PL/SQL to remove all entries from V$RMAN_STATUS:

  update node set high_rsr_recid=0
  where db_key = our_target_database_db_key ;

  The preceding function removes all job-related entries. No rows are visible until new backup jobs are shown in V$RMAN_BACKUP_JOB_DETAILS.

Type of Output: alert_SID.log
Produced By: Oracle Database
Location: The alert subdirectory of the Automatic Diagnostic Repository (ADR) home
Description: Contains a chronological log of errors, initialization parameter settings, and administration operations. Records values for overwritten control file records.

Type of Output: Oracle trace file
Produced By: Oracle Database
Location: The trace subdirectory of the ADR home
Description: Contains detailed output generated by Oracle Database processes. This file is created when an ORA-600 or ORA-3113 error message occurs, whenever RMAN cannot allocate a channel, and when the database fails to load the media management library.

Type of Output: sbtio.log
Produced By: Third-party media management software
Location: The trace subdirectory of the ADR home
Description: Contains vendor-specific information written by the media management software. This log does not contain Oracle Database or RMAN errors.

Type of Output: Media manager log file
Produced By: Third-party media management software
Location: The file names for any media manager logs other than sbtio.log are determined by the media management software.
Description: Contains information about the functioning of the media management device

24.1.2 Troubleshooting Long-Running RMAN Operations

RMAN message output provides information about the progress of backup and recovery operations. Use this information to take any required actions to troubleshoot operations that are stuck or awaiting resources.

Certain operations such as backup, restore, recovery, and duplication for large databases typically take a long time to complete. However, it is not always clear whether the operation is progressing or waiting on some resources. Starting with Oracle Database Release 18c, RMAN message output contains additional logging information that indicates if a job is waiting on resources. Every 10 minutes, RMAN checks if there is a change in the number of blocks processed. If there is no change in the blocks processed, then RMAN displays a message with the associated wait event.

The following is an example of the RMAN output for a RESTORE operation:

allocated channel: c1
channel c1: SID=123 device type=SBT_TAPE
channel c1: WARNING: Oracle Test Disk API

Starting restore at 18-JAN-18

channel c1: starting datafile backup set restore
channel c1: specifying datafile(s) to restore from backup set
channel c1: restoring datafile 00002 to /ade/b/2776899351/oracle/dbs/tbs_ax1.f
channel c1: reading from backup piece 01sov1t4_1_1

***** Hang Detected ***** at 2018-01-18 04:11:23 for channel c1, INSTID: 1, SID: 123, serial: 35831
No change in read blocks, thus showing wait event[Total blocks = 192000, Blocks read/recovered = 41530]
Seq_No   Event                            Waiting Time(mirco secs)
602      Backup: MML read backup piece    38094371

***** Hang Detected ***** at 2018-01-18 04:11:33 for channel c1, INSTID: 1, SID: 123, serial: 35831
No change in read blocks, thus showing wait event[Total blocks = 192000, Blocks read/recovered = 41530]
Seq_No   Event                            Waiting Time(mirco secs)
602      Backup: MML read backup piece    48106104

channel c1: piece handle=01sov1t4_1_1 tag=TAG20180118T040804
channel c1: restored backup piece 1
channel c1: restore complete, elapsed time: 00:02:35
Finished restore at 18-JAN-18
released channel: c1

The output indicates that the restore was stuck because of a problem with a media manager read operation. After the read operation completed, the RMAN restore was successful.
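
When RMAN output is captured to a log file, the hang messages can be picked out with a short script. The following Python sketch assumes the message layout shown in the example above; the function name find_hangs and the dictionary keys are hypothetical, and the regular expressions may need adjusting if your release formats these lines differently.

```python
import re

# Pattern for the "Hang Detected" line as it appears in the sample output.
HANG = re.compile(
    r"\*{5} Hang Detected \*{5} at (?P<time>\S+ \S+) for channel (?P<channel>\w+), "
    r"INSTID: (?P<inst>\d+), SID: (?P<sid>\d+), serial: (?P<serial>\d+)"
)
# Pattern for the wait event row: sequence number, event name, waiting time.
WAIT_ROW = re.compile(r"^(?P<seq>\d+)\s+(?P<event>.+?)\s+(?P<wait>\d+)\s*$")


def find_hangs(output):
    """Return one dict per hang message, with the first wait event row that follows it."""
    hangs = []
    current = None
    for line in output.splitlines():
        m = HANG.search(line)
        if m:
            current = {
                "time": m["time"],
                "channel": m["channel"],
                "sid": int(m["sid"]),
                "event": None,
                "wait": None,  # waiting time in the unit printed by RMAN
            }
            hangs.append(current)
            continue
        row = WAIT_ROW.match(line.strip())
        if row and current is not None and current["event"] is None:
            current["event"] = row["event"]
            current["wait"] = int(row["wait"])
    return hangs
```

Each entry pairs a channel and SID with the wait event that RMAN reported, which gives a starting point for the V$ view queries described later in this chapter.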

24.1.3 Recognizing RMAN Error Message Stacks

RMAN reports errors as they occur. If an error is not retrievable, that is, if RMAN cannot perform failover to another channel to complete a particular job step, then RMAN also reports a summary of the errors after all job sets complete. This feature is known as deferred error reporting.

One way to determine whether RMAN encountered an error is to examine its return code. A second way is to search the RMAN output for the string RMAN-00569, which is the message number for the error stack banner. All RMAN errors are preceded by this error message. If you do not see an RMAN-00569 message in the output, then there are no errors.

Example 24-1 RMAN Syntax Error

This example shows an RMAN syntax error. The RMAN-00569 message is followed by other error messages that indicate the reason for the error.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01005: syntax error: found ")": expecting one of: "archivelog, backup,
 backupset, controlfilecopy, current, database, datafile, datafilecopy, (, plus, ;, tablespace"
RMAN-01007: at line 1 column 18 file: standard input
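
Searching for the banner is easy to automate when RMAN output is saved to a log. The following Python sketch is one possible way to do it; the function name has_error_stack is a hypothetical helper, and the check relies only on the RMAN-00569 message number described above.

```python
import re

# The error stack banner always carries message number RMAN-00569.
BANNER = re.compile(r"^RMAN-00569:", re.MULTILINE)


def has_error_stack(output):
    """Return True if the RMAN output contains the error stack banner."""
    return BANNER.search(output) is not None
```

A scheduled job could read its RMAN log file into a string and pass it to this function before deciding whether to raise an alert.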

24.1.4 Identifying RMAN Error Codes

You can use the error codes in RMAN message stacks to troubleshoot problems with RMAN commands.

Typically, you find the following types of error codes in RMAN message stacks:

  • Errors prefixed with RMAN-

    These are RMAN errors.

  • Errors prefixed with ORA-

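    These are errors from the database server. Media manager errors also use the ORA- prefix (for example, ORA-19511).

  • Errors preceded by the line Additional information:

    These are error codes specific to SBT 1.1 media managers.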

See Also:

  • RMAN Error Message Numbers for the error ranges of RMAN errors

  • ORA-19511: Media Manager Errors for the error ranges of media manager errors

  • Oracle Database Error Messages Reference for explanations of RMAN and ORA error codes

24.1.4.1 RMAN Error Message Numbers

RMAN error messages are prefixed with RMAN-.

The following table indicates the error ranges for common RMAN error messages, all of which are described in Oracle Database Error Messages Reference.

Table 24-2 RMAN Error Message Ranges

Error Range    Cause
-----------    -----------------------------------------
0550-0999      Command-line interpreter
1000-1999      Keyword analyzer
2000-2999      Syntax analyzer
3000-3999      Main layer
4000-4999      Services layer
5000-5499      Compilation of RESTORE or RECOVER command
5500-5999      Compilation of DUPLICATE command
6000-6999      General compilation
7000-7999      General execution
8000-8999      PL/SQL programs
9000-9999      Low-level keyword analyzer
10000-10999    Server-side execution
11000-11999    Interphase errors between PL/SQL and RMAN
12000-12999    Recovery catalog packages
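
The ranges in Table 24-2 lend themselves to a simple lookup. The following Python sketch transcribes the table into a list and maps an RMAN message number to its cause; the names RANGES and classify are hypothetical, and message numbers outside the listed ranges simply return None.

```python
import re

# (low, high, cause) rows transcribed from Table 24-2.
RANGES = [
    (550, 999, "Command-line interpreter"),
    (1000, 1999, "Keyword analyzer"),
    (2000, 2999, "Syntax analyzer"),
    (3000, 3999, "Main layer"),
    (4000, 4999, "Services layer"),
    (5000, 5499, "Compilation of RESTORE or RECOVER command"),
    (5500, 5999, "Compilation of DUPLICATE command"),
    (6000, 6999, "General compilation"),
    (7000, 7999, "General execution"),
    (8000, 8999, "PL/SQL programs"),
    (9000, 9999, "Low-level keyword analyzer"),
    (10000, 10999, "Server-side execution"),
    (11000, 11999, "Interphase errors between PL/SQL and RMAN"),
    (12000, 12999, "Recovery catalog packages"),
]


def classify(code):
    """Map a code such as 'RMAN-03002' to its cause in Table 24-2, or None."""
    m = re.fullmatch(r"RMAN-(\d+)", code.strip())
    if not m:
        return None
    number = int(m.group(1))
    for low, high, cause in RANGES:
        if low <= number <= high:
            return cause
    return None
```

For example, classify("RMAN-03002") returns the main layer entry, while an ORA- code returns None because the table covers only RMAN- messages.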

24.1.4.2 ORA-19511: Media Manager Errors

If a media manager error occurs, ORA-19511 is signaled, and the media manager is expected to provide RMAN a descriptive error. RMAN displays the error passed back to it by the media manager.

For example, you might see this:

ORA-19511: Error received from media manager layer, error text:
   sbtpvt_open_input: file .* does not exist or cannot be accessed, errno = 2

The message from the media manager should provide you with enough information to let you fix the root problem. If it does not, then refer to the documentation for your media manager or contact your media management vendor support representative for further information. ORA-19511 errors originate with the media manager, not with Oracle Database. The database just passes on the message from the media manager. The cause can be addressed only by the media management vendor.

If you are still using an SBT 1.1-compliant media management layer, you may see some additional error message text. Output from an SBT 1.1-compliant media management layer is similar to the following:

ORA-19507: failed to retrieve sequential file, handle="c-140148591-20031014-06", parms=""
ORA-27007: failed to open file
Additional information: 7000
Additional information: 2
ORA-19511: Error received from media manager layer, error text:
   SBT error = 7000, errno = 0, sbtopen: backup file not found

The "Additional information" lines use error codes specific to SBT 1.1. The values displayed correspond to the media manager message numbers and error text listed in Table 24-3. RMAN then signals the error again as ORA-19511 (Error received from media manager layer) and displays a general error message that relates to the error code returned by the media manager and includes the SBT 1.1 error number.

The SBT 1.1 error messages are listed here for your reference. Table 24-3 lists media manager message numbers and their corresponding error text. In the error codes, O/S stands for operating system. The errors marked with an asterisk (*) are internal and are not typically seen during normal operation.

Table 24-3 Media Manager Error Message Ranges

Cause      No.    Message
---------  -----  ----------------------------------------------
sbtopen    7000   Backup file not found (only returned for read)
           7001   File exists (only returned for write)
           7002*  Bad mode specified
           7003   Invalid block size specified
           7004   No tape device found
           7005   Device found, but busy; try again later
           7006   Tape volume not found
           7007   Tape volume is in-use
           7008   I/O Error
           7009   Can’t connect with Media Manager
           7010   Permission denied
           7011   O/S error (for example, malloc or fork error)
           7012*  Invalid argument(s) to sbtopen

sbtclose   7020*  Invalid file handle or file not open
           7021*  Invalid flags to sbtclose
           7022   I/O error
           7023   O/S error
           7024*  Invalid argument(s) to sbtclose
           7025   Can’t connect with Media Manager

sbtwrite   7040*  Invalid file handle or file not open
           7041   End of volume reached
           7042   I/O error
           7043   O/S error
           7044*  Invalid argument(s) to sbtwrite

sbtread    7060*  Invalid file handle or file not open
           7061   EOF encountered
           7062   End of volume reached
           7063   I/O error
           7064   O/S error
           7065*  Invalid argument(s) to sbtread

sbtremove  7080   Backup file not found
           7081   Backup file in use
           7082   I/O Error
           7083   Can’t connect with Media Manager
           7084   Permission denied
           7085   O/S error
           7086*  Invalid argument(s) to sbtremove

sbtinfo    7090   Backup file not found
           7091   I/O Error
           7092   Can’t connect with Media Manager
           7093   Permission denied
           7094   O/S error
           7095*  Invalid argument(s) to sbtinfo

sbtinit    7110*  Invalid argument(s) to sbtinit
           7111   O/S error

24.1.5 Interpreting RMAN Error Stacks

It is important to identify the relevant messages in the RMAN error stack.

Note the following tips and suggestions while interpreting RMAN messages:

  • Read the messages from the bottom up, because this is the order in which RMAN issues the messages. The last one or two errors displayed in the stack are often the most informative.

  • When you are using an SBT 1.1 media management layer and you are presented with SBT 1.1 style error messages containing the "Additional information:" numeric error codes, look for the ORA-19511 message that follows for the text of error messages passed back to RMAN by the media manager. These messages identify the real failure in the media management layer.

  • Look for the RMAN-03002 or RMAN-03009 message (RMAN-03009 is the same as RMAN-03002 but includes the channel ID) immediately following the error banner. These messages indicate which command failed. Syntax errors generate RMAN-00558.

  • Identify the basic type of error according to the error range chart in Table 24-2 and then refer to the error messages for information about the most important messages.

See Also:

  • Interpreting RMAN Errors: Example and Interpreting Server Errors: Example for examples of RMAN error messages

  • Interpreting SBT 2.0 Media Management Errors: Example and Interpreting SBT 1.1 Media Management Errors: Example for examples of interpreting media management errors

  • Oracle Database Error Messages for information about the error messages
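
The tips above can be sketched as a small Python script that lists the messages of an error stack from the bottom up and picks out the RMAN-03002 or RMAN-03009 line. The function names parse_stack and failed_command are hypothetical, and the sketch assumes each message starts on its own line with an RMAN- or ORA- code followed by a colon.

```python
import re

# A message line: prefix, number, colon, then the message text.
CODE_LINE = re.compile(r"^(RMAN|ORA)-(\d+): (.*)$")


def parse_stack(output):
    """Return (code, text) pairs that follow the RMAN-00569 banner, last message first."""
    entries = []
    in_stack = False
    for line in output.splitlines():
        m = CODE_LINE.match(line.strip())
        if not m:
            continue
        code = f"{m.group(1)}-{m.group(2)}"
        if code == "RMAN-00569":
            in_stack = True  # banner found; later messages belong to the stack
            continue
        if code == "RMAN-00571":
            continue  # separator lines of the banner
        if in_stack:
            entries.append((code, m.group(3)))
    return list(reversed(entries))


def failed_command(entries):
    """Return the text of the RMAN-03002 or RMAN-03009 message, or None."""
    for code, text in entries:
        if code in ("RMAN-03002", "RMAN-03009"):
            return text
    return None
```

The first entries returned by parse_stack are the messages at the bottom of the stack, which are often the most informative ones.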

24.1.5.1 Interpreting RMAN Errors: Example

Errors prefixed by RMAN- indicate errors caused by RMAN commands.

You attempt a backup of tablespace users and receive the following message:

Starting backup at 29-AUG-13
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/29/2013 15:14:03
RMAN-20202: tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "USESR"

The RMAN-03002 error indicates that the BACKUP command failed. You read the last two messages in the stack first and immediately see the problem: no tablespace users appears in the recovery catalog because you mistyped the name as usesr.

24.1.5.2 Interpreting Server Errors: Example

Errors from the server are prefixed with ORA-.

Assume that you attempt to recover a tablespace and receive the following errors:

RMAN> RECOVER TABLESPACE users;

Starting recover at 29-AUG-13
using channel ORA_DISK_1

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/29/2013 15:18:43
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed tablespace USERS
ORA-00283: recovery session canceled due to errors
ORA-01124: cannot recover data file 8 - file is in use or recovery
ORA-01110: data file 8: '/oracle/oradata/trgt/users01.dbf'

As suggested, you start reading from the bottom up. The ORA-01110 message explains there was a problem with the recovery of data file users01.dbf. The second error indicates that the database cannot recover the data file because it is in use or being recovered. The remaining RMAN errors indicate that the recovery session was canceled due to the server errors. Hence, you conclude that because you were not recovering this data file, the problem must be that the data file is online and you must take it offline and restore a backup.

24.1.5.3 Interpreting SBT 2.0 Media Management Errors: Example

This example shows how to interpret errors caused at the media manager level.

Assume that you use a tape drive and see the following output during a backup job:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-19624: operation failed, retry possible
ORA-19507: failed to retrieve sequential file, handle="/tmp/mydir", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text:
  sbtpvt_open_input:file /tmp/mydir does not exist or cannot be accessed, errno=2

The error text displayed following the ORA-19511 error is generated by the media manager and describes the real source of the failure. See the media manager documentation to interpret this error.

24.1.5.4 Interpreting SBT 1.1 Media Management Errors: Example

This example shows the output of a backup job that has media management errors.

Assume that you use a tape drive and see the following output during a backup job:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c1 channel at 09/04/2013 13:18:19
ORA-19506: failed to create sequential file, name="07d36ecp_1_1", parms=""
ORA-27007: failed to open file
SVR4 Error: 2: No such file or directory
Additional information: 7005
Additional information: 1
ORA-19511: Error received from media manager layer, error text:
   SBT error = 7005, errno = 2, sbtopen: system error

The main information of interest returned by SBT 1.1 media managers is the error code in the "Additional information" line:

Additional information: 7005

Referring to Table 24-3, you discover that error 7005 means that the media management device is busy. So, the media management software is not able to write to the device because it is in use or there is a problem with it.
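
The lookup described above might be scripted as in the following Python sketch. It extracts the first "Additional information" value from the output and looks it up in a dictionary transcribed from the sbtopen rows of Table 24-3. The names SBTOPEN_ERRORS and first_additional_code are hypothetical, and only the sbtopen codes are included here for brevity.

```python
import re

# sbtopen rows transcribed from Table 24-3 (asterisks omitted).
SBTOPEN_ERRORS = {
    7000: "Backup file not found (only returned for read)",
    7001: "File exists (only returned for write)",
    7002: "Bad mode specified",
    7003: "Invalid block size specified",
    7004: "No tape device found",
    7005: "Device found, but busy; try again later",
    7006: "Tape volume not found",
    7007: "Tape volume is in-use",
    7008: "I/O Error",
    7009: "Can't connect with Media Manager",
    7010: "Permission denied",
    7011: "O/S error",
    7012: "Invalid argument(s) to sbtopen",
}


def first_additional_code(output):
    """Return the number on the first 'Additional information:' line, or None."""
    m = re.search(r"^Additional information:\s*(\d+)\s*$", output, re.MULTILINE)
    return int(m.group(1)) if m else None
```

With the output shown in this example, first_additional_code returns 7005, and SBTOPEN_ERRORS[7005] gives the "device busy" text from the table.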

Note:

The sbtio.log contains information written by the media management software, not Oracle Database. Thus, you must consult your media vendor documentation to interpret the error codes and messages. If no information is written to the sbtio.log, then contact your media manager support to ask whether they are writing error messages in some other location, or whether there are steps you must take to have the media manager errors appear in sbtio.log.

24.1.6 Identifying RMAN Return Codes

One way to determine whether RMAN encountered an error is to examine its return code or exit status. The RMAN client returns 0 to the shell from which it was invoked if no errors occurred, and a nonzero error value otherwise.

How you access this return code depends upon the environment from which you invoked the RMAN client. For example, if you run UNIX with the C shell, then, when RMAN completes, the return code is placed in a shell variable called $status. The method of returning exit status is a detail specific to the host operating system rather than the RMAN client.
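
In a script, the return code can be checked without relying on a particular shell. The following Python sketch uses the standard subprocess module; the helper name run_and_check is hypothetical, and the RMAN invocation in the comment (connection string and command file name) is only an assumed example.

```python
import subprocess


def run_and_check(argv):
    """Run a command and return (succeeded, return_code)."""
    completed = subprocess.run(argv)
    # RMAN returns 0 if no errors occurred and a nonzero value otherwise.
    return completed.returncode == 0, completed.returncode


# Hypothetical usage; assumes rman is on the PATH and backup.rman exists:
# ok, rc = run_and_check(["rman", "target", "/", "@backup.rman"])
# if not ok:
#     print(f"RMAN job failed with return code {rc}")
```

The same check works for any command, which makes it straightforward to wrap RMAN jobs in a larger scheduling script.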

24.2 Using V$ Views for RMAN Troubleshooting

When LIST, REPORT, and SHOW do not provide all the information that you need for RMAN operations, some V$ views can provide useful details.

Sometimes it is useful to identify exactly what a server session performing a backup and recovery job is doing. The views described in the following table are useful for obtaining information about RMAN jobs.

Table 24-4 Useful V$ Views for Troubleshooting

View Description

V$PROCESS

Identifies currently active processes

V$SESSION

Identifies currently active sessions. Use this view to determine which database server sessions correspond to which RMAN allocated channels.

V$SESSION_WAIT

Lists the events or resources for which sessions are waiting

You can use the preceding views to perform the following tasks:

  • Monitoring RMAN Interaction with the Media Manager

  • Correlating Server Sessions with RMAN Channels

24.2.1 Monitoring RMAN Interaction with the Media Manager

You can use the event names in the dynamic performance event views to monitor RMAN calls to the media management API. The event names have one-to-one correspondence with SBT functions.

See the following example:

Backup: MML v1 open backup piece
Backup: MML v1 read backup piece
Backup: MML v1 write backup piece
Backup: MML v1 query backup piece
Backup: MML v1 delete backup piece
Backup: MML v1 close backup piece
.
.
.

To obtain the complete list of SBT events, you can use the following query:

SELECT NAME 
FROM   V$EVENT_NAME 
WHERE  NAME LIKE '%MML%';

Before making a call to any of the functions in the media management API, the server adds a row in V$SESSION_WAIT, with the STATE column including the string WAITING. The V$SESSION_WAIT.SECONDS_IN_WAIT column shows the number of seconds that the server has been waiting for this call to return. After the SBT function returns from the media manager, this row disappears.

A row in V$SESSION_WAIT corresponding to an SBT event name does not indicate a problem, because the server updates these rows at run time. The rows appear and disappear as calls are made and returned. However, if the SECONDS_IN_WAIT column is high, then the media manager may be suspended.

To monitor the SBT events, you can run the following SQL query:

COLUMN EVENT FORMAT a17
COLUMN SECONDS_IN_WAIT FORMAT 999
COLUMN STATE FORMAT a15
COLUMN CLIENT_INFO FORMAT a30

SELECT p.SPID, s.EVENT, s.SECONDS_IN_WAIT AS SEC_WAIT, 
       sw.STATE, s.CLIENT_INFO
FROM   V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
WHERE  sw.EVENT LIKE '%MML%'
AND    s.SID=sw.SID
AND    s.PADDR=p.ADDR;

Examine the SQL output to determine which SBT functions are waiting. For example, the following output indicates that RMAN has been waiting for the sbtbackup function to return for 10 minutes:

SPID EVENT             SEC_WAIT   STATE           CLIENT_INFO
---- ----------------- ---------- --------------- ------------------------------
8642 Backup: MML creat 600        WAITING         rman channel=ORA_SBT_TAPE_1

Note:

The V$SESSION_WAIT view shows only database events, not media manager events.
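
If the rows of the monitoring query are fetched into a script, long waits can be flagged automatically. The following Python sketch works on rows that have already been fetched; the WaitRow type, the long_waits function, and the 300-second threshold are all assumptions made for illustration, since what counts as a high SECONDS_IN_WAIT value depends on your media manager.

```python
from typing import NamedTuple


class WaitRow(NamedTuple):
    """One row of the monitoring query: SPID, EVENT, SEC_WAIT, STATE, CLIENT_INFO."""
    spid: str
    event: str
    sec_wait: int
    state: str
    client_info: str


def long_waits(rows, threshold_seconds=300):
    """Return rows that are still waiting and have waited at least the threshold."""
    return [
        row for row in rows
        if "WAITING" in row.state and row.sec_wait >= threshold_seconds
    ]
```

A row that appears in the result on several successive checks is a stronger hint of a suspended media manager than a single observation.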

24.2.2 Correlating Server Sessions with RMAN Channels

24.2.2.1 Matching Server Sessions with Channels When One RMAN Session Is Active

When only one RMAN session is active, the easiest method for determining the server session ID for an RMAN channel is to query the target database.

Run the following query on the target database while the RMAN job is executing:

COLUMN CLIENT_INFO FORMAT a30
COLUMN SID FORMAT 999
COLUMN SPID FORMAT 9999

SELECT s.SID, p.SPID, s.CLIENT_INFO
FROM   V$PROCESS p, V$SESSION s
WHERE  p.ADDR = s.PADDR
AND    CLIENT_INFO LIKE 'rman%';

The following shows sample output:

 SID SPID         CLIENT_INFO
---- ------------ ------------------------------
  14 8374         rman channel=ORA_SBT_TAPE_1

If you set an ID using the RMAN SET COMMAND ID command instead of using the system-generated default ID, then search for that value in the CLIENT_INFO column instead of 'rman%'.

24.2.2.2 Matching Server Sessions with Channels in Multiple RMAN Sessions

If multiple RMAN sessions are active, then the V$SESSION.CLIENT_INFO column can yield the same information for a channel in each session.

For example:

 SID SPID         CLIENT_INFO
---- ------------ ------------------------------
  14 8374         rman channel=ORA_SBT_TAPE_1
   9 8642         rman channel=ORA_SBT_TAPE_1

In this case, you have the following methods for determining which channel corresponds to which SID value.

24.2.2.2.1 Obtaining the Channel ID from the RMAN Output

You must first obtain the sid values from the RMAN output and then use these values in your SQL query.

To correlate a process with a channel during a backup:

  1. In an active session, run the RMAN job as usual and examine the output to get the SID for the channel. For example, the output may show:
    Starting backup at 21-AUG-13
    allocated channel: ORA_SBT_TAPE_1
    channel ORA_SBT_TAPE_1: sid=14 devtype=SBT_TAPE
    
  2. Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS views while the RMAN job is executing. For example, enter:
    COLUMN CLIENT_INFO FORMAT a30
    COLUMN SID FORMAT 999
    COLUMN SPID FORMAT 9999
    
    SELECT s.SID, p.SPID, s.CLIENT_INFO
    FROM   V$PROCESS p, V$SESSION s
    WHERE  p.ADDR = s.PADDR
    AND    CLIENT_INFO LIKE 'rman%'
    /
    

    Use the sid value obtained from the first step to determine which channel corresponds to which server session:

           SID SPID         CLIENT_INFO
    ---------- ------------ ------------------------------
            14 2036         rman channel=ORA_SBT_TAPE_1
            12 2066         rman channel=ORA_SBT_TAPE_1
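The first step above can be scripted when the RMAN output is saved to a log file. The following is a minimal sketch, not part of RMAN: the function name rman_channel_sids is hypothetical, and the pattern assumes channel messages in exactly the format shown in step 1 (channel name: sid=N devtype=...).

```shell
# Hypothetical helper: read RMAN output on stdin and print "channel sid"
# pairs, one per line. Assumes messages of the form shown in step 1:
#   channel ORA_SBT_TAPE_1: sid=14 devtype=SBT_TAPE
rman_channel_sids() {
  sed -n 's/^[[:space:]]*channel \([^:]*\): sid=\([0-9][0-9]*\) .*$/\1 \2/p'
}
```

For example, rman_channel_sids < rman_session1.log (a hypothetical log file name) lists each channel with its sid, which you can then match against the SID column of the query output.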

24.2.2.2.2 Correlating Server Sessions with Channels by Using SET COMMAND ID

You specify a command ID string in the RMAN backup script. You can then query V$SESSION.CLIENT_INFO for this string.

To correlate a process with a channel during a backup:

  1. In each session, set the COMMAND ID to a different value after allocating the channels and then back up the desired object. For example, enter the following in session 1:
    RUN 
    {
      ALLOCATE CHANNEL c1 TYPE disk;
      SET COMMAND ID TO 'sess1';
      BACKUP DATABASE;
    }
    

    Set the command ID to a string such as sess2 in the job running in session 2:

    RUN 
    {
      ALLOCATE CHANNEL c1 TYPE sbt;
      SET COMMAND ID TO 'sess2';
      BACKUP DATABASE;
    }
    
  2. Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS views while the RMAN job is executing. For example, enter:
    SELECT SID, SPID, CLIENT_INFO 
    FROM   V$PROCESS p, V$SESSION s 
    WHERE  p.ADDR = s.PADDR 
    AND    CLIENT_INFO LIKE '%id=sess%';
    

    If you run the SET COMMAND ID command in the RMAN job, then the CLIENT_INFO column displays in the following format:

    id=command_id,rman channel=channel_id
    

    For example, the following shows sample output:

     SID SPID         CLIENT_INFO
    ---- ------------ ------------------------------
      11 8358         id=sess1
      15 8638         id=sess2
      14 8374         id=sess1,rman channel=c1
       9 8642         id=sess2,rman channel=c1
    

    The rows that contain the string rman channel show the channel performing the backup. The remaining rows are for the connections to the target database.
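If the query output is spooled to a file, the channel rows can be separated from the plain connection rows with a small filter. This is only a sketch under assumptions: the function name channel_rows is hypothetical, the input is assumed to have the three columns shown in the sample output, and command IDs are assumed to contain no spaces or commas.

```shell
# Hypothetical helper: read the SID / SPID / CLIENT_INFO listing on stdin
# and print "sid spid command_id channel" for channel rows only.
# A channel row has CLIENT_INFO of the form: id=command_id,rman channel=channel_id
channel_rows() {
  awk '$3 ~ /^id=[^,]+,rman$/ && $4 ~ /^channel=/ {
         id = $3; sub(/^id=/, "", id); sub(/,rman$/, "", id)
         ch = $4; sub(/^channel=/, "", ch)
         print $1, $2, id, ch
       }'
}
```

Rows that carry only id=command_id (the connections to the target database) are skipped, so each output line identifies one channel and the RMAN session that owns it.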

24.3 Testing the Media Management API

24.3.1 Obtaining the sbttest Utility

The default location of the sbttest utility depends on the platform.

On UNIX, the sbttest utility is typically located in $ORACLE_HOME/bin. If for some reason the utility is not included with your platform, then contact Oracle Support Services to obtain the C version of the program. You can compile this version of the program on all UNIX platforms.

On platforms such as Solaris, you do not have to relink when using sbttest. On other platforms, relinking may be necessary.

24.3.2 Obtaining Online Documentation for the sbttest Utility

Use the sbttest command, without arguments, to list the various arguments for this program.

For online documentation of sbttest, issue the following on the command line:

% sbttest

The program displays the list of possible arguments for the program:

Error: backup file name must be specified
Usage: sbttest backup_file_name # this is the only required parameter
               <-dbname database_name>
               <-trace trace_file_name>
               <-remove_before>
               <-no_remove_after> 
               <-read_only>
               <-no_regular_backup_restore>
               <-no_proxy_backup>
               <-no_proxy_restore>
               <-file_type n>
               <-copy_number n>
               <-media_pool n>
               <-os_res_size n>
               <-pl_res_size n>
               <-block_size block_size> 
               <-block_count block_count>
               <-proxy_file os_file_name bk_file_name 
                           [os_res_size pl_res_size block_size block_count]>
               <-libname sbt_library_name>

The display also indicates the meaning of each argument. For example, following is the description for two optional parameters:

Optional parameters:
  -dbname  specifies the database name which will be used by SBT
           to identify the backup file. The default is "sbtdb"
  -trace   specifies the name of a file where the Media Management 
           software will write diagnostic messages.

24.3.3 Using the sbttest Utility

Use sbttest to perform a quick test of the media manager.

If sbttest returns 0, then the test ran without error, which means that the media manager is correctly installed and can accept a data stream and return the same data when requested. If sbttest returns a nonzero value, then either the media manager is not installed or it is not configured correctly.

To use sbttest:

  1. Confirm that the program is installed and included in the system path by typing sbttest at the command line:
    % sbttest
    

    If the program is operational, then you see a display of the online documentation.

  2. Execute the program, specifying any of the arguments described in the online documentation. For example, enter the following to create test file some_file.f and write the output to sbtio.log:
    % sbttest some_file.f -trace sbtio.log
    

    You can also test a backup of an existing data file. For example, this command tests data file tbs_33.f of database prod:

    % sbttest tbs_33.f -dbname prod
    
  3. Examine the output. If the program encounters an error, then it provides messages describing the failure. For example, if the database cannot find the library, you see:
    libobk.so could not be loaded. Check that it is installed properly, and that
     LD_LIBRARY_PATH environment variable (or its equivalent on your platform)
     includes the directory where this file can be found. Here is some additional
     information on the cause of this error:
    ld.so.1: sbttest: fatal: libobk.so: open failed: No such file or directory
    

In some cases, sbttest can work but an RMAN backup does not. The reasons can be the following:

  • The user who starts sbttest is not the owner of the Oracle Database processes.

  • If the database server is not linked with the media management library or cannot load it dynamically when needed, then RMAN backups to the media manager fail, but sbttest may still work.

  • The sbttest program passes all environment parameters from the shell but RMAN does not.
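Because sbttest reports its result through the return value, the check described in this section is easy to script. The wrapper below is a sketch only: check_media_manager is a hypothetical name, and it simply runs whatever command line you pass to it and reports on the exit status.

```shell
# Hypothetical wrapper: run the given test command (for example sbttest
# with its arguments) and report whether it returned 0.
check_media_manager() {
  if "$@"; then
    echo "media manager test passed"
    return 0
  else
    rc=$?
    echo "media manager test failed (exit status $rc)"
    return "$rc"
  fi
}
```

A typical call would be check_media_manager sbttest some_file.f -trace sbtio.log. To reduce the environment-related differences listed above, run it as the owner of the Oracle Database processes.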

24.4 Terminating an RMAN Command

There are several ways to terminate an RMAN command in the middle of execution.

They include the following:

  • The preferred method is to press Ctrl+C (or the equivalent "attention" key combination for your system) in the RMAN interface. This also terminates allocated channels, unless they are suspended in the media management code, as happens when, for example, they are waiting for a tape to be mounted.

  • You can end the server session corresponding to the RMAN channel by running the SQL ALTER SYSTEM KILL SESSION statement as described in Terminating the Session with ALTER SYSTEM KILL SESSION.

  • You can terminate the server session corresponding to the RMAN channel on the operating system as described in Terminating the Session at the Operating System Level.

24.4.1 Terminating the Session with ALTER SYSTEM KILL SESSION

To terminate an RMAN session by using the ALTER SYSTEM statement, you need the Oracle session ID (sid) for the RMAN channel and the serial number of that session. The sid is reported in the RMAN message log; the serial number is obtained from V$SESSION.

Search for messages with the format shown in the following example:

channel ch1: sid=15 devtype=SBT_TAPE

The sid and devtype are displayed for each allocated channel. The Oracle Database sid is different from the operating system process ID. You can end the session using a SQL ALTER SYSTEM KILL SESSION statement.

ALTER SYSTEM KILL SESSION takes two arguments, the sid printed in the RMAN message and a serial number, both of which can be obtained by querying V$SESSION.

For example, run the following statement, where sid_in_rman_output is the number from the RMAN message:

SELECT SERIAL# 
FROM   V$SESSION 
WHERE  SID=sid_in_rman_output;

Then, run the following statement, substituting the sid_in_rman_output and serial number obtained from the query:

ALTER SYSTEM KILL SESSION 'sid_in_rman_output,serial#';

This statement has no effect on the session if the session stopped in media manager code.
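Once both numbers are known, the statement can be generated rather than typed by hand. The following sketch only prints the statement; the function name kill_session_sql is hypothetical, and the serial number in the usage example is made up for illustration.

```shell
# Hypothetical helper: print an ALTER SYSTEM KILL SESSION statement for
# the given sid and serial#. Both arguments must be non-empty and numeric.
kill_session_sql() {
  for v in "$1" "$2"; do
    case "$v" in
      ''|*[!0-9]*)
        echo "usage: kill_session_sql sid serial" >&2
        return 1
        ;;
    esac
  done
  printf "ALTER SYSTEM KILL SESSION '%s,%s';\n" "$1" "$2"
}
```

For example, kill_session_sql 15 4021 prints ALTER SYSTEM KILL SESSION '15,4021'; which you can review and then run in SQL*Plus.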

24.4.2 Terminating the Session at the Operating System Level

Finding and terminating the processes that are associated with the server sessions is operating system-specific. On some platforms, the server sessions are not associated with any processes at all. See your operating system-specific documentation for more information.

24.4.3 Terminating an RMAN Session That Is Not Responding in the Media Manager

You may sometimes need to terminate an RMAN job that is not responding in the media manager. The best way to terminate RMAN when the channel connections are not responding in the media manager is to terminate the session in the media manager.

If this action does not solve the problem, then on some platforms, such as Linux, you may be able to terminate the Oracle Database processes of the connections. (Terminating the Oracle processes may cause problems with the media manager. See your media manager documentation for details.)

24.4.3.1 Components of an RMAN Session

The nature of an RMAN session depends on the operating system.

In UNIX, an RMAN session has the following processes associated with it:

  • The RMAN client process itself

  • The default channel, the initial connection to the target database

  • One target connection to the target database corresponding to each allocated channel

  • The catalog connection to the recovery catalog database, if you use a recovery catalog

  • An auxiliary connection to an auxiliary instance, during DUPLICATE or TSPITR operations

  • A polling connection to the target database, used for monitoring RMAN command execution on the various allocated channels. By default, RMAN makes one polling connection. RMAN makes additional polling connections if you use different connect strings in the ALLOCATE CHANNEL or CONFIGURE CHANNEL commands. One polling connection exists for each distinct connect string used in the ALLOCATE CHANNEL or CONFIGURE CHANNEL command.

24.4.3.2 Process Behavior During a Suspended Job

RMAN usually stops responding because a channel connection is waiting in the media manager code for a tape resource. The catalog connection and the default channel appear to suspend, because they are waiting for RMAN to tell them what to do. Polling connections seem to be in an infinite loop while polling the RPC under the control of the RMAN process.

If you terminate the RMAN process itself, then you also terminate the catalog connection, the auxiliary connection, the default channel, and the polling connections. If target and auxiliary connections are suspended but not while executing media manager code, they also terminate. If either the target connection or any of the auxiliary connections are executing in the media management layer, then they do not terminate until the processes are manually terminated at the operating system level.

Not all media managers can detect the termination of the Oracle Database process. Those which cannot may keep resources busy or continue processing. Consult your media manager documentation for details.

Terminating the catalog connection does not cause the RMAN process to terminate because RMAN is not performing catalog operations while the backup or restore is in progress. Removing default channel and polling connections causes the RMAN process to detect that a channel is no longer present and then to exit. In this case, the connections to the unresponsive channels remain active as described previously.

24.4.3.3 Terminating an RMAN Session: Basic Steps

After the unresponsive channels in the media manager code are terminated, the RMAN process detects this termination and exits, removing all connections except target connections that are still operative in the media management layer.

The warning about the media manager resources still applies in this case.

To terminate an Oracle Database process that is not responding in the media manager:

  1. Query V$SESSION and V$SESSION_WAIT, as described in "Using V$ Views for RMAN Troubleshooting". For example, execute the following query:
    COLUMN EVENT FORMAT a17
    COLUMN SEC_WAIT FORMAT 999
    COLUMN STATE FORMAT a10
    COLUMN CLIENT_INFO FORMAT a30
    
    SELECT p.SPID, s.EVENT, s.SECONDS_IN_WAIT AS SEC_WAIT, 
           sw.STATE, s.CLIENT_INFO
    FROM   V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
    WHERE  sw.EVENT LIKE '%MML%'
    AND    s.SID=sw.SID
    AND    s.PADDR=p.ADDR;
    

    Examine the SQL output to determine which SBT functions are waiting. For example, the output may be as follows:

    SPID EVENT             SEC_WAIT   STATE      CLIENT_INFO
    ---- -----------------  ---------- ---------- -----------------------------
    8642 Backup:MML write   600        WAITING    rman channel=ORA_SBT_TAPE_1
    8374 Backup:MML write   600        WAITING    rman channel=ORA_SBT_TAPE_2
    
  2. Using operating system-level tools appropriate to your platform, end the unresponsive sessions. For example, on Linux execute a kill -9 command:
    % kill -9 8642 8374
    

    Some platforms include a command-line utility called orakill that enables you to terminate a specific thread. From a command prompt, run the following command, where sid identifies the database instance to target, and the thread_id is the SPID value from the query in Step 1:

    orakill sid thread_id
    
  3. Check that the media manager also clears its processes. If any remain, the next backup or restore operation may freeze again, due to the previous problems in the backup or restore operation. In some media managers, the only solution is to shut down and restart the media manager. If the documentation from the media manager does not provide the needed information, contact technical support for the media manager.

    See Also:

    Your operating system-specific documentation for the relevant commands
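The selection of processes in steps 1 and 2 can be sketched as a filter over the spooled query output. This is an illustration under assumptions, not an Oracle-supplied tool: the function name mml_waiting_spids is hypothetical, the input is assumed to have the column layout of the sample output in step 1, and the function only prints SPID values without terminating anything.

```shell
# Hypothetical helper: read the step 1 query output on stdin and print the
# SPID of every row whose event mentions MML, whose STATE is WAITING, and
# whose SEC_WAIT is at least the threshold given as $1 (default 0).
mml_waiting_spids() {
  awk -v min="${1:-0}" '
    $1 ~ /^[0-9]+$/ && /MML/ {
      # The event name may contain spaces, so locate the WAITING field
      # and read the wait time from the field just before it.
      for (i = 3; i <= NF; i++)
        if ($i == "WAITING" && $(i-1) ~ /^[0-9]+$/ && $(i-1) + 0 >= min + 0) {
          print $1
          break
        }
    }'
}
```

Review the printed SPID values against the CLIENT_INFO column before passing them to kill or orakill, because terminating the wrong process can affect other database sessions.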

Good day. I was advised to use RMAN, so I decided to try it out in practice. It is a good tool, of course, but at one stage I ran into a problem that I cannot solve.
The database name is exp. I read a lot of information online, and here is what I did:
1. Changed and enlarged the directory for backups. Before:
SQL> show parameter db_recovery_file_dest;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      C:\oracle\product\10.2.0\flash_recovery_area
db_recovery_file_dest_size           big integer 1G

Next:
SQL> alter system set db_recovery_file_dest='c:\bkp' SCOPE=BOTH SID='exp';

System altered

SQL> alter system set db_recovery_file_dest_size=2G SCOPE=BOTH SID='exp';

System altered

Result:

SQL> show parameter db_recovery_file_dest;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      c:\bkp
db_recovery_file_dest_size           big integer 2G

Everything looks fine. Then I learned that there is such a thing as a snapshot control file, and I changed its location too, just in case.
RMAN> configure snapshot controlfile name to 'c:\bkp\SNCFEXP.ORA';

snapshot control file name set to: c:\bkp\SNCFEXP.ORA
new RMAN configuration parameters are successfully stored

As you can see, the c:\bkp directory will contain all the backups (except the archived logs).

RMAN> show all;

RMAN configuration parameters are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO 'c:\bkp\ctl_%F';
CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COPY PARALLELISM 1;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'C:\BKP\SNCFEXP.ORA';

Now I start the backup by running my script:

C:\>rman target sys/bak123@exp @'c:\bkp\full_backup.txt'

Recovery Manager: Release 10.2.0.1.0 - Production on Tue Sep 27 14:36:27 2011

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: EXP (DBID=2455611900)

RMAN> run {
2> configure controlfile autobackup on;
3> allocate channel c1 device type disk format 'c:\bkp\%U';
4> SQL 'alter system archive log current';
5> backup database; #plus archivelog;
6> release channel c1;
7> }
8>
using target database control file instead of recovery catalog
old RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters are successfully stored

allocated channel: c1
channel c1: sid=132 devtype=DISK

sql statement: alter system archive log current

Starting backup at 27-SEP-11
channel c1: starting datafile copy
input datafile fno=00001 name=C:\ORACLE\PRODUCT\10.2.0\ORADATA\EXP\SYSTEM01.DBF
output filename=C:\BKP\DATA_D-EXP_I-2455611900_TS-SYSTEM_FNO-1_ANMNJQJJ tag=TAG20110927T143634 recid=273 stamp=762964622
channel c1: datafile copy complete, elapsed time: 00:00:35
channel c1: starting datafile copy
input datafile fno=00003 name=C:\ORACLE\PRODUCT\10.2.0\ORADATA\EXP\SYSAUX01.DBF
output filename=C:\BKP\DATA_D-EXP_I-2455611900_TS-SYSAUX_FNO-3_AOMNJQKM tag=TAG20110927T143634 recid=274 stamp=762964671
channel c1: datafile copy complete, elapsed time: 00:00:45
channel c1: starting datafile copy
input datafile fno=00005 name=C:\ORACLE\PRODUCT\10.2.0\ORADATA\EXP\EXAMPLE01.DBF
output filename=C:\BKP\DATA_D-EXP_I-2455611900_TS-EXAMPLE_FNO-5_APMNJQM3 tag=TAG20110927T143634 recid=275 stamp=762964684
channel c1: datafile copy complete, elapsed time: 00:00:15
channel c1: starting datafile copy
input datafile fno=00002 name=C:\ORACLE\PRODUCT\10.2.0\ORADATA\EXP\UNDOTBS01.DBF
output filename=C:\BKP\DATA_D-EXP_I-2455611900_TS-UNDOTBS1_FNO-2_AQMNJQMI tag=TAG20110927T143634 recid=276 stamp=762964696
channel c1: datafile copy complete, elapsed time: 00:00:07
channel c1: starting datafile copy
input datafile fno=00006 name=C:\ORACLE\PRODUCT\10.2.0\ORADATA\MYINDEXES.DBF
output filename=C:\BKP\DATA_D-EXP_I-2455611900_TS-MYINDEXES_FNO-6_ARMNJQMP tag=TAG20110927T143634 recid=277 stamp=762964699
channel c1: datafile copy complete, elapsed time: 00:00:03
channel c1: starting datafile copy
input datafile fno=00004 name=C:\ORACLE\PRODUCT\10.2.0\ORADATA\EXP\USERS01.DBF
output filename=C:\BKP\DATA_D-EXP_I-2455611900_TS-USERS_FNO-4_ASMNJQMT tag=TAG20110927T143634 recid=278 stamp=762964701
channel c1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 27-SEP-11

Starting Control File and SPFILE Autobackup at 27-SEP-11
piece handle=C:\BKP\CTL_C-2455611900-20110927-04 comment=NONE
Finished Control File and SPFILE Autobackup at 27-SEP-11

released channel: c1

Recovery Manager complete.

Everything seems fine. But I know that an error may still have occurred, so just in case I run a check that, frankly, few people bother with:

RMAN> restore controlfile validate;

Starting restore at 27-SEP-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=134 devtype=DISK

channel ORA_DISK_1: starting validation of datafile backupset
channel ORA_DISK_1: reading from backup piece C:\BKP\CTL_C-2455611900-20110927-04
channel ORA_DISK_1: restored backup piece 1
piece handle=C:\BKP\CTL_C-2455611900-20110927-04 tag=TAG20110927T143822
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
Finished restore at 27-SEP-11

RMAN>

So the control file backup has been validated and is fine. I happily start wrecking my database :)). Please note this point: I completely delete the oradata directory from C:\oracle\product\10.2.0, where my database was stored. Then I start the restore:

C:\>rman target sys/bak123@exp

Recovery Manager: Release 10.2.0.1.0 - Production on Tue Sep 27 14:45:37 2011

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database (not started)

RMAN> startup nomount

Oracle instance started

Total System Global Area 293601280 bytes

Fixed Size 1248624 bytes
Variable Size 104858256 bytes
Database Buffers 184549376 bytes
Redo Buffers 2945024 bytes

RMAN> restore controlfile;

Starting restore at 27-SEP-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 09/27/2011 14:46:39
RMAN-06563: control file or SPFILE must be restored using FROM AUTOBACKUP

This is the first nasty surprise, because according to the book, if autobackup is configured in the script, RMAN should have found the backup by itself. Oh well, I continue:

RMAN> restore controlfile from autobackup;

Starting restore at 27-SEP-11
using channel ORA_DISK_1

recovery area destination: c:\bkp
database name (or database unique name) used for search: EXP
channel ORA_DISK_1: no autobackups found in the recovery area
autobackup search outside recovery area not attempted because DBID was not set
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 09/27/2011 14:46:58
RMAN-06172: no autobackup found or specified handle is not a valid copy or piece

RMAN>

Then I try giving it the name of the control file backup:

RMAN> restore controlfile from 'CTL_C-2455611900-20110927-04';

Starting restore at 27-SEP-11
using channel ORA_DISK_1

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 09/27/2011 14:47:55
RMAN-06172: no autobackup found or specified handle is not a valid copy or piece

Then I shove the full path to the file right under its nose:

RMAN> restore controlfile from 'c:\bkp\CTL_C-2455611900-20110927-04';

Starting restore at 27-SEP-11
using channel ORA_DISK_1

channel ORA_DISK_1: restoring control file
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 09/27/2011 14:48:17
ORA-19870: error reading backup piece C:\BKP\CTL_C-2455611900-20110927-04
ORA-19504: failed to create file "C:\ORACLE\PRODUCT\10.2.0\ORADATA\EXP\CONTROL01.CTL"
ORA-27040: file create error, unable to create file
OSD-04002: unable to open file
O/S-Error: (OS 3) The system cannot find the path specified.

RMAN>

But it does not even see this file! Yet RMAN itself validated the file CTL_C-2455611900-20110927-04 and said it was fine! It is a complete mystery!
I can add two things. First, on one occasion the restore somehow went through normally, but then everything stalled afterwards. Second, if I do not delete the oradata directory, the restore works fine. In short, I cannot figure out what is wrong with this backup, and I would appreciate help finding my mistake.
