The following recovery scenario applies to a very specific case. It requires comprehensive knowledge of the application schema design and the relationships between the different tables. It is not by any means valid for all block corruption cases, nor is it in any way a replacement for regular database backups. Having a good, valid backup is always the lifesaver option, and your recovery options are always a function of the backups you have. Use the procedure below with care and at your own responsibility, and take the necessary precautions first (such as a full cold OS backup of the database). Please also note that in this specific case, the valid backups were replaced by corrupted backups and the client discovered this very late. Also note that "fully recover" in the post title doesn't mean "RMAN full database recovery"; it means fully recovering the database from this bad corruption state.
My client had a power outage that resulted in corrupted blocks
On 28-Sep around 7am, my client had a power outage for some time. After restarting the database and application server, end users started getting application errors saying they were unable to update some fields across many screens.
Initially, my client assumed these were application errors and investigated them from the application side for two days, but he was not able to fix them.
My client has a daily scheduled full database backup at 5am.
Reviewing the database logs revealed three corrupted blocks: one table block and two index blocks.
ORA-01578: ORACLE data block corrupted (file # 5, block # 594248)
ORA-01110: data file 5: '/u01/app/oracle/oradata/prod/df02.dbf'
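To get the complete picture of how many blocks are corrupted after such an incident (rather than discovering them one ORA-01578 at a time), RMAN validation can be used; a sketch, using the datafile number from this case:

```sql
-- From the RMAN prompt: validate the suspect datafile.
-- This populates V$DATABASE_BLOCK_CORRUPTION without changing anything.
RMAN> VALIDATE DATAFILE 5;

-- Then, from SQL*Plus, list every corrupt block that was found
SELECT file#, block#, blocks, corruption_type
  FROM v$database_block_corruption;
```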
I sorted out the corrupted index blocks by recreating the indexes.
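To confirm which segment owns each corrupted block (and hence whether an index recreation or a table-level repair is needed), the block can be mapped through DBA_EXTENTS; a sketch using the file and block numbers from this case:

```sql
-- Map file 5 / block 594248 to its owning segment
SELECT owner, segment_name, segment_type
  FROM dba_extents
 WHERE file_id = 5
   AND 594248 BETWEEN block_id AND block_id + blocks - 1;
```

If SEGMENT_TYPE comes back as INDEX, a drop/recreate or rebuild is enough; a TABLE segment is the hard case handled in the rest of this post.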
Now coming to the table corrupted block in datafile #5:
Backup of 28-Sep-2021 was completed successfully.
Backup of 29-Sep-2021 failed once it reached the corrupted block.
In other words, the backup of datafiles 1, 2, 3 & 4 completed successfully. For datafile #5, the backup proceeded until it reached the corrupted block, then the RMAN job failed and didn't back up the remaining datafiles. So we have a partial backup of datafile #5 from 29-Sep-2021.
Backup of 30-Sep-2021 failed once it reached the corrupted block.
Backup of 01-OCT-2021 failed once it reached the corrupted block.
Backup of 02-OCT-2021 failed once it reached the corrupted block.
My client has a backup retention policy of "REDUNDANCY 2". So now, for datafile #5, we have only the last two partial, invalid backups created on 01-OCT-2021 & 02-OCT-2021, and all the valid backups taken before 29-Sep-2021 were deleted under the retention policy.
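Two RMAN settings could have limited the damage here (a hedged sketch of the configuration, not taken from the client's actual setup): a time-based retention policy keeps backups for a fixed window instead of counting copies, and SET MAXCORRUPT lets a backup job finish despite a known number of corrupt blocks, so the healthy datafiles still get backed up:

```sql
-- Keep whatever is needed to recover to any point in the last 7 days,
-- instead of counting backup copies
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;

-- Tolerate up to 10 corrupt blocks in datafile 5 so the backup can complete
RMAN> RUN {
  SET MAXCORRUPT FOR DATAFILE 5 TO 10;
  BACKUP DATABASE;
}
```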
The key point here is that RMAN considered and counted the failed backups of datafile #5 toward the redundancy and used them to replace the older valid backups.
So now we don't have any valid backup of datafile #5!
So now, if I try to recover the corrupted block using the following command, it will fail:
RECOVER DATAFILE 5 BLOCK 594248;
At this point, we might conclude that this corrupted block can no longer be recovered, but wait a moment.
With some more investigation, I found that I can still select two of the table's five columns from the corrupted block, using a query like:
select install_id,branch_id FROM df.call_rec where branch_id=10 and dbms_rowid.rowid_block_number(ROWID)=594248;
Also, I found that I can reconstruct the remaining three columns by joining with another table.
So, in this very specific case, I can logically recover all the lost rows, without any physical RMAN recovery.
Recovery Action Plan:
- Use the dbms_repair package to skip the corrupted block, so that end users stop seeing the ORA-01578 errors.
- Build a temp table holding rows identical to those contained in the corrupted block (291 rows).
- Insert the rows from the temp table back to the original table.
Let us implement this plan:
1- Create the backend tables required by the package:
conn / as sysdba
EXECUTE dbms_repair.admin_tables(table_name => 'ORPHAN_KEY_TABLE', table_type => dbms_repair.orphan_table, action => dbms_repair.create_action, tablespace => 'USERS');
EXECUTE dbms_repair.admin_tables(table_name => 'REPAIR_TABLE', table_type => dbms_repair.repair_table, action => dbms_repair.create_action, tablespace => 'USERS');
2- Check the corrupted object:
SET serveroutput ON
DECLARE
  corrupt_count BINARY_INTEGER := 0;
BEGIN
  dbms_repair.check_object(schema_name => 'DF', object_name => 'CALL_REC', repair_table_name => 'REPAIR_TABLE', corrupt_count => corrupt_count);
  dbms_output.put_line('Corrupted block(s): ' || TO_CHAR(corrupt_count));
END;
/
Corrupted block(s): 1
col repair_description FOR a30
SET lines 150
SELECT relative_file_id,block_id,schema_name,object_name,fix_timestamp,repair_description FROM sys.repair_table;
RELATIVE_FILE_ID   BLOCK_ID SCHEMA_NAME OBJECT_NAME FIX_TIMES REPAIR_DESCRIPTION
---------------- ---------- ----------- ----------- --------- ------------------------------
               5     594248 DF          CALL_REC              mark block software corrupt
-- Ask Oracle to skip the corrupted blocks
EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS(schema_name => 'DF', object_name => 'CALL_REC');
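Whether the skip flag actually took effect can be verified from the data dictionary; the SKIP_CORRUPT column of DBA_TABLES shows ENABLED once the call succeeds:

```sql
-- Confirm that corrupt blocks will now be skipped during scans
SELECT owner, table_name, skip_corrupt
  FROM dba_tables
 WHERE owner = 'DF' AND table_name = 'CALL_REC';
```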
-- Capture the corrupted block's rows in a temp table, update them from INST_COLLECTION, then insert them back into the call_rec table:
create table df.call_rec_lost_rows_br10 as select * from df.call_rec where 1=0;
insert into df.call_rec_lost_rows_br10 (install_id,branch_id) select install_id,branch_id FROM df.call_rec where branch_id=10 and dbms_rowid.rowid_block_number(ROWID)=594248;
291 rows created.
select * from df.call_rec_lost_rows_br10; ==> 291 rows
update df.call_rec_lost_rows_br10 rc set (call_date, comment, user) = (select i.call_date, i.comment, i.user from df.inst_collection i where i.branch_id = 10 and i.install_id = rc.install_id);
-- Insert the corrected/recovered 291 rows into the original table:
insert into df.call_rec select * from df.call_rec_lost_rows_br10;
291 rows created.
-- Rebuild the table indexes:
alter index df.rec_inst rebuild online;
alter index df.rec_inst_bra rebuild online;
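One closing thought, as a hedged sketch rather than part of the client's actual steps: the corrupt block itself is still physically present in the segment, so the skip-corrupt setting must stay enabled until the table is rebuilt into fresh extents. Moving the table evicts the bad block for good, after which a fresh backup is the real priority:

```sql
-- Rebuild the table segment so the rows leave the corrupt block behind
-- (SKIP_CORRUPT must still be ENABLED while this runs)
ALTER TABLE df.call_rec MOVE;

-- The move invalidates the indexes, so they must be rebuilt once more
ALTER INDEX df.rec_inst REBUILD ONLINE;
ALTER INDEX df.rec_inst_bra REBUILD ONLINE;

-- Above all, take a fresh, valid backup immediately (RMAN prompt)
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
```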