Wednesday, July 20, 2016

ASM Datafile Move

Move ASM database files from one diskgroup to another

In the past, I have been involved in a migration of a standalone database to RAC with ASM through Data Pump export and import.  After the migration was done, one of the ASM diskgroups was more than 98% used, which triggered alerts from BMC Patrol, the standard monitoring tool in my working environment.
Searching My Oracle Support website, I found the following note:

How to move ASM database files from one diskgroup to another ? (Doc ID 330103.1)


The note is pretty straightforward; however, there was one surprise when I tried to offline a datafile:

ORA-01145: offline immediate disallowed unless media recovery enabled

That was because I had put the db in noarchivelog mode in order to speed up the data pump import jobs. After I configured archivelog mode, things went fine.
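For reference, a minimal sketch of switching the database back to archivelog mode (run as SYSDBA; the archive destination settings and, for RAC, the cluster_database handling are omitted here):

SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE ARCHIVELOG;
SQL> ALTER DATABASE OPEN;
SQL> ARCHIVE LOG LIST;   -- should now show "Automatic archival: Enabled"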

Here are the steps I followed, for future reference:

1) Identify the data file to be moved

+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_01.dbf            NON_TTQ_CLB_MSG
2) Identify the diskgroup onto which the file has to be moved.

+DATA_09/roeprdsc/datafile

3) Take the file offline.

 ALTER DATABASE DATAFILE '+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_01.dbf' OFFLINE;

4) Copy the data file

I used the DBMS_FILE_TRANSFER package. We can also use RMAN; for details, see the note (an RMAN example is sketched after the PL/SQL block below).
   
         
create or replace directory orcl1 as '+DATA_01/roeprdsc/datafile';
       
create or replace directory orcl2 as '+DATA_09/roeprdsc/datafile';
    
Note: if you need to create a directory in a diskgroup, the syntax looks like:


ASM> Alter diskgroup asmdsk2 add directory '+asmdsk2/test';
      
BEGIN
 DBMS_FILE_TRANSFER.COPY_FILE(
 source_directory_object => 'ORCL1',
 source_file_name => 'non_ttq_clb_msg_01.dbf',
 destination_directory_object => 'ORCL2',
 destination_file_name => 'non_ttq_clb_msg_01.dbf');
END;                         
/
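As an alternative to DBMS_FILE_TRANSFER, the copy in step 4 can also be made with RMAN (the other method described in the MOS note). A minimal sketch, adapted to the file names above; note that RMAN generates an OMF name in the target diskgroup, which you would then use in the rename step (the generated name can be found in V$DATAFILE_COPY):

RMAN> CONNECT TARGET /
RMAN> BACKUP AS COPY DATAFILE '+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_01.dbf' FORMAT '+DATA_09';

The rename, recover and online steps that follow are the same regardless of which tool made the copy.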



5) Rename the file to point to new location.

ALTER DATABASE RENAME FILE '+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_01.dbf' TO
 '+DATA_09/roeprdsc/datafile/non_ttq_clb_msg_01.dbf' ;


6) Recover the file.

  RECOVER DATAFILE '+DATA_09/roeprdsc/datafile/non_ttq_clb_msg_01.dbf'

7) Bring the file online.


 ALTER DATABASE DATAFILE '+DATA_09/roeprdsc/datafile/non_ttq_clb_msg_01.dbf' online;


8) Verify the new file location.
        
SQL> SELECT FILE_NAME FROM DBA_DATA_FILES where file_name like '%clb_msg%';

FILE_NAME
--------------------------------------------------------------------------------
...
+DATA_09/roeprdsc/datafile/non_ttq_clb_msg_01.dbf
+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_02.dbf
+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_03.dbf
+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_04.dbf
+DATA_01/roeprdsc/datafile/non_ttq_clb_msg_05.dbf

..



9) Delete the file from its original location, either via SQL*Plus or ASMCMD:

Syntax examples:

SQL:ASM> ALTER DISKGROUP ASMDSK2 DROP FILE users.256.565313879;

or:   ASMCMD> rm -rf  

Note:
Most Automatic Storage Management files do not need to be manually deleted because, as Oracle-managed files, they are removed automatically when they are no longer needed. (To keep the alert from firing again, I had to do this one manually.)

Database Migration

Data Pump Export and Import for Database Migration

Some time back we had to carry out a test using the Data Pump Export and Import utilities to migrate a 17 TB database from Solaris to Linux, and also from a single-instance, conventional file system to a two-node RAC on ASM. During the exercise I learned and became more familiar with quite a few interesting features and options of the Data Pump Export and Import utilities, and encountered a few difficulties as well.

The first obstacle I met was that I was unable to do a Data Pump Export to the NFS storage. We have NFS storage that is shared between the Solaris and Linux servers. At first I tried to perform the Data Pump Export to the NFS storage directly. Unfortunately, the job just sat there for a long time. Having researched this issue for a while and tried the options mentioned in a MOS note (see References [1]) with the help of a SA, I just could not get it to work. Finally we decided to export to the local file system first and then move the dump files to the NFS. I was able to move about 150 GB of dump files per hour. Fortunately, the Data Pump Import from the NFS storage did not pose any problems.
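For reference, the mount options that the ORA-27054 note generally calls for when Oracle accesses files over NFS on Linux look roughly like the following. The exact values depend on platform and NFS version, so treat this as an assumption to verify against the note; the server name and paths below are made up:

mount -t nfs -o rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0 nfsserver:/export/dpdump /mnt/dpdump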

Table-mode Data Pump Export and Import were used in the test. Large tables were typically exported or imported individually, and smaller tables were grouped together into several batches. For each large table or group of smaller tables, I prepared the corresponding parameter files, with the parallel option where appropriate. In the import jobs I always excluded indexes, constraints and statistics. I used the SQLFILE option of the impdp command to extract the DDL commands related to the tables, including the create table, index and constraint statements etc. Those DDLs were executed after the import, typically with the parallel option for index creation and ENABLE NOVALIDATE for constraints. The database migration therefore actually consisted of multiple table-migration jobs. For each table migration, I performed and recorded the timing of the following steps: export, move dump files to NFS, import, and DDLs for indexes etc. The table-migration jobs were overlapped, as two or three of them were executed at the same time and could be in different steps.

Shortly after the onset of the test I found out that Data Pump Export was extremely slow on a big table with a LOB column. The export job was unable to run in parallel regardless of the parallel setting. The segment sizes of this big table are 237 GB and 382 GB for the table and the LOB respectively. The table uses a sequence-based PK column called "EMAIL_SEQ_ID". To speed up the export, I started 23 export jobs at the same time, each exporting a particular range of rows based on the PK. The key is to use the QUERY option of the expdp command; the QUERY option contains a WHERE clause that specifies the PK range. Finally I was able to export this table in 30 hours, move the 420 GB of dump files in 2 hours, import in about 15 hours and execute the DDLs for indexes etc. in 2 hours. It is worth noting that although we can perform the LOB table export in parallel with such an approach, we cannot do the same for the import: the import job holds a TM lock on the table, so only one import job can run at a time. I used this approach for several big LOB tables. The Appendix shows sample shell scripts to prepare the expdp and impdp parfiles. By the way, I later found that being unable to export a LOB table in parallel is a known issue in the Oracle community (see Reference [3]). A blog post suggested using a rowid-based approach to split the rows of LOB tables (see Reference [4]).

Occasionally I made mistakes and needed to terminate an export or import job. I found it very convenient to use the KILL_JOB command in interactive mode. First, I need to know the job name, either from the log file or from the dba_datapump_jobs view. Then I can enter interactive mode using the attach option:

expdp / attach=[job_name]

Then I issued the KILL_JOB command. Note that when using KILL_JOB on an expdp job, all dump files will be wiped out.
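A quick sketch of the whole sequence, using one of the job names from the sample parameter files in the References (prompts and credentials abbreviated):

SQL> SELECT owner_name, job_name, operation, state FROM dba_datapump_jobs;

$ expdp / attach=exp_email_q2
Export> KILL_JOB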

During the test, I also learned that a Data Pump Import can be done through a db link. When a source database link is specified for the NETWORK_LINK option of the impdp command, the import job retrieves the data over the db link and writes it directly into the target database. There are no dump files involved.
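A minimal impdp parameter file for such a network import might look like this (the db link name source_db is made up; note there is no DUMPFILE line, only a directory for the log file):

 DIRECTORY=dpump_dir
 NETWORK_LINK=source_db
 TABLES=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS
 TABLE_EXISTS_ACTION=APPEND
 LOGFILE=dpump_dir:net_imp.log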

I mentioned previously that the SQLFILE option can be used to extract DDLs, but this option is only available for impdp. Suppose I have a table in a database and I want to extract the DDLs related to this table, including the indexes, constraints, object grants, comments, triggers etc. associated with it; how should I do it? Previously I always used the DBMS_METADATA package for such tasks. If we create a db link pointing back to the database itself, we can use impdp with the NETWORK_LINK and SQLFILE options to accomplish this very neatly.
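A sketch of that trick, assuming a loopback db link called selflink (the schema, password and TNS alias are placeholders); impdp then writes the DDL into gen_ddl.sql instead of executing anything:

SQL> create database link selflink connect to ttq_gmpx identified by <password> using 'MYDB';

$ impdp ttq_gmpx/ directory=dpump_dir network_link=selflink sqlfile=gen_ddl.sql tables=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS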

Using export/import to migrate the database has at least one advantage over the Transportable Tablespace approach: it allows the re-organization of tables and indexes, i.e. it is easy to move tables and indexes to different tablespaces if desired.
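For example, adding a line like the following to the impdp parameter file relocates the segments on the fly (the tablespace names here are just placeholders):

 REMAP_TABLESPACE=OLD_TBS:NEW_TBS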

I completed the migration of the 17 TB database (about 12 TB in the target database) in about a month; the test was interrupted by other tasks with higher priority. I did a total of 17 table-migration batches to complete the whole database migration. Ordered by end-to-end time (including the export, move to NFS, import and DDL steps), they are referred to as batches 1 to 17 here:

batch   time (hours) 
-----   ----------- 
1        50
2        30
3        25
4        24
5        21
6        16
7        15
8        14
9        11
10       10
11       10
12       9.5
13       5.5
14       5
15       3.5
16       3
17       3
------------



Based on the above timing, if I perform the migration tasks as continuously as possible, I may be able to complete the migration in 5-7 days. The ultimate goal is a zero-downtime migration. We plan to use Oracle GoldenGate to capture the changes at the source during the database migration, so my next task will be investigating whether this is possible in practice in our environment.

References:
[1] ORA-27054: NFS file system where the file is created or resides is not mounted with correct options [ID 781349.1]
[2] Sample Export and Import parameter files:
 -- expdp.par  ----


 DIRECTORY=dpump_dir3
 DUMPFILE=dpump_dir3:email%U_q2.dmp
 TABLES=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS
 LOGFILE=dpump_dir3:TTQ_EMAIL_EVENT_DETAILS_q2.log
 #PARALLEL=20
 FILESIZE=5G
 JOB_NAME=exp_email_q2
 QUERY="WHERE email_seq_id >=2*200000000   and email_seq_id < 3*200000000"


  -- impdp.par ----

 DIRECTORY=dpump_dir
 DUMPFILE=dpump_dir:email%U_q2.dmp
 TABLES=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS
 TABLE_EXISTS_ACTION=APPEND
 LOGFILE=dpump_dir:TTQ_EMAIL_EVENT_DETAILS_q2_imp.log
 PARALLEL=2
 JOB_NAME=imp_email_q2
 EXCLUDE=index,constraint,statistics
 #EXCLUDE=statistics
 transform=storage:n,oid:n
 CLUSTER=NO

[3] Bug 5599947 - Export Data Pump is slow when table has a LOB column ("DATAPUMP EXPORT VERY SLOW")

[4] http://jensenmo.blogspot.com/2012/10/optimising-data-pump-export-and-import.html

[5] Some other references:
Master Note for Data Pump [ID 1264715.1]
Checklist For Slow Performance Of DataPump Export (expdp) And Import (impdp) [ID 453895.1]
Parallel Capabilities of Oracle Data Pump [ID 365459.1]


Appendix - sample shell script to generate expdp/impdp par file for big lob table

#!/bin/ksh
i=21
while [[ $i -le 43 ]];do
 
  ((j=i+1))
  PARFILE="EMAIL_CONTENT_q$i.par"
  echo "DIRECTORY=dpump_dir3" > $PARFILE 
  echo "DUMPFILE=dpump_dir3:ecntnt_q${i}_%U.dmp" >> $PARFILE 
  echo "TABLES=TTQ_GMPX.EMAIL_CONTENT" >> $PARFILE
  echo "LOGFILE=dpump_dir3:EMAIL_CONTENT_q$i.log" >> $PARFILE 
  echo "FILESIZE=10G" >> $PARFILE
  echo "JOB_NAME=exp_ecntnt_q$i" >> $PARFILE
  echo "QUERY=\"WHERE email_seq_id > $i * 2000000  and email_seq_id <= $j * 2000000\"">> $PARFILE
 
 
  echo "i=$i"
  echo "j=$j"
  ((i=i+1))
done

--- sample script to generate impdp par file for big lob table -----------------

i=6
while [[ $i -le 43 ]];do
 
  ((j=i+1))
  PARFILE="EMAIL_CONTENT_imp_q$i.par"
  echo "DIRECTORY=dpump_dir" > $PARFILE 
  echo "DUMPFILE=ecntnt_q${i}_%U.dmp" >> $PARFILE 
  echo "TABLES=TTQ_GMPX.EMAIL_CONTENT" >> $PARFILE
  echo "TABLE_EXISTS_ACTION=APPEND" >> $PARFILE
  echo "LOGFILE=EMAIL_CONTENT_q${i}_imp.log" >> $PARFILE 
  echo "JOB_NAME=imp_ecntnt_q${i}" >> $PARFILE
  echo "#EXCLUDE=index,constraint,statistics" >> $PARFILE
  echo "EXCLUDE=statistics" >> $PARFILE 
  echo "transform=storage:n,oid:n" >> $PARFILE
  echo "CLUSTER=NO" >> $PARFILE
 
 
  echo "i=$i"
  echo "j=$j"
  ((i=i+1))
done

Using Rowid


Update selected rows of a large table through rowid


We have a requirement to update a date column of about 14M rows in a large table of billions of rows. The challenge is that we have limited downtime to perform such a task.
In a Sun SPARC T4-2 machine with 128 virtual CPUs, we tested two different join methods with parallel executions:

-- nested loop took about 8h30
UPDATE TTQ_DQG.DQG_PRODUCT b
    SET billing_end_date = SYSDATE
    where exists (select /*+ parallel (a 12) */ 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id);

-- hash join took about 5h
UPDATE /*+ parallel (b 12) */TTQ_DQG.DQG_PRODUCT b
    SET billing_end_date = SYSDATE
    where exists (select 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id)
   ;

The 5h execution time still did not meet our downtime requirement. I proposed performing the task in two steps:

(1) create a rowid driver table outside downtime window
(2) update the table with rowid in the downtime window

The test in the same machine shows we can do it in 41 min!
SQL>
SQL> --Step 1. Create a rowid driver table  several hours before downtime
SQL>
SQL> CREATE table  dbscrub.dqg_product_5m_rowid nologging
  2              AS
  3              select /*+ full(a) parallel( a 10) */ a.rowid rid from  TTQ_DQG.dqg_product a
  4              where exists ( select 1 from dbscrub.ISOC_MAPPING_5M b where a.bo_id=b.old_bo_id);
Table created.
Elapsed: 00:22:01.71
SQL>
SQL>
SQL>
SQL>
SQL> --Step 2 update table with rowid access in the downtime window
SQL>
SQL>
SQL> UPDATE  TTQ_DQG.CPF_PRODUCT b
  2  SET billing_end_date = SYSDATE
  3  where b.rowid in  (select /*+ parallel (a, 12) */  rid from DBSCRUB.dqg_product_5m_rowid a)
  4  ;
14431117 rows updated.
Elapsed: 00:41:34.28
SQL>
SQL>
SQL> spool off

Is this approach safe? Are there any concerns? You may check this AskTom thread:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:53140678334596

Oracle Exadata Deployment Assistant (OEDA)

Steps to Generate Exadata One Command Configuration sheet

First, download the Oracle Exadata Deployment Assistant from the link below, as per your operating system.


Install it on your machine and click on config to open the deployment assistant.

Below are the detailed steps to fill in the configuration details.



Provide the organization name, application details, and timezone, along with the NTP and DNS server details.


Select the appropriate Exadata rack configuration; in our case it's a Half Rack High Performance.


Provide network details for administration and client connectivity. As per best practice, you should have separate networks for admin and client traffic.



Provide the starting IP for the admin network. In the modify tab you can change the hostnames if required.


Provide the starting IP for the client network; here also you can modify the client hostnames if required.


The InfiniBand network is the private interconnect between the servers and storage, so you can keep the default IPs.


If you have a separate backup network, then you can provide the details for it here.


Select the operating system, Linux or Solaris. The X4 comes with only Linux as of now.


It will give an overview of all the details you have provided.


Define the cluster name and the components that will be part of the cluster.


Provide the Oracle user, group, Oracle version and Oracle Home related details. You also have to specify the ASM disk group names and storage allocation.


It will give an overview of the cluster details.




The next few screens are optional (for example, ASR details and comments).



Generate the configuration sheet by saving the details to one folder.







It will generate several output files after you finish the configuration sheet. Provide these files to Oracle to configure your Exadata.



Tuesday, July 19, 2016

How To Erase Data Securely On Exadata?

When disposing of an old hard drive, physically destroy the drive or completely erase all the data on the drive. Deleting the files or reformatting the drive removes only the address tables on the drive. 



The information can still be recovered from a drive after deleting files or reformatting the drive.

To address this, we can securely erase and clean physical disks before redeploying them, using the Oracle Exadata Storage Server Software. There are multiple options for overwriting the existing contents of the disks: one pass, three passes or seven passes.



The 1pass option overwrites the contents with zeros. This option is not applicable to flash drives.

The 3pass option overwrites with set data patterns. It follows the recommendations from the NNSA (United States National Nuclear Security Administration). This option is also not applicable to flash drives.

The 7pass option also overwrites with set data patterns. It follows the recommendations from the DOD.

A table with the approximate time needed to securely erase data using each of the above options is available in the Oracle documentation (source: www.oracle.com).

How can we use these options when erasing data?

The CellCLI DROP CELLDISK command includes an option to securely erase data by overwriting the data. If Exadata Storage Server drives contain sensitive data that needs to be erased for redeployment or another purpose, then the secure erase feature should be used on the storage cell. 

The ERASE option ensures that all data is overwritten with random data, and erased up to seven times. This ensures that the data cannot be recovered, and that the data is permanently erased.

Erase data using Drop CELLDISK

Syntax:

DROP CELLDISK { ALL [FLASHDISK | HARDDISK] | cdisk_name [,disk_name]... } [ERASE = value [NOWAIT]] [FORCE] 

Purpose:
  • This command is necessary if a cell disk fails, or it is replaced by a newer model. 
  • Before dropping the cell disk, you should drop its grid disks and the corresponding Oracle ASM disks from the disk groups. 
  • The Oracle ASM disks should be dropped before dropping the grid disks.

Usage:
  • If the ALL option is specified, then all the cell disks on the cell are removed
  • If individual cell disks are specified, then the named cell disks (cdisk_name) are dropped.
  • When dropping all cell disks using the 1pass or 3pass option, it is necessary to drop the flash disks first using the 7pass option, and then drop the cell disks.
  • Use the NOWAIT option with the ERASE option to run the command asynchronously.
  • If the LUN associated with the CELLDISK is flagged as automatically created, then that LUN is deleted along with the cell disk.

Example:

CellCLI> DROP CELLDISK CD_03_cell01
CellCLI> DROP CELLDISK CD_02_cell06 FORCE
CellCLI> DROP CELLDISK ALL
CellCLI> DROP CELLDISK CD_02_cell09 ERASE=1pass NOWAIT
CellDisk CD_02_cell09 erase is in progress 

Erase data using Drop Griddisk

Purpose:

The DROP GRIDDISK command removes the named grid disks from the cell or removes all the grid disks specified by the ALL PREFIX option.

Syntax:

DROP GRIDDISK {ALL [FLASHDISK | HARDDISK ] PREFIX=gdisk_name_prefix , | gdisk_name [, gdisk_name]... } [ERASE = value [NOWAIT]] [FORCE]

Usage:
  • If the gdisk_name is entered, then the name identifies the individual grid disk to be removed. Multiple names can be entered.
  • The FLASHDISK option limits the DROP GRIDDISK command to grid disks that are flash disks.
  • The HARDDISK option limits the DROP GRIDDISK command to grid disks that are hard disks.
  • The ERASE option erases the content on the disk by overwriting it.
  • Use the NOWAIT option with the ERASE option to run the command asynchronously.
  • When dropping all grid disks using the 1pass or 3pass option, it is necessary to drop the flash disks first using the 7pass option.

Example:

CellCLI> ALTER GRIDDISK data01_CD_03_cell01 INACTIVE
CellCLI> DROP GRIDDISK data01_CD_03_cell01
CellCLI> DROP GRIDDISK ALL PREFIX=data01
CellCLI> DROP GRIDDISK data02_CD_04_cell01 FORCE
CellCLI> DROP GRIDDISK data02_CD_04_cell01 ERASE=1pass
GridDisk data02_CD_04_cell01 successfully dropped
CellCLI> DROP GRIDDISK ALL FLASHDISK PREFIX=DATA, ERASE=7pass
CellCLI> DROP GRIDDISK ALL PREFIX=DATA, ERASE=3pass 

Check the status of the secure erase:

CellCLI> LIST GRIDDISK
 DATA_CD_03_cell1 erase in progress
 DATA_CD_04_cell1 erase in progress
 DATA_CD_06_cell1 active
 DATA_CD_05_cell1 active

You can go through the Oracle Exadata documentation for more information.

11G R2 RAC Quick Reference

Commonly used Commands
******************************

su - grid -c "crs_stat -t -v"
su - grid -c "crsctl status resource -w \"TYPE co 'ora'\" -t"

srvctl start database -d stgtdb
srvctl start database -d stgrat
srvctl start database -d stgrptdb
srvctl status instance -d racdb -i racdb1
srvctl status nodeapps
srvctl config nodeapps
srvctl config database
srvctl config database -d racdb -a
srvctl status asm
srvctl config asm -a
srvctl config listener -a
srvctl status scan
srvctl config scan
srvctl status vip -n racnode1
srvctl status vip -n racnode2
srvctl config vip -n racnode1
srvctl config vip -n racnode2
srvctl config nodeapps -a -g -s -l
srvctl stop database -d racdb
srvctl start database -d racdb

crsctl check cluster
crsctl stop cluster
crsctl stop cluster -all
crsctl start cluster
crsctl start cluster -all
crsctl start cluster -n racnode1 racnode2

cluvfy comp scan -verbose
cluvfy comp clocksync -verbose


SELECT
    inst_id
  , instance_number inst_no
  , instance_name inst_name
  , parallel
  , status
  , database_status db_status
  , active_state state
  , host_name host
FROM gv$instance
ORDER BY inst_id;

select name from v$datafile
union
select member from v$logfile
union
select name from v$controlfile
union
select name from v$tempfile;

Checking CRS Status:

The below two commands are generally used to check the status of CRS. The first command lists the status of CRS
on the local node, whereas the other command shows the CRS status across all the nodes in the cluster.

crsctl check crs <<-- for the local node
crsctl check cluster <<-- for remote nodes in the cluster

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Checking Viability of CSS across nodes:

crsctl check cluster

For this command to run, CSS needs to be running on the local node. The "ONLINE" status for a remote node says that CSS is running on that node.
When CSS is down on a remote node, the status of "OFFLINE" is displayed for that node.

[root@node1-pub ~]# crsctl check cluster
node1-pub    ONLINE
node2-pub    ONLINE

Viewing Cluster name:

I use the below command to get the name of the cluster. You can also dump the OCR and view the name from the dump file.

ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'

[root@node1-pub ~]# ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'
test-crs
[root@node1-pub ~]#

OR

ocrconfig -export /tmp/ocr_exp.dat -s online
for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done


[root@node1-pub ~]# ocrconfig -export /tmp/ocr_exp.dat -s online
[root@node1-pub ~]# for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done
test-crs
[root@node1-pub ~]#

OR

Oracle creates a directory with the same name as the cluster under $ORA_CRS_HOME/cdata. You can get the cluster name from this directory as well.

[root@node1-pub ~]# ls /u01/app/crs/cdata
localhost  test-crs

Viewing No. Of Nodes configured in Cluster:

The below command can be used to find out the number of nodes registered in the cluster.
It also displays each node's public name, private name and virtual name along with its node number.

olsnodes -n -p -i

[root@node1-pub ~]# olsnodes -n -p -i
node1-pub       1       node1-prv       node1-vip
node2-pub       2       node2-prv       node2-vip

Viewing Votedisk Information:

The below command is used to view the number of votedisks configured in the cluster.

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Viewing OCR Disk Information:

The below command is used to view the number of OCR files configured in the cluster. It also displays the version of the OCR
as well as storage space information. You can only have 2 OCR files at most.

ocrcheck

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3848
Available space (kbytes) :     258272
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
Device/File integrity check succeeded
Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
Device/File integrity check succeeded

Cluster registry integrity check succeeded

Various Timeout Settings in Cluster:

Disktimeout:
Disk latency in seconds from node to votedisk. Default value is 200. (Disk I/O)
Misscount:
Network latency in seconds from node to node (interconnect). Default value is 60 sec on Linux and 30 sec on Unix platforms. (Network I/O)
Misscount < Disktimeout

NOTE: Do not change these without contacting Oracle Support. Doing so may cause logical corruption of the data.

IF
(Disk IO Time > Disktimeout) OR (Network IO time > Misscount)
THEN
REBOOT NODE
ELSE
DO NOT REBOOT
END IF;

crsctl get css disktimeout
crsctl get css misscount
crsctl get css  reboottime

[root@node1-pub ~]# crsctl get css disktimeout
200

[root@node1-pub ~]# crsctl get css misscount
Configuration parameter misscount is not defined. <<<<< This message indicates that misscount has not been set manually and is at its
default value. On Linux, it defaults to 60 seconds. If you want to change it, you can do so as below. (Not recommended)

[root@node1-pub ~]# crsctl set css misscount 100
Configuration parameter misscount is now set to 100.
[root@node1-pub ~]# crsctl get css misscount
100

The below command sets the value of misscount back to its Default values:

crsctl unset css misscount

[root@node1-pub ~]# crsctl unset css misscount

[root@node1-pub ~]# crsctl get css  reboottime
3

Add/Remove OCR file in Cluster:

Removing OCR File

(1) Get the Existing OCR file information by running ocrcheck utility.

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <-- OCR
Device/File integrity check succeeded
Device/File Name         : /u02/ocfs2/ocr/OCRfile_1 <-- OCR Mirror
Device/File integrity check succeeded

Cluster registry integrity check succeeded

(2) The first command removes the OCR mirror (/u02/ocfs2/ocr/OCRfile_1). If you want to remove the OCR
file itself (/u02/ocfs2/ocr/OCRfile_0), run the second command.

ocrconfig -replace ocrmirror
ocrconfig -replace ocr

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
Device/File integrity check succeeded

Device/File not configured  <-- the OCR mirror no longer exists

Cluster registry integrity check succeeded

Adding OCR

You need to add an OCR or OCR mirror file in cases where you want to move the existing OCR file location to different devices.
The below command adds the OCR mirror file if an OCR file already exists.

(1) Get the Current status of OCR:

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
Device/File integrity check succeeded

Device/File not configured  <-- OCR Mirror does not exist

Cluster registry integrity check succeeded

As you can see, I only have one OCR file and not the second file, which is the OCR mirror.
So, I can add the second OCR (OCR mirror) with the below command.

ocrconfig -replace ocrmirror <File name>

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_1
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
Device/File integrity check succeeded
Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
Device/File integrity check succeeded

Cluster registry integrity check succeeded

You can have at most 2 OCR devices (the OCR itself and its single mirror) in a cluster. Adding an extra mirror gives you the below error message:

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2
PROT-21: Invalid parameter
[root@node1-pub ~]#

Add/Remove Votedisk file in Cluster:

Adding Votedisk:

Get the existing votedisks associated with the cluster. To be safe, bring the CRS stack down on all the nodes
except the one from which you are going to add the votedisk.

(1) Stop CRS on all the nodes in cluster but one.

[root@node2-pub ~]# crsctl stop crs

(2) Get the list of Existing Vote Disks

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).

(3) Backup the VoteDisk file

Backup the existing votedisks as below as oracle:

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0

[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
41024+0 records in
41024+0 records out
[oracle@node1-pub ~]$

(4) Add an Extra Votedisk into the Cluster:

If it is on OCFS, then touch the file as oracle. On raw devices, initialize the raw device using the "dd" command.

touch /u02/ocfs2/vote/VDFile_3 <<-- as oracle
crsctl add css votedisk /u02/ocfs2/vote/VDFile_3 <<-- as oracle
crsctl query css votedisk

[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ touch /u02/ocfs2/vote/VDFile_3
[oracle@node1-pub ~]$ crsctl add css votedisk /u02/ocfs2/vote/VDFile_3
Now formatting voting disk: /u02/ocfs2/vote/VDFile_3.
Successful addition of voting disk /u02/ocfs2/vote/VDFile_3.

(5) Confirm that the file has been added successfully:

[root@node1-pub ~]# ls -l /u02/ocfs2/vote/VDFile_3
-rw-r-----  1 oracle oinstall 21004288 Oct  6 16:31 /u02/ocfs2/vote/VDFile_3
[root@node1-pub ~]# crsctl query css votedisks
Unknown parameter: votedisks
[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
3.     0    /u02/ocfs2/vote/VDFile_3
Located 4 voting disk(s).
[root@node1-pub ~]#

Removing Votedisk:

Removing a votedisk from the cluster is very simple. The below command removes the given votedisk from the cluster configuration.

crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3

[root@node1-pub ~]# crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3
Successful deletion of voting disk /u02/ocfs2/vote/VDFile_3.
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

Backing Up OCR

Oracle performs a physical backup of the OCR devices every 4 hours under the default backup directory $ORA_CRS_HOME/cdata/<CLUSTER_NAME>
and then rolls that forward to daily, weekly and monthly backups. You can get the backup information by executing the below command.

ocrconfig -showbackup

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr
[root@node1-pub ~]#

Manually backing up the OCR

ocrconfig -manualbackup <<--Physical Backup of OCR

The above command backs up OCR under the default Backup directory. You can export the contents of the OCR using below command (Logical backup).

ocrconfig -export /tmp/ocr_exp.dat -s online <<-- Logical Backup of OCR

Restoring OCR

The below command is used to restore the OCR from a physical backup. Shut down CRS on all nodes first.

ocrconfig -restore <file name>

Locate the available backups:

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr

node1-pub     2007/10/07 13:50:41     /u01/app/crs/cdata/test-crs/backup_20071007_135041.ocr

Perform Restore from previous Backup

[root@node2-pub ~]# ocrconfig -restore /u01/app/crs/cdata/test-crs/week.ocr

The above command restores the OCR from the week-old backup.
If you have a logical backup of the OCR (taken using the export option), then you can import it with the below command.

ocrconfig -import /tmp/ocr_exp.dat

Restoring Votedisks

Shutdown CRS on all the nodes in Cluster.
Locate the current location of the Votedisks
Restore each of the votedisks using "dd" command from the previous good backup of Votedisk taken using the same "dd" command.
Start CRS on all the nodes.

crsctl stop crs
crsctl query css votedisk
dd if=<backup of Votedisk> of=<Votedisk file> <<-- do this for all the votedisks
crsctl start crs

Changing Public and Virtual IP Address:


Current Config                                               Changed to

Node 1:

Public IP:       216.160.37.154                              192.168.10.11
VIP:             216.160.37.153                              192.168.10.111
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node1-pub.hingu.net                         node1-pub.hingu.net

Node 2:

Public IP:       216.160.37.156                              192.168.10.22
VIP:             216.160.37.157                              192.168.10.222
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node2-pub.hingu.net                         node2-pub.hingu.net


Take the services, database, ASM instances and nodeapps down on both the nodes in the cluster.
Also disable the nodeapps, ASM and database instances to prevent them from restarting in case
a node gets rebooted during this process.

srvctl stop service -d test
srvctl stop database -d test
srvctl stop asm -n node1-pub
srvctl stop asm -n node2-pub
srvctl stop nodeapps -n node1-pub
srvctl stop nodeapps -n node2-pub
srvctl disable instance -d test -i test1,test2
srvctl disable asm -n node1-pub
srvctl disable asm -n node2-pub
srvctl disable nodeapps -n node1-pub
srvctl disable nodeapps -n node2-pub

Modify /etc/hosts and/or DNS, and the ifcfg-eth0 file (on each local node), with the new IP values
on all the nodes.
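As an illustration, the relevant part of /etc/sysconfig/network-scripts/ifcfg-eth0 on node1-pub would end up looking something like this (assuming a static configuration; gateway and other entries omitted):

DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.10.11
NETMASK=255.255.255.0
ONBOOT=yes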

Restart the specific network interface in order to use the new IP.

ifconfig eth0 down
ifconfig eth0 up

Or, you can restart the network.
CAUTION: on NAS, restarting entire network may cause the node to be rebooted.

Update the OCR with the New Public IP.
In case of public IP, you have to delete the interface first and then add it back with the new IP address.

As oracle user, Issue the below command:

oifcfg delif -global eth0
oifcfg setif -global eth0/192.168.10.0:public

Update the OCR with the New Virtual IP.
Virtual IP is part of the nodeapps and so you can modify the nodeapps to update the Virtual IP information.

As privileged user (root), Issue the below commands:

srvctl modify nodeapps -n node1-pub -A 192.168.10.111/255.255.255.0/eth0 <-- for Node 1
srvctl modify nodeapps -n node2-pub -A 192.168.10.222/255.255.255.0/eth0 <-- for Node 2

Enable the nodeapps, ASM, database Instances for all the Nodes.

srvctl enable instance -d test -i test1,test2
srvctl enable asm -n node1-pub
srvctl enable asm -n node2-pub
srvctl enable nodeapps -n node1-pub
srvctl enable nodeapps -n node2-pub

Update the listener.ora file on each node with the correct IP addresses, in case it uses IP addresses instead of hostnames.
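For example, a hard-coded listener.ora entry on node1 would need its HOST value updated to the new VIP (the listener name and port below are assumptions):

LISTENER_NODE1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.10.111)(PORT = 1521))
  )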

Restart the Nodeapps, ASM and Database instance

srvctl start nodeapps -n node1-pub
srvctl start nodeapps -n node2-pub
srvctl start asm -n node1-pub
srvctl start asm -n node2-pub
srvctl start database -d test



SRVCTL is used to manage the following resources (components):
***************************************************************

Component      Abbreviation      Description
***********************************************************************
asm            asm               Oracle ASM instance
database       db                Database instance
diskgroup      dg                Oracle ASM disk group
filesystem     filesystem        Oracle ASM file system
home           home              Oracle home or Oracle Clusterware home
listener       lsnr              Oracle Net listener
service        serv              Database service
ons, eons      ons, eons         Oracle Notification Services (ONS)
***********************************************************************


Oracle entities (such as resources, resource types, and server pools) that have names beginning with "ora" are managed only by SRVCTL (and not by CRSCTL), unless you are directed otherwise by Oracle Support. Cluster-specific commands are generally handled by CRSCTL.
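For example, an ora.* resource such as the database is operated through SRVCTL, while the clusterware stack itself is checked through CRSCTL (the database name below is just an example):

srvctl status database -d racdb        <<-- the ora.racdb.db resource
crsctl check crs                       <<-- the clusterware stack on the local node
crsctl status resource -t              <<-- read-only status of all registered resources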

Syntax:

srvctl  command object options

The available commands used with SRVCTL are:

Command           Description
add               Adds a component to the Oracle Restart configuration.
config            Displays the Oracle Restart configuration for a component.
disable           Disables management by Oracle Restart for a component.
enable            Reenables management by Oracle Restart for a component.
getenv            Displays environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.
modify            Modifies the Oracle Restart configuration for a component.
remove            Removes a component from the Oracle Restart configuration.
setenv            Sets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.
start             Starts the specified component.
status            Displays the running status of the specified component.
stop              Stops the specified component.
unsetenv          Unsets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.

For more information go to the Oracle DBA Guide

Here is a matrix of command/object combinations:

srvctl add / srvctl modify / srvctl remove
  Objects: instance, database, service, nodeapps
  Comment: The OCR is modified.

srvctl relocate
  Objects: service
  Comment: You can relocate a service from one named instance to another named instance.

srvctl start / srvctl stop / srvctl status
  Objects: instance, database, service, asm, nodeapps

srvctl disable / srvctl enable
  Objects: instance, database, service, asm
  Comment: enable  = when the server restarts, the resource must be restarted
           disable = when the server restarts, the resource must NOT be restarted
                     (perhaps we are working on some maintenance tasks)

srvctl config
  Objects: database, service, asm, nodeapps
  Comment: Lists configuration information from the OCR (Oracle Cluster Registry).

srvctl getenv / srvctl setenv / srvctl unsetenv
  Objects: instance, database, service, nodeapps
  Comment: srvctl getenv   = displays the environment variables stored in the OCR for the target
           srvctl setenv   = allows these variables to be set
           srvctl unsetenv = allows these variables to be unset


The most common SRVCTL commands are:

srvctl start database -d DBname
srvctl stop database -d DBname


If you don't know the DBname, you can run: select name from v$database;


srvctl start instance -d DBname -i INSTANCEname
srvctl stop instance -d DBname -i INSTANCEname


srvctl status database -d DBname
srvctl status instance -d DBname -i INSTANCEname
srvctl status nodeapps -n NODEname

srvctl enable database -d DBname
srvctl disable database -d DBname

srvctl enable instance -d DBname -i INSTANCEname
srvctl disable instance -d DBname -i INSTANCEname


srvctl config database -d DBname      -> to get some information about the database from the OCR.
srvctl getenv nodeapps