root.sh fails CLSRSC-331: Failure initializing entries in file ‘/etc/oracle/scls_scr/racdb1’

After a failed installation of a 12cR2 Oracle Member Cluster, I received the following error while running the root.sh script during the second attempt:

[root@racdb1 ~]# /u01/app/12.2.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/12.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/racdb1/crsconfig/rootcrs_racdb1_2018-05-30_01-00-14PM.log
2018/05/30 13:00:17 CLSRSC-594: Executing installation step 1 of 19: ‘SetupTFA’.
2018/05/30 13:00:17 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2018/05/30 13:00:52 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2018/05/30 13:00:52 CLSRSC-594: Executing installation step 2 of 19: ‘ValidateEnv’.
2018/05/30 13:01:00 CLSRSC-363: User ignored prerequisites during installation
2018/05/30 13:01:00 CLSRSC-594: Executing installation step 3 of 19: ‘CheckFirstNode’.
2018/05/30 13:01:03 CLSRSC-594: Executing installation step 4 of 19: ‘GenSiteGUIDs’.
2018/05/30 13:01:04 CLSRSC-594: Executing installation step 5 of 19: ‘SaveParamFile’.
2018/05/30 13:01:05 CLSRSC-594: Executing installation step 6 of 19: ‘SetupOSD’.
2018/05/30 13:01:24 CLSRSC-594: Executing installation step 7 of 19: ‘CheckCRSConfig’.
2018/05/30 13:01:24 CLSRSC-594: Executing installation step 8 of 19: ‘SetupLocalGPNP’.
2018/05/30 13:01:26 CLSRSC-594: Executing installation step 9 of 19: ‘ConfigOLR’.
CRS-4046: Invalid Oracle Clusterware configuration.
CRS-4000: Command Create failed, or completed with errors.
2018/05/30 13:01:27 CLSRSC-331: Failure initializing entries in file ‘/etc/oracle/scls_scr/racdb1’
The command ‘/u01/app/12.2.0/grid/perl/bin/perl -I/u01/app/12.2.0/grid/perl/lib -I/u01/app/12.2.0/grid/crs/install /u01/app/12.2.0/grid/crs/install/rootcrs.pl ‘ execution failed

ckptGridHA_racdb1.xml is a checkpoint file located in /u01/app/grid/crsdata/racdb1/crsconfig/. It records information about the node name, the OCR and voting disk locations, GRID_HOME, ORACLE_HOME, the private interconnect, the public and VIP addresses, and so on.

Remove the mentioned file:

# rm -rf /u01/app/grid/crsdata/racdb1/crsconfig/ckptGridHA_racdb1.xml

And rerun root.sh
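A slightly safer variant (my habit, not part of the original workaround) is to move the checkpoint file aside instead of deleting it, so it can still be inspected later:

```shell
# Move the checkpoint file aside rather than deleting it outright
# (path as in the post; the timestamp suffix is just a convention).
f=/u01/app/grid/crsdata/racdb1/crsconfig/ckptGridHA_racdb1.xml
if [ -f "$f" ]; then
  mv "$f" "$f.bak.$(date +%Y%m%d%H%M%S)"
fi
```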

[root@racdb1 ~]# /u01/app/12.2.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/12.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.2.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/grid/crsdata/racdb1/crsconfig/rootcrs_racdb1_2018-05-30_01-26-56PM.log
2018/05/30 13:26:58 CLSRSC-594: Executing installation step 1 of 19: ‘SetupTFA’.
2018/05/30 13:26:59 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2018/05/30 13:26:59 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2018/05/30 13:26:59 CLSRSC-594: Executing installation step 2 of 19: ‘ValidateEnv’.
2018/05/30 13:27:01 CLSRSC-363: User ignored prerequisites during installation
2018/05/30 13:27:01 CLSRSC-594: Executing installation step 3 of 19: ‘CheckFirstNode’.
2018/05/30 13:27:03 CLSRSC-594: Executing installation step 4 of 19: ‘GenSiteGUIDs’.
2018/05/30 13:27:03 CLSRSC-594: Executing installation step 5 of 19: ‘SaveParamFile’.
2018/05/30 13:27:07 CLSRSC-594: Executing installation step 6 of 19: ‘SetupOSD’.
2018/05/30 13:27:25 CLSRSC-594: Executing installation step 7 of 19: ‘CheckCRSConfig’.
2018/05/30 13:27:25 CLSRSC-594: Executing installation step 8 of 19: ‘SetupLocalGPNP’.
2018/05/30 13:27:46 CLSRSC-594: Executing installation step 9 of 19: ‘ConfigOLR’.
2018/05/30 13:27:53 CLSRSC-594: Executing installation step 10 of 19: ‘ConfigCHMOS’.
2018/05/30 13:27:53 CLSRSC-594: Executing installation step 11 of 19: ‘CreateOHASD’.
2018/05/30 13:27:57 CLSRSC-594: Executing installation step 12 of 19: ‘ConfigOHASD’.
2018/05/30 13:28:12 CLSRSC-330: Adding Clusterware entries to file ‘oracle-ohasd.service’
2018/05/30 13:29:09 CLSRSC-594: Executing installation step 13 of 19: ‘InstallAFD’.
2018/05/30 13:29:12 CLSRSC-594: Executing installation step 14 of 19: ‘InstallACFS’.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘racdb1’
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘racdb1’ has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
2018/05/30 13:30:07 CLSRSC-594: Executing installation step 15 of 19: ‘InstallKA’.
2018/05/30 13:30:10 CLSRSC-594: Executing installation step 16 of 19: ‘InitConfig’.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘racdb1’
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘racdb1’ has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start ‘ora.evmd’ on ‘racdb1’
CRS-2672: Attempting to start ‘ora.mdnsd’ on ‘racdb1’
CRS-2676: Start of ‘ora.mdnsd’ on ‘racdb1’ succeeded
CRS-2676: Start of ‘ora.evmd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.gpnpd’ on ‘racdb1’
CRS-2676: Start of ‘ora.gpnpd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘racdb1’
CRS-2672: Attempting to start ‘ora.gipcd’ on ‘racdb1’
CRS-2676: Start of ‘ora.cssdmonitor’ on ‘racdb1’ succeeded
CRS-2676: Start of ‘ora.gipcd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.cssd’ on ‘racdb1’
CRS-2672: Attempting to start ‘ora.diskmon’ on ‘racdb1’
CRS-2676: Start of ‘ora.diskmon’ on ‘racdb1’ succeeded
CRS-2676: Start of ‘ora.cssd’ on ‘racdb1’ succeeded
2018/05/30 13:30:53 CLSRSC-482: Running command: ‘/u01/app/12.2.0/grid/bin/ocrconfig -upgrade grid oinstall’
CRS-2672: Attempting to start ‘ora.crf’ on ‘racdb1’
CRS-2672: Attempting to start ‘ora.storage’ on ‘racdb1’
CRS-2676: Start of ‘ora.crf’ on ‘racdb1’ succeeded
CRS-2676: Start of ‘ora.storage’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.crsd’ on ‘racdb1’
CRS-2676: Start of ‘ora.crsd’ on ‘racdb1’ succeeded
Now formatting voting disk: +GRID.
CRS-4256: Updating the profile
Successful addition of voting disk bfc6d4b89dd54fd6bf4f4ca43552da69.
Successfully replaced voting disk group with +GRID.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
— —– —————– ——— ———
1. ONLINE bfc6d4b89dd54fd6bf4f4ca43552da69 (+GRID/RACDB1/VOTINGFILE/vfile.258.977491875) [GRID]
Located 1 voting disk(s).
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘racdb1’
CRS-2673: Attempting to stop ‘ora.crsd’ on ‘racdb1’
CRS-2677: Stop of ‘ora.crsd’ on ‘racdb1’ succeeded
CRS-2673: Attempting to stop ‘ora.crf’ on ‘racdb1’
CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘racdb1’
CRS-2673: Attempting to stop ‘ora.evmd’ on ‘racdb1’
CRS-2673: Attempting to stop ‘ora.storage’ on ‘racdb1’
CRS-2673: Attempting to stop ‘ora.drivers.acfs’ on ‘racdb1’
CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘racdb1’
CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘racdb1’
CRS-2677: Stop of ‘ora.drivers.acfs’ on ‘racdb1’ succeeded
CRS-2677: Stop of ‘ora.crf’ on ‘racdb1’ succeeded
CRS-2677: Stop of ‘ora.ctssd’ on ‘racdb1’ succeeded
CRS-2677: Stop of ‘ora.evmd’ on ‘racdb1’ succeeded
CRS-2677: Stop of ‘ora.gpnpd’ on ‘racdb1’ succeeded
CRS-2677: Stop of ‘ora.storage’ on ‘racdb1’ succeeded
CRS-2673: Attempting to stop ‘ora.cssd’ on ‘racdb1’
CRS-2677: Stop of ‘ora.mdnsd’ on ‘racdb1’ succeeded
CRS-2677: Stop of ‘ora.cssd’ on ‘racdb1’ succeeded
CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘racdb1’
CRS-2677: Stop of ‘ora.gipcd’ on ‘racdb1’ succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘racdb1’ has completed
CRS-4133: Oracle High Availability Services has been stopped.
2018/05/30 13:31:49 CLSRSC-594: Executing installation step 17 of 19: ‘StartCluster’.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start ‘ora.mdnsd’ on ‘racdb1’
CRS-2672: Attempting to start ‘ora.evmd’ on ‘racdb1’
CRS-2676: Start of ‘ora.mdnsd’ on ‘racdb1’ succeeded
CRS-2676: Start of ‘ora.evmd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.gpnpd’ on ‘racdb1’
CRS-2676: Start of ‘ora.gpnpd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.gipcd’ on ‘racdb1’
CRS-2676: Start of ‘ora.gipcd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘racdb1’
CRS-2676: Start of ‘ora.cssdmonitor’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.cssd’ on ‘racdb1’
CRS-2672: Attempting to start ‘ora.diskmon’ on ‘racdb1’
CRS-2676: Start of ‘ora.diskmon’ on ‘racdb1’ succeeded
CRS-2676: Start of ‘ora.cssd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.cluster_interconnect.haip’ on ‘racdb1’
CRS-2672: Attempting to start ‘ora.ctssd’ on ‘racdb1’
CRS-2676: Start of ‘ora.ctssd’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.crf’ on ‘racdb1’
CRS-2676: Start of ‘ora.crf’ on ‘racdb1’ succeeded
CRS-2672: Attempting to start ‘ora.crsd’ on ‘racdb1’
CRS-2676: Start of ‘ora.crsd’ on ‘racdb1’ succeeded
CRS-2676: Start of ‘ora.cluster_interconnect.haip’ on ‘racdb1’ succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-6017: Processing resource auto-start for servers: racdb1
CRS-6016: Resource auto-start has completed for server racdb1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2018/05/30 13:33:39 CLSRSC-343: Successfully started Oracle Clusterware stack
2018/05/30 13:33:39 CLSRSC-594: Executing installation step 18 of 19: ‘ConfigNode’.
2018/05/30 13:35:22 CLSRSC-594: Executing installation step 19 of 19: ‘PostConfig’.
2018/05/30 13:35:45 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster … succeeded


DNS/NIS name service prereq failed

I was configuring an Oracle Member Cluster for Databases. This cluster type was introduced in 12.2 and has a lot of benefits; for more information and installation steps for this type of cluster, see the Oracle documentation.

Visually it looks like the following:

Domain_Cluster_arch

During the installation, the prerequisite check complained about "DNS/NIS name service":

DNS_NIS_Error

If you click Details on the error, you get a huge message. I won't paste its content here, because I could not find any meaningful information in it, and I lost too much time trying.

I searched the Internet and still found no good information, so I started trying everything that came to mind, and one of the attempts solved it.

So my SCAN entries looked like the following:

Scan

Adding the domain name at the end fixed it:

Scan_correct

And my check looks like the following:

Prereq_OK

Please note that there may be several cases in which this type of failure appears; I have shown only one of them.
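In my case the installer was happy only with fully qualified SCAN names, so a tiny helper like the one below (a hypothetical sketch; the SCAN name and domain are examples, not from this system) can be used to qualify an entry before typing it in:

```shell
# Append the domain to a short name unless it is already fully qualified
# (hypothetical helper; the SCAN name and domain below are examples).
fqdn() {
  name=$1; domain=$2
  case $name in
    *.*) printf '%s\n' "$name" ;;               # already qualified
    *)   printf '%s.%s\n' "$name" "$domain" ;;  # append the domain
  esac
}
fqdn maritest-scan mydomain.com   # -> maritest-scan.mydomain.com
```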

Good Luck!

Exadata: Rebuild RAC clusterware without deleting data Version 2

This post is different from my previous post, Rebuild RAC clusterware without deleting data, because two days ago I was upgrading Grid Infrastructure from 12.1 to 12.2: the upgrade succeeded on the first node but failed on the second. I will not describe why this happened, but the whole process was complicated instead of simple. We had installed several patches before the installation (gridSetup has an option to indicate patches before installation). The 12.2 software seems to have many bugs even in the upgrade process (although I agree with other DBAs that the 12.2 database itself is very stable).

What happened is that during the first-node upgrade the OCR files were changed. I tried to deconfigure from the 12.2 home, and that also failed. So I was left with a clusterware stack whose OCR and voting disks were corrupted (and belonged to version 12.2). In my previous post I started clusterware in exclusive mode with nocrs and restored the OCR from backup, but this time, because the voting disks were a different version, it would not start even in exclusive mode.

So I followed the steps to recreate the disk group where the OCR and voting disks are stored. Because these are Exadata cell storage disks, it was more complicated than with ordinary disks, where you can clean up the header using "dd": instead of dd, you use CellCLI.

So let’s start:

1. Connect to each cell server (I have three of them) and drop the grid disks that belong to DBFS (the disk group that contains the OCR and voting disks). Be careful: dropping a grid disk erases its data, so DBFS must contain only the OCR and voting disks, not DATA!

#Find the name, celldisk and size of the grid disk:

CellCLI> list griddisk where name like 'DBFS_.*' attributes name, cellDisk, size
DBFS_CD_02_lbcel01_dr_adm CD_02_lbcel01_dr_adm 33.796875G
DBFS_CD_03_lbcel01_dr_adm CD_03_lbcel01_dr_adm 33.796875G
DBFS_CD_04_lbcel01_dr_adm CD_04_lbcel01_dr_adm 33.796875G
DBFS_CD_05_lbcel01_dr_adm CD_05_lbcel01_dr_adm 33.796875G
DBFS_CD_06_lbcel01_dr_adm CD_06_lbcel01_dr_adm 33.796875G
DBFS_CD_07_lbcel01_dr_adm CD_07_lbcel01_dr_adm 33.796875G
DBFS_CD_08_lbcel01_dr_adm CD_08_lbcel01_dr_adm 33.796875G
DBFS_CD_09_lbcel01_dr_adm CD_09_lbcel01_dr_adm 33.796875G
DBFS_CD_10_lbcel01_dr_adm CD_10_lbcel01_dr_adm 33.796875G
DBFS_CD_11_lbcel01_dr_adm CD_11_lbcel01_dr_adm 33.796875G


#Drop

CellCLI> drop griddisk DBFS_CD_02_lbcel01_dr_adm
drop griddisk DBFS_CD_03_lbcel01_dr_adm
drop griddisk DBFS_CD_04_lbcel01_dr_adm
drop griddisk DBFS_CD_05_lbcel01_dr_adm
drop griddisk DBFS_CD_06_lbcel01_dr_adm
drop griddisk DBFS_CD_07_lbcel01_dr_adm
drop griddisk DBFS_CD_08_lbcel01_dr_adm
drop griddisk DBFS_CD_09_lbcel01_dr_adm
drop griddisk DBFS_CD_10_lbcel01_dr_adm
drop griddisk DBFS_CD_11_lbcel01_dr_adm

#Create

cellcli> create griddisk DBFS_CD_02_lbcel01_dr_adm celldisk=CD_02_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_03_lbcel01_dr_adm celldisk=CD_03_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_04_lbcel01_dr_adm celldisk=CD_04_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_05_lbcel01_dr_adm celldisk=CD_05_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_06_lbcel01_dr_adm celldisk=CD_06_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_07_lbcel01_dr_adm celldisk=CD_07_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_08_lbcel01_dr_adm celldisk=CD_08_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_09_lbcel01_dr_adm celldisk=CD_09_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_10_lbcel01_dr_adm celldisk=CD_10_lbcel01_dr_adm, size=33.796875G
create griddisk DBFS_CD_11_lbcel01_dr_adm celldisk=CD_11_lbcel01_dr_adm, size=33.796875G

Do the same steps on other cells.
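Typing ten drop/create pairs per cell by hand is error-prone, so a loop can generate them instead. This is a sketch under the naming scheme shown above; the resulting file would then be fed to CellCLI on each cell (for example over dcli):

```shell
# Generate the drop/create commands for all DBFS grid disks of one cell.
# The cell name and size follow the listing above; adjust per cell.
cell=lbcel01_dr_adm
out=/tmp/dbfs_griddisk_${cell}.cli
: > "$out"
for n in 02 03 04 05 06 07 08 09 10 11; do
  echo "drop griddisk DBFS_CD_${n}_${cell}" >> "$out"
  echo "create griddisk DBFS_CD_${n}_${cell} celldisk=CD_${n}_${cell}, size=33.796875G" >> "$out"
done
wc -l < "$out"   # -> 20 (one drop and one create per grid disk)
```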

2.  Deconfigure root.sh on each node

# Run deconfig

/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -deconfig -force

#rename gpnp profile

mv /u01/app/12.1.0.2/grid/gpnp/profiles/peer/profile.xml /tmp/profile_backup.xml

3. Run root.sh on first node

/u01/app/12.1.0.2/grid/root.sh

It will fail because it cannot find the DBFS disk group to mount, and therefore cannot find the OCR inside it. However, at this point ASM is started in nomount mode, so we are able to recreate the disk group.

4. Create DBFS diskgroup

sqlplus / as sysasm

SQL> create diskgroup DBFS
failgroup LBCEL01_DR_ADM disk 'o/*/DBFS_CD_02_lbcel01_dr_adm','o/*/DBFS_CD_03_lbcel01_dr_adm','o/*/DBFS_CD_04_lbcel01_dr_adm','o/*/DBFS_CD_05_lbcel01_dr_adm','o/*/DBFS_CD_06_lbcel01_dr_adm','o/*/DBFS_CD_07_lbcel01_dr_adm','o/*/DBFS_CD_08_lbcel01_dr_adm','o/*/DBFS_CD_09_lbcel01_dr_adm','o/*/DBFS_CD_10_lbcel01_dr_adm','o/*/DBFS_CD_11_lbcel01_dr_adm'
failgroup LBCEL02_DR_ADM disk 'o/*/DBFS_CD_02_lbcel02_dr_adm','o/*/DBFS_CD_03_lbcel02_dr_adm','o/*/DBFS_CD_04_lbcel02_dr_adm','o/*/DBFS_CD_05_lbcel02_dr_adm','o/*/DBFS_CD_06_lbcel02_dr_adm','o/*/DBFS_CD_07_lbcel02_dr_adm','o/*/DBFS_CD_08_lbcel02_dr_adm','o/*/DBFS_CD_09_lbcel02_dr_adm','o/*/DBFS_CD_10_lbcel02_dr_adm','o/*/DBFS_CD_11_lbcel02_dr_adm'
failgroup LBCEL03_DR_ADM disk 'o/*/DBFS_CD_02_lbcel03_dr_adm','o/*/DBFS_CD_03_lbcel03_dr_adm','o/*/DBFS_CD_04_lbcel03_dr_adm','o/*/DBFS_CD_05_lbcel03_dr_adm','o/*/DBFS_CD_06_lbcel03_dr_adm','o/*/DBFS_CD_07_lbcel03_dr_adm','o/*/DBFS_CD_08_lbcel03_dr_adm','o/*/DBFS_CD_09_lbcel03_dr_adm','o/*/DBFS_CD_10_lbcel03_dr_adm','o/*/DBFS_CD_11_lbcel03_dr_adm'
ATTRIBUTE
'compatible.asm'='12.1.0.2.0',
'compatible.rdbms'='11.2.0.2.0',
'au_size'='4194304',
'cell.smart_scan_capable'='TRUE';

5. Do the following steps:

* Deconfigure root.sh again from first node
* remove gpnp profile
* run root.sh again on first node

At this time root.sh should be successful.

6. Restore OCR

The /u01/app/12.1.0.2/grid/cdata/<clustername> directory contains OCR backups by default.

crsctl stop crs -f
crsctl start crs -excl -nocrs
ocrconfig -restore /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr/backup00.ocr
crsctl stop crs -f
crsctl start crs
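When several backup*.ocr files are present in the default backup directory, a small helper can pick the newest one before running the restore (a sketch; the path is the one from this post, and `ocrconfig -showbackup` as root also lists the automatic backups):

```shell
# Return the newest "backup*.ocr" file in a given directory
# (the default OCR backup location is $GRID_HOME/cdata/<clustername>).
newest_ocr_backup() {
  ls -t "$1"/backup*.ocr 2>/dev/null | head -1
}
# Usage (as root, with CRS started in -excl -nocrs mode):
#   ocrconfig -restore "$(newest_ocr_backup /u01/app/12.1.0.2/grid/cdata/lbank-clus-dr)"
```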

7. Run root.sh on the second node

/u01/app/12.1.0.2/grid/root.sh

Create RAC database using DBCA silent mode

Real World Scenario: 

Previously, we had a vacancy for a Senior DBA position. Some of the candidates had more than 15 years of experience in database administration.

To test their knowledge, we created a lab: the Grid and database software was already installed, the shared disks were present, and the disk groups were already created.

The first task was to create a RAC database in silent mode using DBCA. They were allowed to use the Internet during the exam, but unfortunately they did not manage to complete the task.

So I decided to write a simple version of the script:

dbca -silent \
-createDatabase \
-templateName General_Purpose.dbc \
-gdbName orcl  \
-sid orcl  \
-SysPassword MyPassword123 \
-SystemPassword MyPassword123 \
-emConfiguration NONE \
-redoLogFileSize 2048  \
-recoveryAreaDestination FRA \
-storageType ASM \
-asmSysPassword MyPassword123 \
-diskGroupName DATA \
-characterSet AL32UTF8 \
-nationalCharacterSet AL32UTF8 \
-automaticMemoryManagement true \
-totalMemory 2536  \
-databaseType MULTIPURPOSE \
-nodelist rac1,rac2

Copying database files
1% complete
3% complete
9% complete
15% complete
21% complete
30% complete
Creating and starting Oracle instance
32% complete
36% complete
40% complete
44% complete
45% complete
48% complete
50% complete
Creating cluster database views
52% complete
70% complete
Completing Database Creation
73% complete
76% complete
85% complete
94% complete
100% complete
Look at the log file “/u01/app/oracle/cfgtoollogs/dbca/orcl/orcl.log” for further details.

dbca deleteDatabase removes listener alias from tnsnames.ora

I have two databases, ORCL and MYDB. Each of them has LOCAL_LISTENER set to the listener alias NODEFQDN, which is defined in tnsnames.ora.

[oracle@rac1 ~]$ cat /u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora

NODEFQDN =
(ADDRESS = (PROTOCOL = TCP)(Host = rac1.mydomain.com)(Port = 1522))

ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = maritest-scan.mydomain.com)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)))

MYDB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = maritest-scan.mydomain.com)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = mydb)))

[oracle@rac1 ~]$ sqlplus mari@ORCL
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production

SQL> show parameter local_listener

NAME TYPE VALUE
———————————— ———– ——————————
local_listener string NODEFQDN

[oracle@rac1 ~]$ sqlplus mari@MYDB
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production

SQL> show parameter local_listener

NAME TYPE VALUE
———————————— ———– ——————————
local_listener string NODEFQDN

I deleted the ORCL database using dbca:

[oracle@rac1 ~]$ dbca -silent -deleteDatabase -sourceDB orcl
Enter SYS user password:

Connecting to database 9% complete 14% complete 19% complete 23% complete 28% complete 33% complete 38% complete 47% complete
Updating network configuration files 48% complete 52% complete
Deleting instances and datafiles 66% complete 80% complete 95% complete 100% complete
Look at the log file “/u01/app/oracle/cfgtoollogs/dbca/orcl.log” for further details.

I checked tnsnames.ora and saw that the NODEFQDN alias had been deleted:

[oracle@rac1 ~]$ cat /u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora

MYDB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = maritest-scan.mydomain.com)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = mydb)))

The problem is that MYDB still has LOCAL_LISTENER=NODEFQDN. This means that when I restart the MYDB database, it will not automatically register with the listener, because tnsnames.ora no longer contains NODEFQDN.

I identified that deletion of this entry depends on the LOCAL_LISTENER parameter. If it is set to this alias, then the entry is deleted along with the database (unfortunately, dbca does not check whether the entry is used by other databases). If the parameter is empty, or holds the literal value (ADDRESS = (PROTOCOL = TCP)(Host = rac1.mydomain.com)(Port = 1522)), then the entry stays in tnsnames.ora after the database is deleted.

There is one trick to prevent dbca from deleting the entry even when LOCAL_LISTENER is set to NODEFQDN:

In tnsnames.ora, a single entry can have multiple aliases. This is not documented, but it seems we have a lot of hidden features:

Example:

alias1,alias2, alias3 =
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=hostname)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=service_name)))

You may put blank spaces between the aliases or not; you just have to separate them with commas.

So in our case we can write like this:

DONOTDELETE,NODEFQDN =
(ADDRESS = (PROTOCOL = TCP)(Host = rac1.mydomain.com)(Port = 1522))

It is up to you what the first alias will be 🙂 You may write DBCADOTTOUCH ))

Each alias is resolvable:

[oracle@rac1 ~]$ tnsping NODEFQDN
..
Used TNSNAMES adapter to resolve the alias
Attempting to contact (ADDRESS = (PROTOCOL = TCP)(Host = rac1.mydomain.com)(Port = 1522))
OK (0 msec)

[oracle@rac1 ~]$ tnsping DONOTDELETE
..
Used TNSNAMES adapter to resolve the alias
Attempting to contact (ADDRESS = (PROTOCOL = TCP)(Host = rac1.mydomain.com)(Port = 1522))
OK (0 msec)

After database deletion, this entry stays in tnsnames.ora.
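A quick way to see which aliases each entry defines is to look at the names before the `=` on the entry's first line. A small sketch against a sample file (the sample mirrors the entry above; it is not this system's real tnsnames.ora):

```shell
# List the alias part of each tnsnames.ora entry (entries start in column 1,
# continuation lines start with a parenthesis and are skipped).
cat > /tmp/tnsnames.sample <<'EOF'
DONOTDELETE,NODEFQDN =
(ADDRESS = (PROTOCOL = TCP)(Host = rac1.mydomain.com)(Port = 1522))
EOF
awk -F= '/^[A-Za-z]/ { gsub(/ /, "", $1); print $1 }' /tmp/tnsnames.sample
# -> DONOTDELETE,NODEFQDN
```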

Good Luck!

Script to capture and restore file permissions

Backing up file permissions is a best practice: even extra permissions on files can break installed software.

Edit to this post:

Thanks to zhwsh for this comment, which hardly needs explanation:

"getfacl -R /u01/app/11.2.0.4/grid > dir_privs.txt
setfacl --restore=dir_privs.txt"

In any case, I am leaving a Perl script below that does the same thing as getfacl.

Usage:

chmod 755 backup_permissions.pl

./backup_permissions.pl <Path>

Script: 

#!/usr/bin/perl -w
#
# Captures file permissions and the owner of the files
# usage : perm1.pl <path to capture permission>
#

use strict;
use warnings;
use File::Find;
use POSIX();

my (@dir) = @ARGV;
my $linecount=0 ;

if ($#ARGV < 0) {
print "\n\nOops....Invalid Syntax !!!!\n" ;
print "Usage : ./perm1.pl <path to capture permission>\n\n" ;
print "Example : ./perm1.pl /home/oracle\n\n" ;
exit ;
}
my $logdir=$dir[0] ;
my $date = POSIX::strftime( '%a-%b-%d-%H-%M-%S-%Y', localtime);
my $logfile="permission-".$date;
my $cmdfile="restore-perm-".$date.".cmd" ;

open LOGFILE, "> $logfile" or die $! ;
open CMDFILE, "> $cmdfile" or die $! ;
find(\&process_file,@dir);

print "Following log files are generated\n" ;
print "logfile : ".$logfile. "\n" ;
print "Command file : ".$cmdfile. "\n" ;
print "Linecount : ".$linecount."\n" ;
close (LOGFILE) ;
close (CMDFILE) ;

sub process_file {
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks,$user,$pass,$comment,$home,$shell,$group);
my %uiduname = () ;
my %gidgname = () ;
my $filename = $File::Find::name;

#### Build the uid -> username hash

open (PASSWDFILE, '/etc/passwd') ;

while ( <PASSWDFILE>) {
($user,$pass,$uid,$gid,$comment,$home,$shell)=split (/:/) ;
$uiduname{$uid}=$user ;
}
close (PASSWDFILE) ;

#### Build the gid -> groupname hash

open (GRPFILE, '/etc/group') ;

while ( <GRPFILE>) {
($group,$pass,$gid)=split (/:/) ;
$gidgname{$gid}=$group ;
}
close (GRPFILE) ;

($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat("$filename");
# Log line: octal mode, owner, group, path
printf LOGFILE "%o %s %s %s\n", $mode & 07777, $uiduname{$uid}, $gidgname{$gid}, $filename ;
# Restore commands: chown owner:group path, then chmod mode path
printf CMDFILE "%s %s%s%s %s\n", "chown ",$uiduname{$uid}, ":", $gidgname{$gid}, $filename ;
printf CMDFILE "%s %o %s\n", "chmod ",$mode & 07777, $filename ;
$linecount++ ;
}

Note:

The above script generates a restore-perm-<timestamp>.cmd file.

When you want to restore the permissions, make this file executable and run it:

chmod 755 restore-perm-<timestamp>.cmd

./restore-perm-<timestamp>.cmd


Restart Exadata storage cell service without affecting ASM

Brief history:

A week ago, on our DR Exadata, the cell service hung, which made all databases located on that Exadata inaccessible.

CellCLI> LIST ALERTHISTORY
9 2017-10-13T11:56:05+04:00 critical “RS-7445 [Serv CELLSRV hang detected] [It will be restarted] [] [] [] [] [] [] [] [] [] []”

The cell's alert history said the service would be restarted automatically, but it was not, so I restarted it myself as follows:

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

The databases started to work correctly.

Today the same problem happened on the HQ side, which of course stopped everything for a while until I restarted the service.

But identifying which cell was problematic was a little difficult this time, because there was no error in the alert history.

However, when I entered the following command on the third cell node, it hung; the other cells were OK.

CellCLI> LIST ACTIVEREQUEST

So I restarted the same service on that node, and the problem was resolved.

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

Of course, this is not a real solution, and the cell service must not hang, but it is a simple workaround when your PRODUCTION database is down.

I have created an SR and am waiting for an answer; if there is any useful news, I will update this post.

===================================================================================================

Here are the correct steps to restart the cell services without affecting ASM:

1.  Run the following command to check if there are offline disks on other cells that are mirrored with disks on this cell:

CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'

Warning: if any grid disks are listed in the returned output, it is not safe to stop or restart the CELLSRV process, because proper Oracle ASM disk group redundancy will not be intact; Oracle ASM will dismount the affected disk group, causing the databases to shut down abruptly.

If no grid disks are listed in the returned output, you can safely restart cellsrv or all services in step 2 below.
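On a full rack it is convenient to run the step 1 check on all cells at once with dcli. A sketch (the `cell_group` file listing the cell hostnames is an assumption of this example):

```shell
# Build the CellCLI check once, then it could be fanned out to every cell
# with dcli; it is safe to restart only if no cell returns a grid disk name.
check="cellcli -e \"list griddisk attributes name where asmdeactivationoutcome != 'Yes'\""
echo "$check"
# dcli -g cell_group -l celladmin "$check"   # cell_group: one cell host per line
```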

2.  Restart the cell services using either of the following commands:

CellCLI> ALTER CELL RESTART SERVICES CELLSRV

CellCLI> ALTER CELL RESTART SERVICES ALL

The good news is that the cell has self-defence against reduced redundancy: if you try to restart it while the redundancy check is not satisfied, you get:

CellCLI> ALTER CELL RESTART SERVICES ALL;

Stopping the RS, CELLSRV, and MS services…
The SHUTDOWN of ALL services was not successful.
CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be forced to dismount due to reduced redundancy.
Getting the state of CELLSRV services… running
Getting the state of MS services… running
Getting the state of RS services… running