Upgrade 12.1.0.2 to 12.2.0.1 fails at phase 109 with “Unexpected error encountered”

We tried to upgrade our database a week ago, and it failed with the following error:

Serial Phase #:108 [ORCL] Files:1 Time: 1s 
******************* Migration ****************** 
Serial Phase #:109 [ORCL] Files:1 wait_for_completion: unexpected error in next_proc() 
catconExec: unexpected error in wait_for_completions 

Unexpected error encountered in catconExec; exiting 

Unexpected error encountered in catctlMain; Error Stack Below; exiting 
Died at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 7822. 
at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 7822. 
main::catctlDie("\x{a}Unexpected error encountered in catconExec; exiting\x{a} 2") called at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 4556 
main::catctlExecutePhaseFiles(109, 1, undef, undef, undef) called at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 1862 
main::catctlRunPhase(109, 1, undef, undef, undef) called at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 2006 
main::catctlRunPhases(0, 116, 116, undef, undef, undef) called at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 2171 
main::catctlRunMainPhases() called at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 1341 
main::catctlMain() called at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 1256 
eval {...} called at /u01/app/oracle/product/12.2.0/db/rdbms/admin/catctl.pl line 1254

We asked Oracle Support, but they had apparently not seen this situation before.
12.2 was the newest version at the time, and no matching bugs were known.

After the failure we postponed the upgrade to the following week, and I started to read all the documentation steps carefully. I almost learned them by heart 😀

The upgrade steps are described in Doc ID 2173141.1.

I will write down the upgrade steps and the error that happened at phase 109.
Then let’s fix the error and finish the upgrade successfully.

================ Preupgrade steps

0. Copy orapwORCL, sqlnet.ora, tnsnames.ora, listener.ora files from old home to new home.

  1. Execute Preupgrade script from source home
    export ORACLE_HOME=/u01/app/oracle/product/12.1.0/db
    export PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/oracle/bin:/u01/app/oracle/product/12.1.0/db/bin
    $ORACLE_HOME/jdk/bin/java -jar /u01/app/oracle/product/12.2.0/db/rdbms/admin/preupgrade.jar TERMINAL TEXT

    It generates two scripts: preupgrade_fixups.sql and postupgrade_fixups.sql.

  2. Run the preupgrade fixup script. It will not correct all problems; some steps must be done manually.
    sqlplus /  as sysdba
    @?/rdbms/admin/preupgrade_fixups.sql
  3. The documentation says to back up the whole database, but our database is 11TB in size, so a full backup before the upgrade was not an option.
    If you read the documentation carefully, it also says that you may take the schema tablespaces offline (read only). In that case it is not necessary to back them up, because they are not changed during the upgrade.
    So we can simply shut the database down cleanly (normal, immediate or transactional) and cold copy the SYSTEM and SYSAUX datafiles to a safe location. You must also copy at least one of the control files.
    Copying the undo tablespace datafiles is not strictly necessary, because a cleanly shut down database can still be recovered without them. But if you do back up undo, you save yourself the extra steps needed to recover a database without undo.
    Make the user tablespaces read only by running the output of this select:

    SELECT distinct 'alter tablespace '|| tablespace_name||' read only;' 
    from dba_data_files
    where tablespace_name not in ('SYSTEM','SYSAUX','UNDOTBS01','UNDOTBS02');

Use your own UNDO tablespace names instead of mine.

4. Disable any custom DDL triggers, if they exist. I had created a DDL trigger, so I disabled it.

           alter trigger sys.audit_ddl_trg disable;

5. Purge recyclebin and gather dictionary statistics

           purge dba_recyclebin;
           exec dbms_stats.gather_dictionary_stats;

6. Shut down the database cleanly! Otherwise you will not be able to restore it if the upgrade fails.

           sqlplus / as sysdba
           alter system checkpoint;
           shutdown immediate;
           exit
           lsnrctl stop

7. Cold copy the SYSTEM, SYSAUX and control files

            cp /ud01/oradata/ORCL/sysaux01.dbf /ud02/backup_mk/
            cp /ud01/oradata/ORCL/system01.dbf /ud02/backup_mk/
            cp /ud01/oradata/ORCL/control01.ctl /ud02/backup_mk/
            cp /ud01/oradata/ORCL/control02.ctl /ud02/backup_mk/
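
Because this cold copy is the only fallback if the upgrade fails, it may be worth confirming that each copy is byte-for-byte identical before starting the upgrade. A minimal sketch, assuming the paths used above; the function name copy_and_verify is hypothetical and not part of any Oracle tooling:

```shell
# copy_and_verify DEST FILE...
# Copies each FILE into DEST and compares the copy with the original.
# Returns non-zero as soon as a copy fails or differs from its source.
copy_and_verify() {
    local dest=$1
    shift
    local src
    for src in "$@"; do
        cp -p "$src" "$dest/" || return 1
        # cmp -s is silent and exits 0 only when both files are identical
        cmp -s "$src" "$dest/$(basename "$src")" || {
            echo "MISMATCH: $src" >&2
            return 1
        }
        echo "OK: $src"
    done
}

# Example with the paths from this post (run only after the clean shutdown):
# copy_and_verify /ud02/backup_mk /ud01/oradata/ORCL/system01.dbf /ud01/oradata/ORCL/sysaux01.dbf
```

Comparing 11TB would be slow, but SYSTEM, SYSAUX and the control files are small enough for a full compare to be practical.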

======================Upgrade

8.  Start DB in upgrade mode from target ORACLE_HOME

           export ORACLE_HOME=/u01/app/oracle/product/12.2.0/db
           export PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/oracle/bin:/u01/app/oracle/product/12.2.0/db/bin:/u01/app/oracle/product/12.2.0/db

           sqlplus / as sysdba
           startup upgrade;
           exit

           cd /u01/app/oracle/product/12.2.0/db/bin
           ./dbupgrade -n 33 -T
           

The -T option tells the upgrade to take the schema tablespaces offline (read only) while it runs. We have already done that ourselves, so we just pass -T.

At phase 109 the upgrade failed with the unexpected error shown at the beginning of this post.

You must check all the logs that the upgrade creates. The main error was written in catupgrd0.log:

              catrequtlmg: b_StatEvt = TRUE 
              catrequtlmg: b_SelProps = FALSE 
              catrequtlmg: b_UpgradeMode = TRUE 
              catrequtlmg: b_InUtlMig = TRUE 
              catrequtlmg: Deleting table stats 
              catrequtlmg: Gathering Table Stats OBJ$MIG 
              catrequtlmg: Gathering Table Stats USER$MIG 
              declare 
              * 
              ERROR at line 1: 
              ORA-01422: exact fetch returns more than requested number of rows 
              ORA-06512: at "SYS.DBMS_STATS", line 36873 
              ORA-06512: at "SYS.DBMS_STATS", line 36507 
              ORA-06512: at "SYS.DBMS_STATS", line 35428 
              ORA-06512: at "SYS.DBMS_STATS", line 34760 
              ORA-06512: at "SYS.DBMS_STATS_INTERNAL", line 22496 
              ORA-06512: at "SYS.DBMS_STATS_INTERNAL", line 22483 
              ORA-06512: at "SYS.DBMS_STATS", line 34416 
              ORA-06512: at "SYS.DBMS_STATS", line 35168 
              ORA-06512: at "SYS.DBMS_STATS", line 36230 
              ORA-06512: at "SYS.DBMS_STATS", line 36716 
              ORA-06512: at line 149

As you can see, it was gathering statistics on the USER$MIG table when it got ORA-01422 🙂
If you try to gather the statistics manually, you get the same error:

         EXEC sys.dbms_stats.gather_table_stats('SYS', 'USER$MIG',method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');
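
Finding the ORA-01422 above meant reading through the upgrade logs by hand. A minimal sketch of how that search could be scripted; the function name first_ora_errors is hypothetical, and the log directory is an argument because its location depends on your environment:

```shell
# first_ora_errors LOGDIR [COUNT]
# Prints the first COUNT (default 5) ORA- lines from each catupgrd*.log
# in LOGDIR, prefixed with the file name and line number.
first_ora_errors() {
    local logdir=$1 count=${2:-5} f
    for f in "$logdir"/catupgrd*.log; do
        [ -f "$f" ] || continue
        # -H prints the file name, -n the line number, -m stops after COUNT matches
        grep -H -n -m "$count" 'ORA-[0-9]\{5\}' "$f" || true
    done
    return 0
}

# Example: show the first error of every log in the current directory
# first_ora_errors . 1
```

The first ORA- line in a log is usually the interesting one; the ORA-06512 lines after it are only the call stack.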

If you check the entries in the obj$ table, you will see that there are duplicates:

          SQL> select to_char(OBJ#), name,  ctime, STATUS 
               from obj$ 
               where name='USER$MIG'; 

          TO_CHAR(OBJ#) NAME CTIME 
          ----------------- ------------ -------------------- 
          956329 USER$MIG 08-10-2015 12:59:55 
          956328 USER$MIG 08-10-2015 12:59:55 
          956325 USER$MIG 08-10-2015 12:59:53 
          956327 USER$MIG 08-10-2015 12:59:55 
          956326 USER$MIG 08-10-2015 12:59:55 
          1621545 USER$MIG 19-05-2017 22:55:26

There are six entries. Five of them were created in 2015 (they may be left over from a previous upgrade in 2015, which also failed on the first attempt).

Oracle Support did not advise deleting the old entries from this table, BUT support is sometimes wrong. We had no time: we had been upgrading the database for 4 hours, the first attempt had failed, and we had a backup, so we took the risk and deleted the 2015 entries from this table.

          delete from obj$ 
          where OBJ# in (956329, 
          956328, 
          956325, 
          956327, 
          956326); 
          commit;

Rerun the upgrade from phase 109:

          ./dbupgrade -p 109 -T -n 40

And it completed successfully (happy), so the risk paid off!!!!

================ Post-upgrade steps

           SQL> @?/rdbms/admin/postupgrade_fixups.sql

It fixes whatever can be fixed automatically, but it may also list steps that must be done manually. One of them is upgrading the time zone, which I have described in my previous post, “Upgrading timezone manually in 12c”.

Change the compatible parameter in your spfile (once raised, compatible cannot be lowered again):

           From 
           *.compatible='12.1.0.2.0'
           To
           *.compatible='12.2.0.1.0'

Change the Oracle home to the new one in /etc/oratab and in listener.ora.

Recompile invalid objects

           SQL> @?/rdbms/admin/utlrp.sql
           select count(*) from dba_objects where status!='VALID';

Start listener from new home:

           export ORACLE_HOME=/u01/app/oracle/product/12.2.0/db
           export PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/oracle/bin:/u01/app/oracle/product/12.2.0/db/bin
           lsnrctl start

Enable the trigger if you disabled it earlier:

           alter trigger sys.audit_ddl_trg enable;

Make schema tablespaces read write:

           SELECT distinct 'alter tablespace '|| tablespace_name||' read write;' 
           from dba_data_files
           where tablespace_name not in ('SYSTEM','SYSAUX','UNDOTBS01','UNDOTBS02');

Upgrading timezone manually in 12c

After the manual upgrade from 12.1.0.2 to 12.2.0.1, the Oracle pre/post upgrade scripts asked us to upgrade the time zone manually.

Our database was using time zone file version 18, and the target 12.2.0.1.0 database ships with time zone file version 26.

Updating the time zone is a somewhat complicated process.

Here are the steps that we used.

All the steps are described in the MetaLink note Doc ID 1509653.1.

  1. Check the current version

SQL> select TZ_VERSION from registry$database;

TZ_VERSION
----------
18

  2. Check also
SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
 FROM DATABASE_PROPERTIES
 WHERE PROPERTY_NAME LIKE 'DST_%'
 ORDER BY PROPERTY_NAME;

PROPERTY_NAME                VALUE
 --------------------------- ----------

DST_PRIMARY_TT_VERSION      18
DST_SECONDARY_TT_VERSION    0
DST_UPGRADE_STATE           NONE

If DST_PRIMARY_TT_VERSION is <the old DST version number>, DST_SECONDARY_TT_VERSION  is 0 and  DST_UPGRADE_STATE is NONE then continue, otherwise you need to see Note 1509653.1

  3. Purge the recycle bin
sqlplus / as sysdba
purge dba_recyclebin;

-- this alter session might speed up DBMS_DST on some databases
-- see Bug 10209691 / Bug 12658443

alter session set "_with_subquery"=materialize;

-- to avoid the issue in note 1407273.1

alter session set "_simple_view_merging"=TRUE;

  4. Start the prepare window. These steps will NOT update any data yet.

exec DBMS_DST.BEGIN_PREPARE(26);

-- truncate logging tables if they exist.

TRUNCATE TABLE SYS.DST$TRIGGER_TABLE;
TRUNCATE TABLE sys.dst$affected_tables;
TRUNCATE TABLE sys.dst$error_table;

SQL> select * from sys.dst$affected_tables;
no rows selected

SQL> select * from sys.dst$error_table;
no rows selected

-- If there are no errors, end the prepare window

EXEC DBMS_DST.END_PREPARE;

  5. If the error table is empty, we can run the actual time zone upgrade

sqlplus / as sysdba
shutdown immediate;
startup upgrade;

purge dba_recyclebin;
TRUNCATE TABLE SYS.DST$TRIGGER_TABLE;
TRUNCATE TABLE sys.dst$affected_tables;
TRUNCATE TABLE sys.dst$error_table;
EXEC DBMS_APPLICATION_INFO.SET_CLIENT_INFO('upg_tzv')

alter session set "_with_subquery"=materialize;
alter session set "_simple_view_merging"=TRUE;

exec DBMS_DST.BEGIN_UPGRADE(26);

SQL> SELECT * FROM sys.dst$error_table;
no rows selected

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;

PROPERTY_NAME              VALUE
-------------------------  ------------------------------------
DST_PRIMARY_TT_VERSION     26
DST_SECONDARY_TT_VERSION   18
DST_UPGRADE_STATE          UPGRADE

-- some Oracle-provided users may be listed here; that is normal

SQL> SELECT OWNER, TABLE_NAME, UPGRADE_IN_PROGRESS
FROM ALL_TSTZ_TABLES
where UPGRADE_IN_PROGRESS='YES';

shutdown immediate
startup

alter session set "_with_subquery"=materialize;
alter session set "_simple_view_merging"=TRUE;

-- now upgrade the tables that need action

set serveroutput on
VAR numfail number
BEGIN
DBMS_DST.UPGRADE_DATABASE(:numfail,
parallel => TRUE,
log_errors => TRUE,
log_errors_table => 'SYS.DST$ERROR_TABLE',
log_triggers_table => 'SYS.DST$TRIGGER_TABLE',
error_on_overlap_time => FALSE,
error_on_nonexisting_time => FALSE);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :numfail);
END;
/

-- this select should return no rows

SELECT * FROM sys.dst$error_table;

-- if there were no failures, end the upgrade.

VAR fail number
BEGIN
DBMS_DST.END_UPGRADE(:fail);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :fail);
END;
/

-- Check

SQL> SELECT * FROM v$timezone_file;

FILENAME                VERSION     CON_ID
-------------------- ---------- ----------
timezlrg_26.dat              26          0

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;

PROPERTY_NAME             VALUE
-----------------         ----------
DST_PRIMARY_TT_VERSION    26
DST_SECONDARY_TT_VERSION  18
DST_UPGRADE_STATE         NONE

-- Check the following:

SELECT VERSION FROM v$timezone_file;

VERSION 
-------------
26
select TZ_VERSION from registry$database;

TZ_VERSION  
---------
18

-- if they differ after an upgrade, registry$database can be updated with:

conn / as sysdba
update registry$database 
set TZ_VERSION = (select version 
                  FROM   v$timezone_file);
commit;

 

 

Send Oracle Audit to rsyslog

In our database, auditing is turned on for some operations, and the audit records go to the OS.

SYS> show parameter audit_file_dest

NAME                TYPE        VALUE
------------------ ----------- ------------------------------
audit_file_dest  string       /u01_log/audit/orcl

SYS > show parameter audit_trail

NAME        TYPE         VALUE
------------- ----------- -----------
audit_trail string        OS

Our security administrators use a SIEM to monitor suspicious activity, and they want the database to send its audit records to this third-party tool.

I thought that I could simply point the SIEM at the directory “/u01_log/audit/orcl” and have the *.aud files uploaded from there, but I was wrong. Some tools may be able to read these *.aud files, but our SIEM cannot, so let’s configure the database to send audit records to it through rsyslog.

1. Connect to a database instance as sysdba user

SQL> connect / as sysdba

2. Set the audit trail to OS (audit_trail is a static parameter, so it has to be set in the spfile)

SQL> alter system set audit_trail=OS scope=spfile;

3. Enable auditing for system users if you need to audit the activities of the SYS user (optional)

SQL> alter system set audit_sys_operations=TRUE scope=spfile;

4. Set the rsyslog facility and severity (needs a database restart)

SQL> alter system set audit_syslog_level=local5.info scope=spfile sid='*';

5.  Restart database

SQL> shutdown immediate;
SQL> startup;

6. Edit rsyslog.conf file

#Saving oracle database audit records
local5.info          /u01_log/audit/RSYSLOG/dbaudit.log
#Send oracle database audit trail to remote rsyslog server
local5.info          @192.168.0.15

7. Restart rsyslog service

# service rsyslog restart
Shutting down system logger: [ OK ]
Starting system logger: [ OK ]

8. It is better to limit the size of the audit log, or it may fill up the disk:

# vi /etc/logrotate.d/oracle.audit

#Created by MariK

/u01_log/audit/RSYSLOG/dbaudit.log {
 rotate 3
 compress
 missingok
 notifempty
 size 40G
 postrotate
 service rsyslog restart
 endscript
}

To check the syntax, run logrotate in debug mode:

# logrotate -d /etc/logrotate.d/oracle.audit

With -d nothing is rotated; logrotate only prints what it would do and reports any errors in the file.

Configure resource manager to kill sessions automatically after maximum idle time is passed

Problem:

Our applications open too many connections and, moreover, do not close them at all 🙂. Because of this, too many sessions stay idle, and after IDLE_TIME has passed they become SNIPED.
As you know, a SNIPED session still counts against the session limit, and it is cleaned up completely only when it tries to execute something (which of course errors out). If a SNIPED session never tries to execute anything, it stays in the database forever. After a while the database throws ORA-00018: maximum number of sessions exceeded.

My old solution: 

Created script file /u01/app/oracle/dba_scripts/kill_sniped.sh, with content:

#!/bin/ksh

#Written by MK

cd /u01/app/oracle/dba_scripts
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
export ORACLE_SID=orcl1
export ORACLE_USER=oracle
$ORACLE_HOME/bin/sqlplus -s / as sysdba <<EOF

SET SERVEROUTPUT ON SIZE 1000000;
CALL DBMS_JAVA.SET_OUTPUT(1000000);

DECLARE
snum NUMBER;
BEGIN
FOR i IN (SELECT 'alter system kill session '''||a.SID||','||a.serial#||',@'||inst_id||''' immediate' killSniped FROM gv\$session a
WHERE (a.status='SNIPED' or a.status='KILLED')
and a.username is not null
)
LOOP
begin
execute immediate i.killSniped;
exception when others then null;
end;
END LOOP;
END;
/
EOF

You can easily guess what it does: it finds sessions with status SNIPED or KILLED and executes alter system kill session for each of them.

Created crontab entry:

$ crontab -l
*/10 * * * * /u01/app/oracle/dba_scripts/kill_sniped.sh > /u01/app/oracle/dba_scripts/logs/kill_sniped.log 2>&1

The script worked fine for about a year without any problem 🙂, but yesterday it could not keep up: it was killing sessions more slowly than SNIPED sessions were appearing, so the database raised ORA-00018.

New and better solution:

I created a consumer group, set a plan directive with a MAX_IDLE_TIME of 900 seconds for this group, and moved the problematic application user into the group.

Once MAX_IDLE_TIME has passed, the user session is killed automatically by Resource Manager, which is the quickest way.

BEGIN
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

DBMS_RESOURCE_MANAGER.CREATE_PLAN(PLAN => 'RESTRICTIVE_PLAN', COMMENT => '');

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => 'RSGROUP', COMMENT => '');

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'RESTRICTIVE_PLAN'
, GROUP_OR_SUBPLAN => 'RSGROUP'
, COMMENT => ''
, MAX_IDLE_TIME => 900);

-- Note: if SUBMIT_PENDING_AREA complains that OTHER_GROUPS is not part of the plan,
-- add a plan directive for OTHER_GROUPS as well.
DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP(GRANTEE_NAME => 'RSAPP'
, CONSUMER_GROUP => 'RSGROUP'
, GRANT_OPTION => FALSE);

DBMS_RESOURCE_MANAGER.SET_INITIAL_CONSUMER_GROUP( 'RSAPP', 'RSGROUP');

DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_USER( 'RSAPP', 'RSGROUP');

DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN='RESTRICTIVE_PLAN';

Note: The RSAPP user had an IDLE_TIME of 15 minutes in its profile, which is why I set MAX_IDLE_TIME to 900 seconds (15 minutes). Be careful with this decision: choose a value that matches the profile’s IDLE_TIME, or discuss it with the developers first. They may not want their application sessions killed after 15 minutes, but after 20 minutes, for example.

To see how many sessions were killed by Resource Manager, run:

SELECT IDLE_SESSIONS_KILLED
FROM V$RSRC_CONSUMER_GROUP
WHERE NAME='RSGROUP';

Hope post was useful. 🙂


Best Practices for Configuring Redo Transport for Data Guard and Active Data Guard 12c

I have three databases in my Data Guard configuration: db01 located in HQ, db02 located in DR, and db03, also located in DR, which should be a late standby with a delay of 15 days.

My task is to configure the following standby architecture:

When db01 is primary it should send logs in SYNC mode to db02 and at the same time db02 should send logs in ASYNC mode to db03.

When db02 is in primary role it should send logs in SYNC mode to db01 and at the same time db01 should send logs in ASYNC mode to db03.

So db01 and db02 should be in SYNC mode with real-time apply, and db03 should be a late standby with a delay of 15 days that receives logs from the standby database in ASYNC mode.

I want to stress the sentence above, because for now this cannot be achieved with a cascading standby. Read below…

I found a very useful document; here is the link: http://www.oracle.com/technetwork/database/availability/broker-12c-transport-config-2082184.pdf

It introduces a Data Guard broker feature that is new in 12c: the RedoRoutes property.

So in my broker configuration I will set the RedoRoutes property in the following way:

DGMGRL> show configuration

Configuration - DB_HQ_DR

Protection Mode: MaxPerformance
Members:
db01 - Primary database
db02 - Physical standby database
db03 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> edit database 'db01' set property RedoRoutes='(LOCAL:db02)(db02:db03)';
Property "redoroutes" updated
DGMGRL> edit database 'db02' set property RedoRoutes='(LOCAL:db01)(db01:db03)';
Property "redoroutes" updated

Normally, delayed apply can be configured with the DelayMins property:

DGMGRL> edit database 'db03' set property DelayMins='21600';
Property "delaymins" updated

21600 is 15 days.
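
DelayMins is expressed in minutes, so the value comes from days × 24 × 60. A small sketch of that arithmetic; the helper name days_to_minutes is hypothetical:

```shell
# days_to_minutes DAYS
# Converts a delay in whole days to minutes, the unit that DelayMins expects.
days_to_minutes() {
    echo $(( $1 * 24 * 60 ))
}

days_to_minutes 15   # prints 21600
```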

BUT I have bad news. According to this article: https://docs.oracle.com/database/121/SBYDB/log_arch_dest_param.htm#SBYDB01105

“The DELAY value that a cascaded standby uses is the value that was set for the LOG_ARCHIVE_DEST_n parameter on the primary that shipped the redo to the cascading standby.”

So I cannot have the following architecture:

db01 ----real_time_apply---- db02 ----delayed_apply---- db03

because db03 will take the delay setting from db02, which has no delay.

ASYNC/SYNC mode can be configured with the LogXptMode property:

DGMGRL> edit database 'db01' set property LogXptMode='SYNC';
DGMGRL> edit database 'db02' set property LogXptMode='SYNC';
DGMGRL> edit database 'db03' set property LogXptMode='ASYNC';

To achieve my goal I should not use a cascading standby. Instead, the primary must ship redo directly to both db02 (with DelayMins=0) and db03 (with DelayMins=21600).

I hope it helps.

RAC: root.sh | CRS-2672: Attempting to start ‘ora.storage’ | ORA-01017: invalid username/password

I was configuring clusterware on node1 and got the following error:

CRS-2672: Attempting to start ‘ora.storage’ on ‘node1’
ORA-01017: invalid username/password; logon denied
CRS-5017: The resource action “ora.storage start” encountered the following error:
Storage agent start action aborted. For details refer to “(:CLSN00107:)” in “/u01/app/oracle/diag/crs/node1/crs/trace/ohasd_orarootagent_root.trc”.
CRS-2883: Resource ‘ora.storage’ failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.
2016/09/27 05:41:01 CLSRSC-117: Failed to start Oracle Clusterware stack

Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 930.
The command ‘/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl ‘ execution failed

 

The trace file /u01/app/oracle/diag/crs/node1/crs/trace/ohasd_orarootagent_root.trc says:

2016-09-27 05:40:56.787330*:kgfn.c@6018: kgfnConnect2Int: sysasm=0 envflags=0x10 srvrflags=0x3 unam=NULL password is NULL pstr=_ocr
2016-09-27 05:40:56.787330*:kgfn.c@6194: kgfnConnect2Int: cstr=(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle)(ARGV0=oracle+ASM1_ocr)(ENVS=’ORACLE_HOME=/u01/app/12.1.0.2/grid,ORACLE_SID=+ASM1′)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))’)(PRIVS=(USER=root)(GROUP=root)))(enable=setuser))
2016-09-27 05:40:57.273302 : AGENT:2583111424: {0:9:3} {0:9:3} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.storage 1 1

 

So why user root???

Look: when I connect as root, I get ORA-01017:

[root@node1 ~]# . oraenv
ORACLE_SID = [+ASM1] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[root@node1 ~]# sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 27 05:59:01 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied

If I connect as the oracle user, it is OK:

su - oracle

[oracle@node1 ~]$ . oraenv
ORACLE_SID = [LBTCI1] ? +ASM1

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Sep 27 05:59:45 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL>

 

Look at the connection string again: it contains “PROGRAM=/u01/app/12.1.0.2/grid/bin/oracle”, so let’s check the file permissions.

[oracle@node1 ~]$ ll /u01/app/12.1.0.2/grid/bin/oracle
-rwsr-s--x 1 root root 295054213 Sep 27 05:26 /u01/app/12.1.0.2/grid/bin/oracle

The owner must be oracle:oinstall, not root:root:

chown oracle:oinstall /u01/app/12.1.0.2/grid/bin/oracle
chmod 6751 /u01/app/12.1.0.2/grid/bin/oracle
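
Before rerunning root.sh, the ownership and mode of the binary can be checked in one go. A minimal sketch, assuming GNU stat as found on Linux; the function name check_owner_mode is hypothetical:

```shell
# check_owner_mode FILE OWNER:GROUP MODE
# Succeeds only if FILE has exactly the given owner:group and octal mode.
check_owner_mode() {
    local file=$1 want_owner=$2 want_mode=$3 actual
    # %U:%G = owner and group names, %a = permission bits in octal
    actual=$(stat -c '%U:%G %a' "$file") || return 2
    if [ "$actual" = "$want_owner $want_mode" ]; then
        echo "OK: $file is $actual"
    else
        echo "WRONG: $file is $actual, expected $want_owner $want_mode" >&2
        return 1
    fi
}

# Example with the values from this post:
# check_owner_mode /u01/app/12.1.0.2/grid/bin/oracle oracle:oinstall 6751
```

The same check could be run on every node, since a wrong owner on any one of them would cause the same ORA-01017.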

 

Deconfigure CRS (rootcrs.pl -deconfig -verbose) and configure it again (run root.sh).

 

error: package cvuqdisk is not installed

I applied a patch on RAC, and after I ran the postinstall script on the first node, it failed because of a file permission problem. That is where the trouble started…

I could not start the clusterware on the first node.

I deconfigured the clusterware with:

[root@lbdm01-dr-adm ~]# $ORACLE_HOME/crs/install/rootcrs.pl -deconfig -force -verbose

And here I got the cvuqdisk error:

PRCR-1070 : Failed to check if resource ora.net1.network is registered
CRS-0184 : Cannot communicate with the CRS daemon.
PRCR-1070 : Failed to check if resource ora.helper is registered
CRS-0184 : Cannot communicate with the CRS daemon.
PRCR-1070 : Failed to check if resource ora.ons is registered
CRS-0184 : Cannot communicate with the CRS daemon.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘lbdm01-dr-adm’
CRS-2679: Attempting to clean ‘ora.cssd’ on ‘lbdm01-dr-adm’
CRS-2680: Clean of ‘ora.cssd’ on ‘lbdm01-dr-adm’ failed
CRS-2799: Failed to shut down resource ‘ora.cssd’ on ‘lbdm01-dr-adm’
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on ‘lbdm01-dr-adm’ has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
2016/09/26 19:54:12 CLSRSC-463: The deconfiguration or downgrade script could not stop current Oracle Clusterware stack.

2016/09/26 19:54:12 CLSRSC-4006: Removing Oracle Trace File Analyzer (TFA) Collector.

2016/09/26 19:54:26 CLSRSC-4007: Successfully removed Oracle Trace File Analyzer (TFA) Collector.

error: package cvuqdisk is not installed
2016/09/26 19:54:26 CLSRSC-557: Oracle Clusterware stack on this node has been successfully deconfigured. There were some errors which can be ignored.

 

I searched the documentation for information about this package and found the following:

Link: https://docs.oracle.com/database/121/LADBI/pre_install.htm#LADBI7632

Installing the cvuqdisk RPM for Linux

If you do not use an Oracle Preinstallation RPM, then you must install the cvuqdisk RPM. Without cvuqdisk, the Cluster Verification Utility cannot find shared disks, and you receive a “Package cvuqdisk not installed” error when you run the Cluster Verification Utility. Use the cvuqdisk RPM for your hardware (for example, x86_64, or i386).

To install the cvuqdisk RPM, complete the following procedure:

  1. Locate the cvuqdisk RPM package, which is in the directory rpm on the Oracle Database installation media. If you installed Oracle Grid Infrastructure, then it is in the directory oracle_home1/cv/rpm.
  2. Log in as root.
  3. Use the following command to find if you have an existing version of the cvuqdisk package:

    # rpm -qi cvuqdisk

    If you have an existing version, then enter the following command to deinstall the existing version:

    # rpm -e cvuqdisk

  4. Set the environment variable CVUQDISK_GRP to point to the group that owns cvuqdisk, typically oinstall, for example:
    # CVUQDISK_GRP=oinstall; export CVUQDISK_GRP

  5. In the directory where you have saved the cvuqdisk RPM, use the following command to install the cvuqdisk package:
    rpm -iv package

    For example:

    # rpm -iv cvuqdisk-1.0.9-1.rpm

So I found the package in the following directory and installed it:

cd /u01/app/12.1.0.2/grid/cv/rpm

yum install cvuqdisk-1.0.9-1.rpm
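
The RPM version differs between releases, so it can help to look the file up rather than hard-code its name. A minimal sketch; the function name find_cvuqdisk_rpm is hypothetical, and the directory in the example is the cv/rpm path from this post:

```shell
# find_cvuqdisk_rpm DIR
# Prints the path of the first cvuqdisk RPM found in DIR; fails if there is none.
find_cvuqdisk_rpm() {
    local dir=$1 f
    for f in "$dir"/cvuqdisk-*.rpm; do
        # An unmatched glob stays literal, so test that the file really exists
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "no cvuqdisk RPM in $dir" >&2
    return 1
}

# Example (as root): install only if the package is not there yet
# rpm -q cvuqdisk || rpm -iv "$(find_cvuqdisk_rpm /u01/app/12.1.0.2/grid/cv/rpm)"
```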

Note: The problem is strange, but I am not explaining why it happened in this post, because I don’t know yet 🙂
The aim of this post is to show where to find the cvuqdisk package and what it is for 🙂

Good Luck!