数据库版本:Oracle 11.2.0.4
数据库启动后其中一个节点Instance terminated by PMON, pid = 44253导致数据库实例无法启动
告警日志:
Fri Mar 25 17:41:08 2022
RCBG started with pid=51, OS id=44654
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Fri Mar 25 17:41:09 2022
QMNC started with pid=52, OS id=44658
Errors in file /u01/app/oracle/diag/rdbms/syorcl/syorcl1/trace/syorcl1_ora_44429.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-23322: Privilege error accessing pipe
ORA-06512: at "HZMCASSET.TSECTOOLS", line 58
ORA-06512: at line 6
Completed: ALTER DATABASE OPEN /* db agent *//* {0:41:3} */
Thread 1 advanced to log sequence 58320 (LGWR switch)
Current log# 10 seq# 58320 mem# 0: +DATA/syorcl/onlinelog/redo10.log
Fri Mar 25 17:41:13 2022
Archived Log entry 6955 added for thread 1 sequence 58319 ID 0xbcf06b10 dest 1:
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_4
******************************************************************
LNS: Standby redo logfile selected for thread 1 sequence 58320 for destination LOG_ARCHIVE_DEST_4
Fri Mar 25 17:41:13 2022
ARC3: Standby redo logfile selected for thread 1 sequence 58319 for destination LOG_ARCHIVE_DEST_4
Fri Mar 25 17:41:21 2022
minact-scn: Inst 1 is a slave inc#:20734 mmon proc-id:44333 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Mar 25 17:41:21 2022
Starting background process CJQ0
Fri Mar 25 17:41:21 2022
CJQ0 started with pid=85, OS id=44768
Fri Mar 25 17:41:22 2022
Dumping diagnostic data in directory=[cdmp_20220325174135], requested by (instance=2, osid=25931 (LMS3)), summary=[incident=576121].
Fri Mar 25 17:41:23 2022
Reconfiguration started (old inc 20734, new inc 20736)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Fri Mar 25 17:41:23 2022
Fri Mar 25 17:41:23 2022
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Mar 25 17:41:23 2022
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Mar 25 17:41:23 2022
LMS 3: 3 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Fri Mar 25 17:41:24 2022
Transaction recovery: lock conflict caught and ignored
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
minact-scn: Inst 1 is now the master inc#:20736 mmon proc-id:44333 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:20736 new-inc#:20736
parallel recovery started with 32 processes
Started redo scan
Completed redo scan
read 38484 KB redo, 87720 data blocks need recovery
Started redo application at
Thread 2: logseq 58421, block 87
Recovery of Online Redo Log: Thread 2 Group 17 Seq 58421 Reading mem 0
Mem# 0: +DATA/syorcl/onlinelog/redo16.log
Recovery of Online Redo Log: Thread 2 Group 18 Seq 58422 Reading mem 0
Mem# 0: +DATA/syorcl/onlinelog/redo17.log
Recovery of Online Redo Log: Thread 2 Group 19 Seq 58423 Reading mem 0
Mem# 0: +DATA/syorcl/onlinelog/redo18.log
Recovery of Online Redo Log: Thread 2 Group 20 Seq 58424 Reading mem 0
Mem# 0: +DATA/syorcl/onlinelog/redo19.log
Recovery of Online Redo Log: Thread 2 Group 45 Seq 58425 Reading mem 0
Mem# 0: +DATA/syorcl/onlinelog/redo20.log
Completed redo application of 16.70MB
Completed instance recovery at
Thread 2: logseq 58425, block 4783, scn 21114773079313
87691 data blocks read, 87725 data blocks written, 38484 redo k-bytes read
Thread 2 advanced to log sequence 58426 (thread recovery)
Redo thread 2 internally disabled at seq 58426 (SMON)
Fri Mar 25 17:41:29 2022
ALTER SYSTEM SET service_names='SYS$SYS.SCHEDULER$_EVENT_QUEUE.SYORCL' SCOPE=MEMORY SID='syorcl1';
Fri Mar 25 17:41:29 2022
Setting recovery pair for thread 2: nab 4783 seq 58425
Fri Mar 25 17:41:29 2022
Archived Log entry 6959 added for thread 2 sequence 58425 ID 0xbcf06b10 dest 1:
ARC3: Standby redo logfile selected for thread 2 sequence 58425 for destination LOG_ARCHIVE_DEST_4
Fri Mar 25 17:41:30 2022
ARC0: Archiving disabled thread 2 sequence 58426
Archived Log entry 6961 added for thread 2 sequence 58426 ID 0xbcf06b10 dest 1:
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Fri Mar 25 17:41:31 2022
Thread 1 advanced to log sequence 58321 (LGWR switch)
Current log# 11 seq# 58321 mem# 0: +DATA/syorcl/onlinelog/redo11.log
minact-scn: master continuing after IR
Archived Log entry 6963 added for thread 1 sequence 58320 ID 0xbcf06b10 dest 1:
Fri Mar 25 17:41:31 2022
LNS: Standby redo logfile selected for thread 1 sequence 58321 for destination LOG_ARCHIVE_DEST_4
ARC3: Archive log rejected (thread 2 sequence 58426) at host '(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL= TCP)(HOST = 192.200.54.41)(PORT = 1555))) (CONNECT_DATA =(SERVICE_NAME = syorcl)))'
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance syorcl1 - Archival Error. Archiver continuing.
Fri Mar 25 17:42:24 2022
Decreasing number of real time LMS from 4 to 0
Fri Mar 25 17:45:31 2022
Starting background process SMCO
Fri Mar 25 17:45:31 2022
SMCO started with pid=58, OS id=46943
Fri Mar 25 17:47:26 2022
Reconfiguration started (old inc 20736, new inc 20738)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Fri Mar 25 17:47:27 2022
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Mar 25 17:47:27 2022
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Mar 25 17:47:27 2022
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Mar 25 17:47:27 2022
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration started (old inc 20738, new inc 20740)
List of instances:
1 2 (myinst: 1)
Nested reconfiguration detected.
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Fri Mar 25 17:47:28 2022
minact-scn: Master returning as live inst:2 has inc# mismatch instinc:0 cur:20740 errcnt:0
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Fri Mar 25 17:47:46 2022
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Fri Mar 25 17:47:46 2022
Thread 1 advanced to log sequence 58322 (LGWR switch)
Current log# 12 seq# 58322 mem# 0: +DATA/syorcl/onlinelog/redo12.log
Fri Mar 25 17:47:46 2022
LNS: Standby redo logfile selected for thread 1 sequence 58322 for destination LOG_ARCHIVE_DEST_4
Fri Mar 25 17:47:46 2022
Archived Log entry 6970 added for thread 1 sequence 58321 ID 0xbcf06b10 dest 1:
Fri Mar 25 17:48:00 2022
Exception [type: SIGIOT, unknown code] [ADDR:0x1F60000AD07] [PC:0x39E5432495, gsignal()+53] [exception issued by pid: 44295, uid: 502] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/syorcl/syorcl1/trace/syorcl1_lms3_44295.trc (incident=928121):
ORA-07445: exception encountered: core dump [gsignal()+53] [SIGIOT] [ADDR:0x1F60000AD07] [PC:0x39E5432495] [unknown code] []
Incident details in: /u01/app/oracle/diag/rdbms/syorcl/syorcl1/incident/incdir_928121/syorcl1_lms3_44295_i928121.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Mar 25 17:48:00 2022
Dumping diagnostic data in directory=[cdmp_20220325174800], requested by (instance=1, osid=44295 (LMS3)), summary=[incident=928121].
Fri Mar 25 17:48:01 2022
Sweep [inc][928121]: completed
Sweep [inc2][928121]: completed
System state dump requested by (instance=1, osid=44253 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/syorcl/syorcl1/trace/syorcl1_diag_44271_20220325174802.trc
Fri Mar 25 17:48:03 2022
PMON (ospid: 44253): terminating the instance due to error 484
Fri Mar 25 17:48:03 2022
ORA-1092 : opitsk aborting process
Dumping diagnostic data in directory=[cdmp_20220325174802], requested by (instance=1, osid=44253 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 44253
查看trace日志:
------------------- start error stack dump with barriers
<error barrier> at 0x7f9777f61980
ORA-07445: exception encountered: core dump [gsignal()+53] [SIGIOT] [ADDR:0x1F60000AD07] [PC:0x39E5432495] [unknown code] []
------------------- end error stack dump with barriers
----- END DDE Action: 'dumpKGEState' (SUCCESS, 0 csec) -----
----- START DDE Action: 'kpuActionDefault' (Sync) -----
Begin OCI Current State Dump
End OCI Current State Dump
Begin OCI Call Context Dump
End OCI Call Context Dump
Begin Process state dump.
ttcdrvdmplocation: msg-0 ln-0 reporting 0
HST is NULL or no two task connection
End Process state dump.
----- END DDE Action: 'kpuActionDefault' (SUCCESS, 0 csec) -----
----- END DDE Actions Dump (total 1 csec) -----
End of Incident Dump
在trace日志中,关注报错信息
ORA-07445: exception encountered: core dump [gsignal()+53] [SIGIOT] [ADDR:0x1F60000AD07] [PC:0x39E5432495] [unknown code] []
根据MOS文档:RAC Databases on Exadata Terminate with: ORA-07445: Exception Encountered: Core Dump [gsignal()+53] [SIGIOT] (Doc ID 2492508.1)中提到,该问题在12c开始修复。
报错信息类似如下:
Exception [type: SIGIOT, unknown code] [ADDR:0x138800033A66] [PC:0x7F24019864F5, gsignal()+53] [exception issued by pid: 211558, uid: 5000] [flags: 0x0, count: 1]
Errors in file $TRACE/p66800271_ipc0_211558.trc (incident=136051) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [gsignal()+53] [SIGIOT] [ADDR:0x138800033A66] [PC:0x7F24019864F5] [unknown code] []
2018-09-27T01:12:18.547927-04:00
Instance Critical Process (pid: 6, ospid: 211558, IPC0) died unexpectedly
PMON (ospid: 211459): terminating the instance due to error 499
Instance terminated by PMON, pid = 211459
问题原因:
This issue was caused by files being removed from the /var/tmp/.oracle while the databases were running.
The directory /var/tmp/.oracle contains a number of "special" socket files that are used by local clients to connect via the IPC protocol (sqlnet) to various Oracle processes including the TNS listener, the CSS, CRS & EVM daemons or even database or ASM instances. Files should not be removed from this directory unless the products are not running.
解决办法:
Do not remove files from /var/tmp/.oracle or any of the following depending on your OS with the database and Grid Infrastructure are running:
/usr/tmp/.oracle/* /var/tmp/.oracle/* /tmp/.oracle/* /var/adm/.oracle/*
本文记录了一次Oracle 11.2.0.4数据库实例启动失败的问题排查过程,主要表现为一个节点因进程异常被PMON终止,通过分析错误日志定位到与/var/tmp/.oracle目录文件被误删除有关。
2320

被折叠的 条评论
为什么被折叠?



