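First, the Clusterware layer:
The Clusterware on all nodes forms a cluster and maintains a cluster membership list. Each node is assigned a member ID (Node ID). The Clusterware instances communicate with one another to learn each other's state and elect one node as the Master Node, which manages transitions of the cluster state. When a node joins or leaves the cluster, the cluster state changes and eventually settles into a new stable state. Each stable state is identified by a number called the Cluster Incarnation Number, and this number changes whenever a new stable state is reached. (Open question: does it increase by 1 each time?)
The instances in RAC likewise form an instance membership list. Each instance uses the node ID from the Clusterware layer as its identity, and this ID does not change during the life of the cluster. When a RAC instance starts, it registers LMON, DBWR, and the other processes that operate on shared storage with Clusterware as a single group, and obtains the node ID from Clusterware as the group ID.
The RAC cluster and the node cluster are clusters at two different levels. Both face problems such as split-brain and I/O fencing, and each has its own failure detection mechanism. If a node failure is detected at the RAC layer, the RAC cluster does the following:
(1) Suspends service to clients.
(2) Notifies Clusterware and waits for Clusterware to finish reconfiguring the cluster and reach a new stable state.
(3) Once Clusterware finishes its reconfiguration, it notifies the RAC cluster above it; on receiving this message, the RAC cluster begins its own reconfiguration.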
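The ordering of the three steps above can be illustrated with a small Python model. This is only a sketch: the class names `Clusterware` and `RacCluster`, the method names, and the choice to bump the incarnation number by 1 are all assumptions made for illustration, since the text only says that the number changes.

```python
from dataclasses import dataclass, field


@dataclass
class Clusterware:
    """Hypothetical stand-in for the node-level cluster."""
    members: set
    incarnation: int = 0

    def reconfigure(self, failed_node):
        # Drop the failed node and move to a new stable state.
        self.members.discard(failed_node)
        # Assumption: a step of 1; the text only says the value changes.
        self.incarnation += 1
        return self.incarnation


@dataclass
class RacCluster:
    """Hypothetical stand-in for the instance-level cluster."""
    clusterware: Clusterware
    instances: set
    serving: bool = True
    events: list = field(default_factory=list)

    def on_node_failure(self, node):
        # (1) Suspend service to clients.
        self.serving = False
        self.events.append("pause service")
        # (2) Notify Clusterware and wait for it to stabilise.
        self.events.append("notify clusterware")
        inc = self.clusterware.reconfigure(node)
        self.events.append(f"clusterware stable at incarnation {inc}")
        # (3) Only now does the RAC layer reconfigure itself.
        self.instances &= self.clusterware.members
        self.events.append("rac reconfiguration")
        self.serving = True
        return self.events


cw = Clusterware(members={1, 2})
rac = RacCluster(clusterware=cw, instances={1, 2})
for step in rac.on_node_failure(2):
    print(step)
```

The point of the model is the ordering: the RAC layer never rebuilds its own membership until the layer beneath it has reached a stable state.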
The CSSD log below, taken while node2 joins node1's cluster, mainly concerns the Nodeapps resources: GSD, ONS, VIP, and Listener.
[ CSSD]2011-07-09 16:34:39.332 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.vip
[ CSSD]2011-07-09 16:34:39.332 [118012816] >TRACE: clssgmAddMember: granted member(0) flags(0x12) node(1) grock (0x8be24c0/RES ora.node2.vip)
[ CSSD]2011-07-09 16:34:39.332 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8be24c0/RES ora.node2.vip)
[ CSSD]2011-07-09 16:34:39.335 [118012816] >TRACE: clssgmExitGrock: client 86 (0x8bf9ea8), grock RES ora.node2.vip, member 0
[ CSSD]2011-07-09 16:34:39.479 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.vip
[ CSSD]2011-07-09 16:34:39.479 [118012816] >TRACE: clssgmAddMember: granted member(0) flags(0x12) node(1) grock (0x8be24c0/RES ora.node2.vip)
[ CSSD]2011-07-09 16:34:39.479 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8be24c0/RES ora.node2.vip)
[ CSSD]2011-07-09 16:34:39.489 [118012816] >TRACE: clssgmExitGrock: client 87 (0x8bf9ea8), grock RES ora.node2.vip, member 0
[ CSSD]2011-07-09 16:34:39.520 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.vip
[ CSSD]2011-07-09 16:34:39.520 [118012816] >TRACE: clssgmAddMember: granted member(0) flags(0x12) node(1) grock (0x8be24c0/RES ora.node2.vip)
[ CSSD]2011-07-09 16:34:39.520 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8be24c0/RES ora.node2.vip)
[ CSSD]2011-07-09 16:34:39.651 [118012816] >TRACE: clssgmJoinGrock: grock RES ora.node2.gsd new client 0x8c09a20 with con 0x8c0de58, requested num -1
[ CSSD]2011-07-09 16:34:39.652 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8c0adb0/RES ora.node2.gsd)
[ CSSD]2011-07-09 16:34:39.655 [118012816] >TRACE: clssgmExitGrock: client 94 (0x8c09a20), grock RES ora.node2.gsd, member 0
[ CSSD]2011-07-09 16:34:39.655 [118012816] >TRACE: clssgmRemoveMember: grock(RES ora.node2.gsd) member(0/0x8bf7d08) nodeNum(1) flags(0x12) type(3)
[ CSSD]2011-07-09 16:34:39.779 [118012816] >TRACE: clssgmJoinGrock: grock RES ora.node2.gsd new client 0x8c09a20 with con 0x8c0de58, requested num -1
[ CSSD]2011-07-09 16:34:39.779 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8c0adb0/RES ora.node2.gsd)
[ CSSD]2011-07-09 16:34:39.799 [118012816] >TRACE: clssgmExitGrock: client 95 (0x8c09a20), grock RES ora.node2.gsd, member 0
[ CSSD]2011-07-09 16:34:39.799 [118012816] >TRACE: clssgmRemoveMember: grock(RES ora.node2.gsd) member(0/0x8bf7d08) nodeNum(1) flags(0x12) type(3)
[ CSSD]2011-07-09 16:34:39.939 [118012816] >TRACE: clssgmJoinGrock: grock RES ora.node2.gsd new client 0x8c09a20 with con 0x8bf7d08, requested num -1
[ CSSD]2011-07-09 16:34:39.939 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8c0adb0/RES ora.node2.gsd)
[ CSSD]2011-07-09 16:34:39.942 [118012816] >TRACE: clssgmExitGrock: client 96 (0x8c09a20), grock RES ora.node2.gsd, member 0
[ CSSD]2011-07-09 16:34:39.942 [118012816] >TRACE: clssgmRemoveMember: grock(RES ora.node2.gsd) member(0/0x8c0de58) nodeNum(1) flags(0x12) type(3)
[ CSSD]2011-07-09 16:34:39.947 [118012816] >TRACE: clssgmJoinGrock: grock RES ora.node2.gsd new client 0x8c09a20 with con 0x8c0de58, requested num -1
[ CSSD]2011-07-09 16:34:39.947 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8c0adb0/RES ora.node2.gsd)
[ CSSD]2011-07-09 16:34:40.147 [118012816] >TRACE: clssgmJoinGrock: grock RES ora.node2.ons new client 0x8c09a20 with con 0x8c0de58, requested num -1
[ CSSD]2011-07-09 16:34:40.147 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.ons
[ CSSD]2011-07-09 16:34:40.147 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8c0adb0/RES ora.node2.ons)
[ CSSD]2011-07-09 16:34:40.161 [118012816] >TRACE: clssgmExitGrock: client 100 (0x8c09a20), grock RES ora.node2.ons, member 0
[ CSSD]2011-07-09 16:34:40.572 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.ons
[ CSSD]2011-07-09 16:34:40.573 [118012816] >TRACE: clssgmAddMember: granted member(0) flags(0x12) node(1) grock (0x8c0adb0/RES ora.node2.ons)
[ CSSD]2011-07-09 16:34:40.573 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8c0adb0/RES ora.node2.ons)
[ CSSD]2011-07-09 16:34:40.654 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.ons
[ CSSD]2011-07-09 16:34:40.654 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8c0adb0/RES ora.node2.ons)
[ CSSD]2011-07-09 16:34:51.332 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.node2
[ CSSD]2011-07-09 16:34:51.332 [118012816] >TRACE: clssgmAddMember: granted member(0) flags(0x1) node(1) grock (0x8c17ce8/SRVM.DATABASE.NODEAPPS.node2)
[ CSSD]2011-07-09 16:34:52.464 [118012816] >TRACE: clssgmExitGrock: client 1 (0x8c0b1e8), grock SRVM.DATABASE.NODEAPPS.node2, member 0
[ CSSD]2011-07-09 16:35:08.136 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.LISTENER_NODE2.lsnr
[ CSSD]2011-07-09 16:35:08.139 [118012816] >TRACE: clssgmExitGrock: client 105 (0x8c0b1e8), grock RES ora.node2.LISTENER_NODE2.lsnr, member 0
[ CSSD]2011-07-09 16:35:08.280 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.LISTENER_NODE2.lsnr
[ CSSD]2011-07-09 16:35:08.290 [118012816] >TRACE: clssgmExitGrock: client 106 (0x8c0b1e8), grock RES ora.node2.LISTENER_NODE2.lsnr, member 0
[ CSSD]2011-07-09 16:35:08.293 [118012816] >TRACE: clssgmJoinGrock: grock RES ora.node2.vip new client 0x8c0b1e8 with con 0x8c0d3e0, requested num -1
[ CSSD]2011-07-09 16:35:08.293 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.vip
[ CSSD]2011-07-09 16:35:08.293 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x12 0x12 grock (3/0x8be24c0/RES ora.node2.vip)
[ CSSD]2011-07-09 16:35:08.296 [118012816] >TRACE: clssgmExitGrock: client 107 (0x8c0b1e8), grock RES ora.node2.vip, member 0
[ CSSD]2011-07-09 16:35:08.314 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.LISTENER_NODE2.lsnr
[ CSSD]2011-07-09 16:35:08.327 [118012816] >TRACE: clssgmExitGrock: client 108 (0x8c0b1e8), grock RES ora.node2.LISTENER_NODE2.lsnr, member 0
[ CSSD]2011-07-09 16:35:08.346 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.LISTENER_NODE2.lsnr
[ CSSD]2011-07-09 16:35:39.404 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.node2
[ CSSD]2011-07-09 16:36:16.854 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.node2
[ CSSD]2011-07-09 16:36:52.500 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.node2
[ CSSD]2011-07-09 16:37:28.327 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.node2
[ CSSD]2011-07-09 16:38:04.017 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x1 0x1 grock (3/0x8c0af88/SRVM.DATABASE.NODEAPPS.node2)
[ CSSD]2011-07-09 16:38:04.170 [118012816] >TRACE: clssgmExitGrock: client 1 (0x8be0e98), grock SRVM.DATABASE.NODEAPPS.node2, member 0
[ CSSD]2011-07-09 16:38:04.170 [118012816] >TRACE: clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.node2) member(0/0x8be0bf8) nodeNum(1) flags(0x1) type(3)
[ CSSD]2011-07-09 16:38:39.274 [118012816] >TRACE: clssgmJoinGrock: grock SRVM.DATABASE.NODEAPPS.node2 new client 0x8be0bf8 with con 0x8c15920, requested num -1
[ CSSD]2011-07-09 16:38:39.274 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.node2
[ CSSD]2011-07-09 16:38:39.274 [118012816] >TRACE: clssgmAddMember: granted member(0) flags(0x1) node(1) grock (0x8be25b8/SRVM.DATABASE.NODEAPPS.node2)
[ CSSD]2011-07-09 16:38:39.274 [118012816] >TRACE: clssgmQueueGrockEvent: lockName(SRVM.DATABASE.NODEAPPS.node2) type(2) count (1/1) xwaiters(0) event(1) to memberNo(0)
[ CSSD]2011-07-09 16:38:39.274 [118012816] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x1 0x1 grock (3/0x8be25b8/SRVM.DATABASE.NODEAPPS.node2)
[ CSSD]2011-07-09 16:38:39.341 [118012816] >TRACE: clssgmExitGrock: client 1 (0x8be0bf8), grock SRVM.DATABASE.NODEAPPS.node2, member 0
[ CSSD]2011-07-09 16:38:39.341 [118012816] >TRACE: clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.node2) member(0/0x8bf99d0) nodeNum(1) flags(0x1) type(3)
[ CSSD]2011-07-09 16:38:52.263 [131775376] >TRACE: clssnmConnComplete: properties node node2, number 2, 3,5,6,7,10,13
[ CSSD]2011-07-09 16:38:52.263 [131775376] >TRACE: clssnmConnComplete: node 2, node2, con(0x8bf6d90), probcon((nil)), ninfcon((nil)), node unique 1310200722, prev unique 0, msg unique 1310200722 node state 0
[ CSSD]2011-07-09 16:38:53.569 [131775376] >TRACE: clssnmHandleJoin: node node2, number 2 JOINING, state 0->1, ninfendp 0x8bf6d90
[ CSSD]2011-07-09 16:38:54.437 [131775376] >TRACE: clssnmUpdateNodeData: node 2 (node2) data length 63 data (ADDRESS=(PROTOCOL=tcp)(DEV=22)(HOST=10.10.17.222)(PORT=48719))
[ CSSD]2011-07-09 16:38:54.441 [131775376] >USER: clssnmHandleUpdate: NODE 2 (node2) IS ACTIVE MEMBER OF CLUSTER
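A log like the one above is easier to read once it is summarised by function. The following Python sketch parses lines in the format shown here and counts how often each CSSD function appears. The regular expression is derived only from this excerpt, so it is an assumption that it fits other Clusterware versions; the names `CSSD_LINE`, `parse_cssd`, and `count_functions` are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt above; other releases may differ.
CSSD_LINE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>\d+)\] >(?P<level>\w+):\s+(?P<func>\w+): (?P<msg>.*)"
)


def parse_cssd(line):
    """Return the fields of one CSSD log line, or None if it does not match."""
    m = CSSD_LINE.match(line.strip())
    return m.groupdict() if m else None


def count_functions(lines):
    """Count how many times each CSSD function name appears."""
    records = (parse_cssd(line) for line in lines)
    return Counter(rec["func"] for rec in records if rec)


sample = [
    "[ CSSD]2011-07-09 16:34:39.332 [118012816] >TRACE: clssgmAddGrockMember: adding member to grock RES ora.node2.vip",
    "[ CSSD]2011-07-09 16:34:39.335 [118012816] >TRACE: clssgmExitGrock: client 86 (0x8bf9ea8), grock RES ora.node2.vip, member 0",
    "[ CSSD]2011-07-09 16:38:54.441 [131775376] >USER: clssnmHandleUpdate: NODE 2 (node2) IS ACTIVE MEMBER OF CLUSTER",
]
print(count_functions(sample))
```

Wrapped lines in a raw log file would need to be joined before parsing, because the pattern expects one record per line.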
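Second, the RAC layer:
The RAC cluster state is provided by the LMON process, which offers two services: CGS (Cluster Group Service) and NM (Node Management). NM is the lowest layer and serves as the communication channel between the RAC cluster and the Clusterware cluster. Through NM, the local node's resource (Cluster Resource) state is registered with the local Clusterware, which then passes it to the Clusterware on the other nodes; NM also obtains the resource state of the other nodes from Clusterware.
1. NM group
Each RAC instance has many processes at work, such as DBWR, LGWR, and LMON. If any one of them fails, the other active processes on that node must be restricted, otherwise they could corrupt data on the shared disks. For this reason, all processes of a RAC instance are registered with Clusterware as one group (the NM group). LMON registers as the group's Primary Member and obtains the Member ID, while the other processes register as Slave Members of the group under the same Member ID.
Node membership for the whole cluster is maintained in a bitmap. Each node has one bit, where 0 means the node is DOWN and 1 means UP, and the bitmap as a whole carries a valid/invalid flag. The bitmap is recorded persistently as a global resource of the cluster. When a new node joins, it reads the bitmap, finds its own bit, changes it from 0 to 1, and sets the bitmap's invalid flag to 1. At that point the bitmap's content is correct but its state is invalid. The other nodes read the bitmap periodically, and as soon as one finds it invalid, a cluster reconfiguration is triggered. Once a new stable state is reached, the bitmap is marked valid again. When the cluster reconfiguration completes, NM passes the event up to the CGS layer, and CGS synchronizes the reconfiguration across all nodes. During a normal instance startup or shutdown, both Clusterware and NM are notified. If an instance terminates abnormally, however, Clusterware and NM are not informed; in that case the IMR function provided by CGS detects the failure and then performs the reconfiguration.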
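The bitmap protocol described above (set your own bit, mark the bitmap invalid, let the peers notice and reconfigure, then mark it valid again) can be sketched in a few lines of Python. The class `MembershipBitmap` and its method names are hypothetical; the real structure lives inside Oracle and is not exposed in this form.

```python
class MembershipBitmap:
    """Toy model of the cluster membership bitmap with an invalid flag."""

    def __init__(self, size):
        self.bits = [0] * size   # one bit per node: 0 = DOWN, 1 = UP
        self.invalid = 0         # 1 means a reconfiguration is pending

    def join(self, node):
        # The joining node flips its own bit and invalidates the bitmap.
        self.bits[node] = 1
        self.invalid = 1

    def needs_reconfiguration(self):
        # Peers poll this periodically; an invalid bitmap triggers reconfiguration.
        return self.invalid == 1

    def complete_reconfiguration(self):
        # After the new stable state is reached the bitmap becomes valid again.
        self.invalid = 0

    def up_nodes(self):
        return [i for i, bit in enumerate(self.bits) if bit == 1]


bitmap = MembershipBitmap(size=3)
bitmap.join(0)
bitmap.complete_reconfiguration()
bitmap.join(1)                        # content is now correct, state is invalid
print(bitmap.up_nodes(), bitmap.needs_reconfiguration())
bitmap.complete_reconfiguration()
print(bitmap.up_nodes(), bitmap.needs_reconfiguration())
```

Separating the content (the bits) from the state (the invalid flag) is what lets the other nodes detect a membership change without comparing the bits against an earlier copy.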
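IMR (Instance Membership Recovery) is the reconfiguration mechanism provided by CGS. It verifies connectivity between instances and quickly removes failed nodes to limit damage to the data. During this process every instance casts a vote whose content is its own view of the current cluster state. One instance then uses these votes to plan a new cluster (the largest subgroup) and records the voting result in the control file. The other nodes read the result to determine whether they still belong to the cluster: a node that does not must exit on its own, and one that does takes part in the reconfiguration. During voting, all member nodes try to acquire a record in the control file, the CFVRR (Control File Vote Result Record), in order to update it, but only one member succeeds, and that member records the votes of the others.
For example, in a 3-node RAC where LMON on instance 3 fails, the CFVRR looks like this:
seq#   inst#   bitmap
2      0       110
2      1       110
2      2       001    (in normal operation all three bits would be 1)
Instance 3 cannot obtain the state of the other two nodes, so the reconfiguration ends with instances 1 and 2 forming the new cluster and node 3 being evicted.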
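The "largest subgroup" decision in this example can be reproduced with a short Python sketch. It assumes that each vote is a bitmap string whose leftmost character is inst# 0, as in the CFVRR listing, and that instances casting identical bitmaps form one candidate group. The tie-break (prefer the group containing the lowest instance number) is an assumption of this sketch and is not stated in the text; `largest_subgroup` is a hypothetical name.

```python
from collections import defaultdict


def largest_subgroup(votes):
    """Pick the surviving group from IMR-style votes.

    votes maps inst# -> bitmap string, leftmost character is inst# 0.
    Raises ValueError if votes is empty.
    """
    groups = defaultdict(set)
    for inst, bitmap in votes.items():
        groups[bitmap].add(inst)

    def visible(bitmap):
        # Instances that this bitmap reports as reachable.
        return {i for i, ch in enumerate(bitmap) if ch == "1"}

    # Keep only voters that are themselves part of the view they voted for.
    candidates = [members & visible(bitmap) for bitmap, members in groups.items()]
    # Largest group wins; on a tie prefer the lowest instance number (assumption).
    return max(candidates, key=lambda g: (len(g), -min(g, default=0)))


cfvrr = {0: "110", 1: "110", 2: "001"}
print(sorted(largest_subgroup(cfvrr)))
```

For the votes above the function returns inst# 0 and 1, which correspond to instances 1 and 2 in the prose, since the listing numbers instances from 0.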
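If IMR detects a split-brain, that is, two groups within the cluster, it first notifies the CM and then waits for the CM to resolve it. The wait is controlled by _IMR_SPLITBRAIN_RES_WAIT, which defaults to 600 (stated here as milliseconds, although Oracle's description of this hidden parameter gives the unit as seconds, so check your release). When the wait times out, IMR performs the node eviction itself. Only after CGS has finished reconfiguring the nodes do GCS and GES carry out the data-level reconfiguration, that is, crash recovery.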
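The "wait for the CM, then act on your own" behaviour is a bounded wait with a fallback. A minimal Python sketch follows, written without any time unit so that it does not depend on how _IMR_SPLITBRAIN_RES_WAIT is measured. The function name and all of its parameters are hypothetical; the callbacks stand in for the real CM notification and eviction logic.

```python
def resolve_split_brain(cm_resolved, wait_limit, poll_interval, sleep, evict):
    """Wait up to wait_limit for the CM to fix a split-brain, then self-evict.

    cm_resolved: callable returning True once the CM has resolved the split.
    sleep:       callable that waits for poll_interval (injected for testing).
    evict:       callable performing the IMR node eviction.
    """
    waited = 0
    while waited < wait_limit:
        if cm_resolved():
            return "resolved by CM"
        sleep(poll_interval)
        waited += poll_interval
    # The CM did not resolve the split in time, so IMR acts itself.
    evict()
    return "resolved by IMR eviction"


evicted = []
outcome = resolve_split_brain(
    cm_resolved=lambda: False,      # the CM never resolves it in this run
    wait_limit=600,
    poll_interval=50,
    sleep=lambda interval: None,    # no real waiting in the example
    evict=lambda: evicted.append("node evicted"),
)
print(outcome, evicted)
```

Injecting `sleep` keeps the sketch testable; a real caller would pass something like `time.sleep` with a unit that matches `wait_limit`.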
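2. Reconfiguration trigger types
(1) A node joins or leaves the cluster. Triggered by NM.
(2) The private-network heartbeat fails because LMON, GCS, or GES communication is abnormal. Triggered by IMR.
(3) The controlfile heartbeat fails. The CKPT process of each instance updates one block of the control file, called the Checkpoint Progress Record, every 3 seconds. Each instance has its own record, so there is no contention. Triggered by IMR.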
RAC-layer Cluster Reconfiguration Steps
The cluster reconfiguration process triggers IMR, and a seven-step process ensures complete reconfiguration.
1. Name service is frozen. The CGS contains an internal database of all the members/instances in the cluster with all their configuration and servicing details. The name service provides a mechanism to address this configuration data in a structured and synchronized manner.
2. Lock database (IDLM) is frozen. The lock database is frozen to prevent processes from obtaining locks on resources that were mastered by the departing/dead instance.
3. Determination of membership and validation and IMR.
4. Bitmap rebuild takes place, instance name and uniqueness verification. CGS must synchronize the cluster to be sure that all members get the reconfiguration event and that they all see the same bitmap.
5. Delete all dead instance entries and republish all names newly configured.
6. Unfreeze and release name service for use.
7. Hand over reconfiguration to GES/GCS.
Now that you know when IMR starts and node evictions take place, let's look at the corresponding messages in the alert log and LMON trace files to get a better picture. (The logs have been edited for brevity. Note all the lines in boldface define the most important steps in IMR and the handoff to other recovery steps in CGS.)
alert.log on node1 (node1 starts first):
Sat Jul 09 16:32:31 CST 2011
starting up 1 shared server(s) ...
Sat Jul 09 16:32:32 CST 2011
lmon registered with NM - instance id 1 (internal mem no 0)
Sat Jul 09 16:32:33 CST 2011
Reconfiguration started (old inc 0, new inc 2)
List of nodes:
0
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Jul 09 16:32:34 CST 2011
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Jul 09 16:32:34 CST 2011
LMS 0: 0 GCS shadows traversed, 0 replayed
Sat Jul 09 16:32:34 CST 2011
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Sat Jul 09 16:32:59 CST 2011
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE MOUNT
Sat Jul 09 16:33:01 CST 2011
ALTER DATABASE OPEN
This instance was first to open
alert.log on node1 (when node2 starts):
Sat Jul 09 16:41:25 CST 2011
Reconfiguration started (old inc 0, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid = 1 according to instance 0
Sat Jul 09 16:41:26 CST 2011
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Jul 09 16:41:26 CST 2011
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Sat Jul 09 16:41:27 CST 2011
LMS 0: 0 GCS shadows traversed, 0 replayed
Sat Jul 09 16:41:27 CST 2011
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
alert.log on node2:
Sat Jul 09 16:41:28 CST 2011
Reconfiguration started (old inc 2, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Jul 09 16:41:29 CST 2011
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Sat Jul 09 16:41:30 CST 2011
LMS 0: 5074 GCS shadows traversed, 2242 replayed
Sat Jul 09 16:41:30 CST 2011
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
alert.log on node1 (node2 is stopped with shutdown abort):
Sat Jul 09 17:32:37 CST 2011
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Jul 09 17:32:38 CST 2011
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Jul 09 17:32:39 CST 2011
LMS 0: 5947 GCS shadows traversed, 0 replayed
Sat Jul 09 17:32:39 CST 2011
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Sat Jul 09 17:32:40 CST 2011
Instance recovery: looking for dead threads
Sat Jul 09 17:32:40 CST 2011
Beginning instance recovery of 1 threads
Sat Jul 09 17:32:42 CST 2011
Started redo scan
Sat Jul 09 17:32:46 CST 2011
Completed redo scan
3 redo blocks read, 5 data blocks need recovery
Sat Jul 09 17:32:46 CST 2011
Started redo application at
Thread 2: logseq 5, block 1884
Sat Jul 09 17:32:47 CST 2011
Recovery of Online Redo Log: Thread 2 Group 3 Seq 5 Reading mem 0
Mem# 0: +RAC_DISK/racdb/onlinelog/group_3.258.751759681
Sat Jul 09 17:32:47 CST 2011
Completed redo application
Sat Jul 09 17:32:47 CST 2011
Completed instance recovery at
Thread 2: logseq 5, block 1887, scn 532837
3 data blocks read, 5 data blocks written, 3 redo blocks read
Sat Jul 09 17:32:48 CST 2011
Thread 2 advanced to log sequence 6 (thread recovery)
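To follow the incarnation changes across these alert.log excerpts, the "Reconfiguration started" entries can be pulled out with a short Python sketch. It relies only on the message layout visible above (an "old inc / new inc" line followed shortly by "List of nodes:" and the node numbers); the helper name `reconfigurations` and the three-line look-ahead window are assumptions of this example.

```python
import re

RECONFIG = re.compile(r"Reconfiguration started \(old inc (\d+), new inc (\d+)\)")


def reconfigurations(lines):
    """Extract (old inc, new inc, node list) for each reconfiguration entry."""
    events = []
    for i, line in enumerate(lines):
        m = RECONFIG.search(line)
        if not m:
            continue
        nodes = []
        # Look a few lines ahead for the "List of nodes:" header.
        for j in range(i + 1, min(i + 4, len(lines))):
            if lines[j].strip() == "List of nodes:" and j + 1 < len(lines):
                nodes = [int(n) for n in lines[j + 1].split() if n.isdigit()]
                break
        events.append(
            {"old_inc": int(m.group(1)), "new_inc": int(m.group(2)), "nodes": nodes}
        )
    return events


alert_log = [
    "Sat Jul 09 16:41:25 CST 2011",
    "Reconfiguration started (old inc 0, new inc 4)",
    "List of nodes:",
    "0 1",
    "Global Resource Directory frozen",
    "Sat Jul 09 17:32:37 CST 2011",
    "Reconfiguration started (old inc 4, new inc 6)",
    "List of nodes:",
    "0",
]
for event in reconfigurations(alert_log):
    print(event)
```

In the excerpts above the new incarnation goes 2, 4, 6 as node1 starts, node2 joins, and node2 is aborted, and the node list shrinks from "0 1" back to "0" on the last one.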
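An important service is involved here, the Cluster Group Service (CGS):
LMON: the LMON processes of the instances communicate periodically to check the health of each node in the cluster, and when a node fails, LMON is responsible for reconfiguring the cluster. The service it provides is called Cluster Group Service (CGS). Oracle Clusterware relies on the Process Monitor Daemon to deal with split-brain, but if the instance on a node hangs abnormally, the problem may go undetected when viewed only from the network, OS, and Clusterware layers. The database therefore needs its own self-monitoring mechanism. LMON provides a Node Monitor function that records the health of each node at the application layer. Node health is recorded in a bitmap in the GRD, one bit per node, where 0 means down and 1 means running normally. The LMON processes on the nodes communicate with one another to confirm that the bitmap is consistent. (Open question: what happens with only two nodes, where the bitmap alone is not enough?)
LMON can work together with the underlying Clusterware or on its own. When LMON detects an instance-level split-brain, it expects Clusterware to resolve it, but RAC does not assume that Clusterware will succeed. LMON therefore does not wait indefinitely for the Clusterware layer: when the wait times out, LMON automatically triggers IMR (Instance Membership Recovery). IMR can be regarded as the split-brain and I/O fencing mechanism that Oracle provides at the database layer.
LMON relies mainly on two kinds of heartbeat for health monitoring:
1. The heartbeat between nodes.
2. The disk heartbeat on the control file. The CKPT process of each instance updates the Checkpoint Progress Record block of the control file every 3 seconds. Because the control file is shared, instances can check whether the others are updating it on time and judge their state from that.
At the database layer, the heartbeats are the node heartbeat and the control file disk heartbeat.
At the cluster layer, they are the network heartbeat, the voting disk heartbeat, and the local heartbeat.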
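The control file heartbeat check amounts to comparing each instance's last update time against the expected 3-second interval. The Python sketch below illustrates the idea. The function name `stale_instances` and the `missed_allowed` tolerance (how many missed updates are accepted before an instance is considered unhealthy) are assumptions of this example; the text does not say what threshold Oracle uses.

```python
def stale_instances(last_update, now, interval=3.0, missed_allowed=3):
    """Return the instances whose heartbeat record is overdue.

    last_update:    mapping inst# -> time of the last Checkpoint Progress
                    Record update, in seconds.
    interval:       expected update interval (3 seconds per the text).
    missed_allowed: hypothetical tolerance, not taken from the text.
    """
    limit = interval * missed_allowed
    return sorted(inst for inst, ts in last_update.items() if now - ts > limit)


heartbeats = {1: 100.0, 2: 88.0}   # instance 2 last updated 12 seconds ago
print(stale_instances(heartbeats, now=100.0))
```

Because every instance writes only its own record, the check needs no locking: each reader simply compares timestamps written by others.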
The corresponding LMON log:
*** 2011-07-09 16:41:25.412
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 2 0.
*** 2011-07-09 16:41:25.570
Name Service frozen
kjxgmcs: Setting state to 2 1.
kjxgrssvote: reconfig bitmap chksum 0xccd0ae50 cnt 2 master 0 ret 0
kjxggpoll: change poll time to 50 ms
*** 2011-07-09 16:41:25.665
Obtained RR update lock for sequence 3, RR seq 2
*** 2011-07-09 16:41:25.752
Voting results, upd 0, seq 4, bitmap: 0 1
CGS/IMR TIMEOUTS:
CSS recovery timeout = 71 sec
IMR Reconfig timeout = 300 sec
CGS rcfg timeout = 300 sec
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 4 2.
kjfmuin: bitmap 0 1
kjfmmhi: received msg from 0 (inc 2)
kjfmmhi: received msg from 1 (inc 4)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 4 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 4 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 4 5.
Name Service normal
Name Service recovery done
*** 2011-07-09 16:41:27.200
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 4 6.
kjxggpoll: change poll time to 600 ms
*** 2011-07-09 16:41:28.279
kjfcrfg: DRM window size = 128->128 (min lognb = 10)
*** 2011-07-09 16:41:28.279
Reconfiguration started (old inc 2, new inc 4)
Synchronization timeout interval: 900 sec
List of nodes:
0 1
Undo tsn affinity 1
*** 2011-07-09 16:41:28.311
*** 2011-07-09 16:41:28.311
kjfcrfg: query of NESTED_RECONFIGURATION for node 1 failed with 7
Global Resource Directory frozen
node 0
node 1
release 10 2 0 5
asby init, 0/0/x2
asby returns, 0/0/x2/false
* Domain maps before reconfiguration:
* DOMAIN 0 (valid 1): 0
* End of domain mappings
* Domain maps after recomputation:
* DOMAIN 0 (valid 1): 0 1
* End of domain mappings
Dead inst
Join inst 1
Exist inst 0
Active Sendback Threshold = 50 %
Communication channels reestablished
sent syncr inc 4 lvl 1 to 0 (4,5/0/0)
sent synca inc 4 lvl 1 (4,5/0/0)
received all domreplay (4.6)
sent master 0 (4.6)
*** 2011-07-09 16:41:29.535
KJBDOMHVMAP: BEGINS
*** 2011-07-09 16:41:29.560
KJBDOMHVMAP: ENDS
sent dom info (4.6)
sent hv info (4.6)
sent syncr inc 4 lvl 2 to 0 (4,7/0/0)
sent synca inc 4 lvl 2 (4,7/0/0)
Master broadcasted resource hash value bitmaps
* kjfcrfg: domain 0 valid, valid_ver = 4
Non-local Process blocks cleaned out
Set master node info
sent syncr inc 4 lvl 3 to 0 (4,13/0/0)
sent synca inc 4 lvl 3 (4,13/0/0)
Submitted all remote-enqueue requests
kjfcrfg: Number of mesgs sent to node 1 = 774
sent syncr inc 4 lvl 4 to 0 (4,15/0/0)
sent synca inc 4 lvl 4 (4,15/0/0)
Dwn-cvts replayed, VALBLKs dubious
sent syncr inc 4 lvl 5 to 0 (4,18/0/0)
sent synca inc 4 lvl 5 (4,18/0/0)
All grantable enqueues granted
sent syncr inc 4 lvl 6 to 0 (4,20/0/0)
sent synca inc 4 lvl 6 (4,20/0/0)
Submitted all GCS cache requests
sent syncr inc 4 lvl 7 to 0 (4,22/0/0)
sent synca inc 4 lvl 7 (4,22/0/0)
Post SMON to start 1st pass IR
Fix write in gcs resources
sent syncr inc 4 lvl 8 to 0 (4,24/0/0)
sent synca inc 4 lvl 8 (4,24/0/0)
*** 2011-07-09 16:41:31.006
Reconfiguration complete
*** 2011-07-09 17:32:33.682
kjxgmpoll reconfig bitmap: 0
*** 2011-07-09 17:32:33.745
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 4 0.
*** 2011-07-09 17:32:34.157
Name Service frozen
kjxgmcs: Setting state to 4 1.
kjxgrssvote: reconfig bitmap chksum 0x6668604e cnt 1 master 0 ret 0
kjxggpoll: change poll time to 50 ms
*** 2011-07-09 17:32:34.464
Obtained RR update lock for sequence 5, RR seq 4
*** 2011-07-09 17:32:37.539
Voting results, upd 0, seq 6, bitmap: 0
CGS/IMR TIMEOUTS:
CSS recovery timeout = 71 sec
IMR Reconfig timeout = 300 sec
CGS rcfg timeout = 300 sec
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 6 2.
kjfmSendAbortInstMsg: send an abort message to node 1
kjfmSendAbortInstMsg: unique id 0x0 reason 0x1
kjfmuin: bitmap 0
kjfmmhi: received msg from 0 (inc 2)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 6 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 6 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 6 5.
Name Service normal
Name Service recovery done
*** 2011-07-09 17:32:37.598
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 6 6.
kjxggpoll: change poll time to 600 ms
kjfmact: call ksimdic on instance (1)
*** 2011-07-09 17:32:37.843
kjfcrfg: DRM window size = 128->128 (min lognb = 10)
*** 2011-07-09 17:32:37.845
Reconfiguration started (old inc 4, new inc 6)
Synchronization timeout interval: 900 sec
List of nodes:
0
Undo tsn affinity 1
*** 2011-07-09 17:32:37.906
Global Resource Directory frozen
node 0
asby init, 0/0/x2
asby returns, 0/0/x2/false
* Domain maps before reconfiguration:
* DOMAIN 0 (valid 1): 0 1
* End of domain mappings
* kjbdomrcfg2: domain 0 invalid = TRUE
* Domain maps after recomputation:
* DOMAIN 0 (valid 0): 0
* End of domain mappings
Active Sendback Threshold = 50 %
Communication channels reestablished
sent syncr inc 6 lvl 1 to 0 (6,5/0/0)
sent syncr inc 6 lvl 2 to 0 (6,7/0/0)
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Set master node info
sent syncr inc 6 lvl 3 to 0 (6,13/0/0)
Submitted all remote-enqueue requests
sent syncr inc 6 lvl 4 to 0 (6,15/0/0)
Dwn-cvts replayed, VALBLKs dubious
sent syncr inc 6 lvl 5 to 0 (6,18/0/0)
All grantable enqueues granted
sent syncr inc 6 lvl 6 to 0 (6,20/0/0)
*** 2011-07-09 17:32:39.351
Post SMON to start 1st pass IR
Submitted all GCS cache requests
sent syncr inc 6 lvl 7 to 0 (6,22/0/0)
Fix write in gcs resources
sent syncr inc 6 lvl 8 to 0 (6,24/0/0)
*** 2011-07-09 17:32:39.673
Reconfiguration complete
* domain 0 valid?: 0
kjxgfipccb: msg 0x0xb7db2a6c, mbo 0x0xb7db2a68, type 19, ack 0, ref 0, stat 34
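The progress of a reconfiguration in this trace can be followed through the "kjxgmcs: Setting state to N M." lines. The Python sketch below extracts those pairs. Reading the first number as the incarnation and the second as the substate is an interpretation based on this trace (each "proposing substate M" is followed by "Setting state to N M"), not documented behaviour, and `state_transitions` is a hypothetical helper name.

```python
import re

STATE = re.compile(r"kjxgmcs: Setting state to (\d+) (\d+)\.")


def state_transitions(lines):
    """Return (first number, second number) for each kjxgmcs state line."""
    matches = (STATE.search(line) for line in lines)
    return [(int(m.group(1)), int(m.group(2))) for m in matches if m]


trace = [
    "kjxgmrcfg: Reconfiguration started, reason 1",
    "kjxgmcs: Setting state to 2 0.",
    "Name Service frozen",
    "kjxgmcs: Setting state to 2 1.",
    "kjxgmps: proposing substate 2",
    "kjxgmcs: Setting state to 4 2.",
]
print(state_transitions(trace))
```

In the node2 join recorded above, the first number moves from 2 to 4 right after the voting results are written with seq 4, which is what suggests it tracks the incarnation.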
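This article walks through how a new node joins an Oracle RAC cluster. It covers the NodeApps resource operations at the Clusterware layer, such as VIP and Listener, and focuses on the RAC-layer Node Management, Cluster Group Service, and Instance Membership Recovery mechanisms, along with the key steps and heartbeat mechanisms involved in node joins, failure detection, and reconfiguration.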