Problem description:
[root@ceph-mon01 ~]# ceph -s
  cluster:
    id:     92d4f66b-94a6-4c40-8941-734f3c44eb4f
    health: HEALTH_ERR
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds
            1 pools have many more objects per pg than average
            Reduced data availability: 256 pgs inactive
  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon03,ceph-mon02 (age 5d)
    mgr: ceph-mon03(active, since 5d), standbys: ceph-mon02, ceph-mon01
    mds: cephfs:0
    osd: 9 osds: 9 up (since 43h), 9 in (since 43h); 224 remapped pgs
    rgw: 1 daemon active (ceph-mon01)
  task status:
  data:
    pools:   9 pools, 480 pgs
    objects: 34.60k objects, 8.5 GiB
    usage:   128 GiB used, 142 GiB / 270 GiB avail
    172995/103797 objects misplaced (166.667%)
    256 unknown
    224 active+clean+remapped
Resolution process
ceph health detail
...
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
pg 4.3c8 is stuck inactive for 246794.767182, current state unknown, last acting []
pg 4.3ca is stuck inactive for 246794.767182, current state unknown, last acting []
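Since a PG ID has the form <pool-id>.<pg-id>, the stuck PGs listed above (4.3c8, 4.3ca) all belong to pool 4. When the list is long, it can help to count the inactive PGs per pool. The following is only a sketch: the function name inactive_pgs_by_pool is made up for illustration, and it assumes the "pg ... is stuck inactive" line format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): reads `ceph health detail` text
# on stdin and prints "<pool-id> <number of stuck inactive PGs>" per pool.
inactive_pgs_by_pool() {
    awk '$1 == "pg" && /stuck inactive/ {
             split($2, id, ".")   # id[1] is the pool id, e.g. 4 for 4.3c8
             count[id[1]]++
         }
         END { for (p in count) print p, count[p] }' | sort -n
}

# Typical use on a live cluster (commented out so the sketch has no side effects):
#   ceph health detail | inactive_pgs_by_pool
```

The pool ID can then be matched to a pool name with ceph osd pool ls detail.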
1. Check the OSD tree (here there are two top-level entry points for PG replica placement: datacenter0 and default)
[root@ceph-mon01 ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                 STATUS REWEIGHT PRI-AFF
 -9       0.26367 datacenter datacenter0
-10       0.26367     room room0

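This article describes what to do when a Ceph cluster enters the HEALTH_ERR state with problems such as an offline filesystem and reduced data availability: check the OSD tree, the CRUSH map, and the CRUSH rules used by the pools, then modify the CRUSH map to improve data distribution, which finally returns the PGs to a normal state.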
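The workflow summarized above (inspect the tree and rules, then export, edit, and re-import the CRUSH map) can be sketched with the standard ceph and crushtool commands. This is a minimal sketch, not the author's exact procedure: the function names, the DRY_RUN variable, and the file prefix are assumptions made for illustration, and the actual edit to the decompiled map is not shown in this excerpt, so it is left as a manual step. Commands are only printed unless DRY_RUN=0 is set, because setcrushmap changes data placement on the whole cluster.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command by default, execute only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read-only checks: bucket hierarchy, per-pool crush_rule, and rule definitions.
inspect_crush() {
    run ceph osd tree
    run ceph osd pool ls detail
    run ceph osd crush rule dump
}

# Export the binary CRUSH map and decompile it to editable text.
# $1 is a file prefix chosen by the caller (an assumption of this sketch).
export_crushmap() {
    run ceph osd getcrushmap -o "$1.bin"
    run crushtool -d "$1.bin" -o "$1.txt"
}

# After editing "$1.txt" by hand: recompile it and inject it into the cluster.
import_crushmap() {
    run crushtool -c "$1.txt" -o "$1.new"
    run ceph osd setcrushmap -i "$1.new"
}
```

For example, export_crushmap /tmp/crushmap prints the two export commands in dry-run mode. Keeping the original .bin file around gives a way to restore the previous map with ceph osd setcrushmap -i if the new one makes things worse.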