2种方法发现一个索引是否碎片过多，而需要rebuild

最新推荐文章于 2026-03-26 02:55:14 发布

原创最新推荐文章于 2026-03-26 02:55:14 发布 · 1.7k 阅读

1 ·

本内容遵循CC 4.0 BY-SA版权协议

标签

#structure #statistics #branch #documentation #allocation

ORACLE 管理专栏收录该内容

7 篇文章

订阅专栏

本文介绍了两种检测Oracle数据库中索引碎片过多的方法：通过DBA_INDEXES和INDEX_STATS视图分析。当BLEVEL大于4或者通过INDEX_STATS计算的最优高度与实际高度相差较大时，可能需要考虑重建索引。重建索引需权衡在线与离线操作的影响，高碎片率可能导致性能下降，应及时处理。

有2种方法发现一个索引是否碎片过多，而需要rebuild：
方法1：
下面的视图中关于索引的统计信息：
1. DBA_INDEXES - for non partitioned indexes and aggregate information for all the partitions.
2. DBA_IND_PARTITIONS - identical to above view, but provides information about individual partitions.
从碎片的观点来看，比较重要的列为：
BLEVEL - the btree level - 0 based
LEAF_BLOCKS - Number of leaf blocks in the tree
NUM_ROWS - number of rows in the tree
SAMPLE_SIZE - what sampling did we do - provides level of confidence
不幸的是，在视图中没有提供average row size。我们不得不根据在索引中的列计算average row size以及使用该值来得到索引的利用率。
utilization_factor =
(NUM_ROWS*average_row_size)/(LEAF_BLOCKS*usable_block_size) where usable_block_size = database block size - standard overhead (~ <= 100 bytes).

注意：analyze index index_name validate structure;并不会使dba_indexes中的BLEVEL列有值，而只有compute statistics;  才会。

我们利用DBA_INDEXES进行碎片的估计时，只能根据BLEVEL列来进行大体的判断，认为BLEVEL>4的索引就应该进行考虑其碎片问题，如果真的发现是有碎片问题则需要对索引进行rebuild，否则如果是因为数据过多造成的则建议将索引分区。
下面是用的sql语句：
select index_name, blevel,
      decode(blevel,0,'OK BLEVEL',1,'OK BLEVEL',
      2,'OK BLEVEL',3,'OK BLEVEL',4,'OK BLEVEL','BLEVEL HIGH') OK
from dba_indexes
where owner='SCOTT'
order by blevel;

利用该种方法时，不是很准确，但是可以在对索引进行分析时不影响程序的正常运行。

方法2：
有另外一个视图(INDEX_STATS)提供了关于索引树的更详细的信息，但是需要DBA运行INDEX VALIDATE STRUCTURE命令，该命令会产生INDEX_STATS视图并使表处于只读状态。INDEX_STATS视图中只容纳一行数据，即只存放本次分析的结果。

在INDEX_STATS视图中我们感兴趣的列为：
br_rows - number of rows in all the branch blocks
br_rows_len - sum of the lengths of all the rows in branch blocks
br_blks - branch blocks  如果该列为0，则说明索引的索引数据在root block中
br_blk_len - usable space in a branch block (block size - overhead)
lf_blks – leaf blocks 如果该列为0，则说明索引的索引数据在root block中
lf_blk_len - usable space in a leaf block (block size - overhead)
lf_rows - number of leaf rows 包含表中现有与已经删除的的但是在index中还存在的entrys
lf_rows_len - sum of the lengths of all the rows in leaf blocks
del_lf_rows - number of deleted rows still present in leaf blocks
del_lf_rows_len - sum of the lengths of all deleted rows in leaf blocks
height - current height of the tree
注意：lf_rows - del_lf_rows是表中现有的行数。
如何利用上面视图中提供的信息计算索引树的最优高度。
Let rows_per_leaf_block = lf_blk_len/((lf_rows_len - del_lf_rows_len)/(lf_rows - del_lf_rows)) 。该值应该等于(lf_rows - del_lf_rows)/ lf_blks
Even though the number of rows in all branch blocks is not the same and this number typically tends to increase as we get closer to the root block, the following is a reasonable estimate of the expected fanout of the branch block. Let fanout =br_blk_len/(br_rows_len/br_rows)
The log (base fanout) ((lf_rows - del_lf_rows)/rows_per_leaf_blocks) is the expected number of branch levels. This plus 1 provides the optimal (based on many average assumptions) height of the tree. Comparing this to height of tree from above view tells us whether we shall reduce a level.
下面的sql语句给出是否一个index应该被rebuild，如果CAN_REDUCE_LEVEL列的值为yes，则该索引应该被rebuild：
SELECT name NAME, partition_name PARTITION_NAME,
(br_rows_len*100)/(br_blk_len*br_blks) BRANCH_UTILIZATION,
((lf_rows_len - del_lf_rows_len)*100)/(lf_blk_len*lf_blks) LEAF_UTILIZATION,
decode (SIGN(ceil(log(br_blk_len/(br_rows_len/br_rows),
lf_blk_len/((lf_rows_len - del_lf_rows_len)/(lf_rows - del_lf_rows))))
+ 1 - height), -1, 'YES', 'NO') CAN_REDUCE_LEVEL
from INDEX_STATS;

备注：
CAN_REDUCE_LEVEL列值的计算复杂，只管使用即可，无需直到为什么这样计算

当然也可以使用下面更简单的方法：
select DEL_LF_ROWS*100/decode(LF_ROWS, 0, 1, LF_ROWS) PCT_DELETED,
(LF_ROWS-DISTINCT_KEYS)*100/decode(LF_ROWS,0,1,LF_ROWS) DISTINCTIVENESS
from index_stats

备注：
The PCT_DELETED column shows what percent of leaf entries (index entries) have been deleted and remain unfilled. The more deleted entries exist on an index, the more unbalanced the index becomes. If the PCT_DELETED is 20% or higher, the index is candidate for rebuilding.  If you can afford to rebuild indexes more frequently, then do so if the value is higher than 10%. Leaving indexes with high PCT_DELETED without rebuild might cause excessive redo allocation on some systems.

The DISTINCTIVENESS column shows how often a value for the column(s) of the index is repeated on average. For example, if a table has 10000 records and 9000 distinct  SSN values, the formula would result in (10000-9000) x 100 / 10000 = 10. This shows a good distribution of values.(对于b-tree index该值越小越好，否则建议采用bitmap indexes)

The output of analyze index < name> validate structure was never used by the optimizer for any kind of decision regarding the best path.
.
The analyze ... validate structure was and is used by customers for finding unbalanced indexes or index which have e.g. a high number of deleted leaf rows. An unbalanced index can have a significant impact on the performance.

The main problem with analyze ... validate structure is/was, that this command is locking the table and therefore it could not be used for large indexes or heavily used tables while users are working.
.
With 9.x, a lot of ONLINE features were added the kernel (including validate structure). It could not be, to declare this as a documentation bug. Either the ONLINE feature is available (and has then to work) or the ONLINE feature is not available (and then a syntax error should occur).
.
To process the statement and not to give a result could not be the right behaviour.

问题：
ANALYZE INDEX VALIDATE STRUCTURE ONLINE DOES NOT POPULATE INDEX_STATS OR INDEX_HISTOGRAM WHILE WITHOUT ONLINE CLAUSE IT DOES POPULATE.

Testcase
--------
create a test table. Create an index on a column.
analyze index validate structure online;

select * from index_stats;
no rows

select * from index_histogram;
no rows

Now:
analyze index validate structure ;

select * from index_stats;
you see rows

select * from index_histogram;
you see rows

Cause
This is an expected behaviour.
Since the statistics collected by a regular analyze index validate structure are not used by the optimizer and for performance reason, the statistics collection are omitted entirely in this call path.
Fix
Use analyze index validate structure offline;
or analyze index validate structure ; to generate statistics

该种方法的缺点就是在对索引进行分析时，会对表进行锁定，使之不能对其进DML操作，这对于24X7的应用来说，是很讨厌的事情，特别是大的表与频繁修改的表。

发现需要rebuild的表后，剩下的工作就是将其进行rebuild了，rebuild offline当然好，但是在rebuild时，不能对对应的表进行dml操作，这是业务不允许的。如果进行rebuild online，通过测试发现需要的时间太长，谢谢xzh2000提供的"rebuild online compute statistics"，我将对其进行测试，。