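In comparative analyses, bar charts with error bars are the most common visualization, but they carry little information: they show only the mean and the standard deviation, not the shape of the distribution. A boxplot is built from five statistics (minimum, lower quartile, median, upper quartile, maximum) and shows at a glance whether the data are skewed, how dispersion differs between groups, whether outliers are present, and how far the group medians differ. In most implementations, including ggplot2, the whiskers stop at the most extreme observation within 1.5 × IQR of the box, and points beyond that are drawn individually as outliers. Boxplots are close to standard in journals in ecology, environmental science, and related fields.
This article uses plant diversity data as a running example: it starts from a basic boxplot, then adds styling and statistical annotation step by step until the figure meets publication standards.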
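The contrast between the two summaries can be checked in base R. The sketch below uses a small made-up sample (the values are purely illustrative and not from the article's dataset) and compares what a bar chart would report with what a boxplot is built from.

```r
# Hypothetical right-skewed sample with one extreme value (illustrative only)
x <- c(2.1, 2.3, 2.4, 2.6, 2.7, 2.9, 3.0, 3.2, 5.8)

# What a bar chart with error bars summarizes
bar_summary <- c(mean = mean(x), sd = sd(x))

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)

# What a boxplot actually draws: whiskers end at the most extreme points
# within 1.5 * IQR of the hinges; anything beyond is reported as an outlier
bs <- boxplot.stats(x)

print(bar_summary)
print(five)       # 2.1 2.4 2.7 3.0 5.8
print(bs$stats)   # 2.1 2.4 2.7 3.0 3.2
print(bs$out)     # 5.8
```

The single extreme value pulls the mean (3.0) above the median (2.7). The bar chart hides this, while the boxplot reports 5.8 as an outlier and ends the upper whisker at 3.2.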
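1. Getting to know your data
The demo data are simulated ecological diversity survey data covering 7 plots, with plant surveys carried out in forest, grassland, and farmland in spring, summer, and autumn.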

The fields are as follows:
| Field | Description | Example values |
|---|---|---|
| Plot | Plot ID | For-Sp01 |
| Season | Sampling season | Spring, Summer, Autumn |
| Habitat | Habitat type | Forest, Grassland, Farmland |
| Shannon | Shannon diversity index | 2.5–3.6 |
| Simpson | Simpson dominance index | 0.7–0.96 |
Read the data and set the factor order:
df <- read.csv("ecology_diversity_data.csv", header = TRUE)
df$Season <- factor(df$Season, levels = c("Spring", "Summer", "Autumn"))
df$Habitat <- factor(df$Habitat, levels = c("Forest", "Grassland", "Farmland"))
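The CSV file itself is not included here, so the following sketch simulates a table with the same columns in order to try the later steps. Everything about the simulation is an assumption: 7 replicate plots per season and habitat combination, the habitat means, the spread, and the Plot ID pattern (modeled on the example value For-Sp01).

```r
set.seed(42)  # reproducible simulation

# Assumption: 7 replicate plots per Season x Habitat combination
sim <- expand.grid(Rep = 1:7,
                   Habitat = c("Forest", "Grassland", "Farmland"),
                   Season = c("Spring", "Summer", "Autumn"),
                   stringsAsFactors = FALSE)

# Plot IDs in the style of the example value "For-Sp01"
sim$Plot <- sprintf("%s-%s%02d", substr(sim$Habitat, 1, 3),
                    substr(sim$Season, 1, 2), sim$Rep)

# Hypothetical habitat means; values are not taken from the article's data
habitat_mean <- c(Forest = 3.3, Grassland = 3.0, Farmland = 2.7)
sim$Shannon <- round(rnorm(nrow(sim), mean = habitat_mean[sim$Habitat], sd = 0.15), 2)
sim$Simpson <- round(runif(nrow(sim), min = 0.70, max = 0.96), 2)
sim <- sim[, c("Plot", "Season", "Habitat", "Shannon", "Simpson")]

# Round-trip through a CSV file, then fix the factor order as in the article
csv_path <- file.path(tempdir(), "ecology_diversity_data.csv")
write.csv(sim, csv_path, row.names = FALSE)
df_sim <- read.csv(csv_path, header = TRUE)
df_sim$Season <- factor(df_sim$Season, levels = c("Spring", "Summer", "Autumn"))
df_sim$Habitat <- factor(df_sim$Habitat, levels = c("Forest", "Grassland", "Farmland"))

str(df_sim)
print(table(df_sim$Season, df_sim$Habitat))
```

Setting the factor levels explicitly matters because R otherwise orders them alphabetically, which would put Autumn before Spring on the x axis.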
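2. A basic boxplot
The ggplot2 package provides geom_boxplot(), the core function for drawing boxplots in R; it computes each group's median, quartiles, and whisker range automatically.
library(ggplot2)
ggplot(df, aes(x = Season, y = Shannon)) +
  geom_boxplot()
With default settings this gives white-filled boxes with dark grey outlines on ggplot2's grey panel background. Styling can then be layered on step by step, starting with the axis labels:
p_base <- ggplot(df, aes(x = Season, y = Shannon)) +
  geom_boxplot() +
  labs(x = "Season", y = "Shannon index")
p_base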
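To see the numbers geom_boxplot() computes rather than only the drawing, ggplot2's layer_data() can be used. The sketch below runs on a hypothetical toy data frame (not the article's data) and checks that the middle line of each box is the group median.

```r
library(ggplot2)

# Hypothetical toy data: three groups of five values each
toy <- data.frame(
  Season = factor(rep(c("Spring", "Summer", "Autumn"), each = 5),
                  levels = c("Spring", "Summer", "Autumn")),
  Shannon = c(2.8, 2.9, 3.0, 3.1, 3.3,
              3.1, 3.2, 3.4, 3.5, 3.6,
              2.5, 2.7, 2.8, 2.9, 3.0)
)

p_toy <- ggplot(toy, aes(x = Season, y = Shannon)) +
  geom_boxplot()

# layer_data() returns the statistics computed for the first layer
box_stats <- layer_data(p_toy, 1)[, c("ymin", "lower", "middle", "upper", "ymax")]
print(box_stats)

# The "middle" column is the per-group median
medians <- tapply(toy$Shannon, toy$Season, median)
print(medians)
```

The rows of box_stats follow the factor level order, so they line up with the boxes from left to right.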

3. Styling: colors and jittered points
Color scheme
This article uses the following colors:
| Season | Color | Hex code |
|---|---|---|
| Spring | Teal | #00AFBB |
| Summer | Golden yellow | #E7B800 |
| Autumn | Orange-red | #FC4E07 |
Overlaying jittered points
Overlaying jittered points on the boxplot shows where each observation actually falls, so the box does not hide detail in the distribution. ggpubr::ggboxplot makes the code more compact:
library(ggpubr)
p_beauty <- ggboxplot(df, x = "Season", y = "Shannon",
                      color = "Season",
                      palette = c("#00AFBB", "#E7B800", "#FC4E07"),
                      xlab = "Season", ylab = "Shannon index",
                      add = "jitter", add.params = list(size = 1))
p_beauty

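4. Adding statistical tests
4.1 Choosing a test
The standard approach for comparing the means of several groups is ANOVA followed by Tukey HSD, but it assumes normally distributed data and homogeneous variances. Ecological data, especially count data such as species richness, often violate these assumptions.
The non-parametric Kruskal-Wallis test does not assume normality and compares ranks only, so it applies to a wider range of data. If the Kruskal-Wallis test is significant, pairwise t-tests are then used to locate the specific group differences. Bear in mind that the t-test is itself parametric; where normality is doubtful, the pairwise Wilcoxon test (method = "wilcox.test") is the rank-based counterpart.
# Global test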
compare_means(Shannon ~ Season, data = df, method = "kruskal.test")
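Whether the parametric assumptions hold can be checked before choosing a test. The following is a minimal base R sketch on hypothetical simulated data (group means and spread are made up); it is one possible workflow, not the article's own procedure.

```r
set.seed(1)
# Hypothetical data: 21 observations per season (not the article's dataset)
demo <- data.frame(
  Season = factor(rep(c("Spring", "Summer", "Autumn"), each = 21),
                  levels = c("Spring", "Summer", "Autumn")),
  Shannon = c(rnorm(21, 3.0, 0.2), rnorm(21, 3.3, 0.2), rnorm(21, 2.9, 0.2))
)

# Normality within each group (Shapiro-Wilk); a small p suggests non-normality
normality_p <- tapply(demo$Shannon, demo$Season,
                      function(v) shapiro.test(v)$p.value)

# Homogeneity of variance across groups (Bartlett's test)
variance_p <- bartlett.test(Shannon ~ Season, data = demo)$p.value

# Global rank-based test, the base R counterpart of the compare_means() call
kw <- kruskal.test(Shannon ~ Season, data = demo)

# Rank-based pairwise follow-up with Benjamini-Hochberg adjustment
pw <- pairwise.wilcox.test(demo$Shannon, demo$Season, p.adjust.method = "BH")

print(round(normality_p, 3))
print(round(variance_p, 3))
print(kw)
print(pw$p.value)
```

If the Shapiro-Wilk and Bartlett p-values are comfortably large, ANOVA or t-tests are defensible; otherwise the rank-based pair (Kruskal-Wallis, then pairwise Wilcoxon) keeps the global and follow-up tests consistent.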
4.2 Annotating the plot
ggpubr::stat_compare_means() prints p-values directly on the plot. Wrapping the plotting logic in a function makes it reusable later. The function refers to theme_clean, which is defined further down and must exist before the function is called:
# Pairwise comparisons to test
groups_season <- list(c("Spring", "Summer"),
                      c("Spring", "Autumn"),
                      c("Summer", "Autumn"))
plot_single_factor <- function(yvar, ylabname) {
  ggboxplot(df, x = "Season", y = yvar,
            color = "Season",
            palette = c("#00AFBB", "#E7B800", "#FC4E07"),
            xlab = "Season", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    stat_compare_means(comparisons = groups_season,
                       method = "t.test",
                       label = "p.format",
                       hide.ns = TRUE) +
    theme_clean
}
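Parameter notes:
- hide.ns = TRUE: hides the "ns" label for non-significant comparisons. Per the ggpubr documentation this applies when significance levels are displayed, that is with label = "p.signif"; with FALSE, non-significant comparisons are marked "ns".
- label = "p.format": shows the formatted p-value; change to "p.signif" to show asterisks (*, **, ***) instead.
Inward-pointing tick marks are a detail often seen in publication-quality figures:
theme_clean <- theme(
  legend.position = "none",
  axis.ticks.length = unit(-0.1, "cm"),                 # negative length = ticks point inward
  axis.text.x = element_text(margin = margin(t = 4)),   # gap between x labels and the axis
  axis.text.y = element_text(margin = margin(r = 4))    # gap sits on the right of the y labels
)
Note: axis.ticks.length = unit(-0.1, "cm") makes the ticks point inward; this setting is fairly common in ecology journals.
Call the function to produce the single-factor boxplot with statistical annotation: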
p3_stat <- plot_single_factor("Shannon", "Shannon index")
p3_stat
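The p-values drawn on the plot can also be inspected as a table. The sketch below does this in base R on hypothetical simulated data, assuming the default Welch form of t.test() (unequal variances) and no p-value adjustment; it is meant as a cross-check, not as a description of ggpubr's internals.

```r
set.seed(2)
# Hypothetical data: 21 observations per season (not the article's dataset)
demo <- data.frame(
  Season = factor(rep(c("Spring", "Summer", "Autumn"), each = 21),
                  levels = c("Spring", "Summer", "Autumn")),
  Shannon = c(rnorm(21, 3.0, 0.2), rnorm(21, 3.3, 0.2), rnorm(21, 2.9, 0.2))
)

# Welch t-tests for every pair of seasons, p-values left unadjusted
pw <- pairwise.t.test(demo$Shannon, demo$Season,
                      p.adjust.method = "none", pool.sd = FALSE)
print(pw$p.value)

# Cross-check one cell against a direct two-sample t-test
direct <- t.test(demo$Shannon[demo$Season == "Spring"],
                 demo$Shannon[demo$Season == "Summer"])$p.value
print(direct)
```

The result is a lower-triangular matrix: rows are Summer and Autumn, columns are Spring and Summer, and the Spring versus Summer cell matches the direct test.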

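5. Two-factor boxplot: season × habitat
Building on the single-factor analysis, this section examines whether habitat type affects the seasonal pattern of diversity.
5.1 Combined variable
The two-factor plot has to show both grouping variables, Season and Habitat. Pasting them into a combined variable Season_Habitat lets the x axis hold all 9 groups (3 seasons × 3 habitats) directly:
df$Season_Habitat <- factor(paste(df$Season, df$Habitat, sep = "_"),
  levels = c("Spring_Forest", "Spring_Grassland", "Spring_Farmland",
             "Summer_Forest", "Summer_Grassland", "Summer_Farmland",
             "Autumn_Forest", "Autumn_Grassland", "Autumn_Farmland"))
5.2 Color rule
Each habitat type gets a fixed color that stays the same across seasons:
habitat_colors <- c("#1B9E77", "#D95F02", "#7570B3")
# Inside ggboxplot(), repeat the vector 3 times to cover every season:
#   palette = rep(habitat_colors, 3)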
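Repeating the palette only works because the factor levels cycle through the habitats in the same order within each season. A small base R sketch, with the habitat names attached to the colors as an illustrative addition, makes that alignment explicit:

```r
# Named version of the habitat palette (the names are added for illustration)
habitat_colors <- c(Forest = "#1B9E77", Grassland = "#D95F02", Farmland = "#7570B3")
seasons <- c("Spring", "Summer", "Autumn")

# Level order used for Season_Habitat: habitats cycle fastest within each season
combo_levels <- as.vector(outer(names(habitat_colors), seasons,
                                function(h, s) paste(s, h, sep = "_")))

# One color per combined level, repeating the habitat colors for each season
combo_palette <- setNames(rep(unname(habitat_colors), times = length(seasons)),
                          combo_levels)
print(combo_palette)

# Every Forest box gets the same color regardless of season
forest_cols <- combo_palette[grepl("_Forest$", names(combo_palette))]
print(forest_cols)
```

If the level order were changed, for example grouped by habitat first, rep(habitat_colors, 3) would no longer match and the palette would need to be rebuilt, for instance with rep(habitat_colors, each = 3).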
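5.3 Significance letters
With 9 groups, marking pairwise comparisons with brackets and lines would make the figure far too crowded. The alternative is the compact letter display (significance letters).
Groups that share a letter do not differ significantly; groups with no letter in common do. The implementation takes three steps:
- pairwise_t_test computes p-values for all 36 pairwise comparisons
- multcompLetters converts the named p-values into letter groups
- geom_text places the letters above the boxes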
library(export)        # graph2ppt(), used for export in section 7
library(rstatix)       # pairwise_t_test()
library(multcompView)  # multcompLetters()
generate_letters <- function(varname) {
  # All pairwise t-tests between the 9 Season_Habitat groups.
  # p-values are left unadjusted here; "BH" or "bonferroni" are stricter options.
  pw <- df %>%
    pairwise_t_test(as.formula(paste(varname, "~ Season_Habitat")),
                    p.adjust.method = "none")
  pvals <- pw$p
  # multcompLetters() expects names of the form "group1-group2"
  names(pvals) <- paste(pw$group1, pw$group2, sep = "-")
  cld <- multcompView::multcompLetters(pvals)
  temp <- data.frame(
    Season_Habitat = names(cld$Letters),
    label = cld$Letters,
    stringsAsFactors = FALSE
  )
  temp$Season_Habitat <- factor(temp$Season_Habitat, levels = levels(df$Season_Habitat))
  return(temp)
}
plot_two_factor <- function(varname, ylabname) {
  letters_df <- generate_letters(varname)
  ggboxplot(df, x = "Season_Habitat", y = varname,
            color = "Season_Habitat",
            palette = rep(habitat_colors, 3),
            xlab = "", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    geom_text(data = letters_df,
              aes(x = Season_Habitat, y = max(df[[varname]]) * 1.05, label = label),
              vjust = 0) +
    theme_clean +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
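Here angle = 45 rotates the x-axis labels to avoid overlap, and max(df[[varname]]) * 1.05 sets the height of the letters at 5% above the largest observed value.
Call the function to produce the two-factor boxplot: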
q1 <- plot_two_factor("Shannon", "Shannon index")
q1
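With 36 unadjusted comparisons, some false positives are to be expected. The base R sketch below puts a number on that and shows how p.adjust() changes the count of "significant" results; the p-values are randomly generated for illustration and are not from the article's data.

```r
n_groups <- 9
n_pairs <- choose(n_groups, 2)   # 36 pairwise comparisons

# Chance of at least one false positive at alpha = 0.05 if all 36 tests
# were independent and every null hypothesis were true
fwer <- 1 - (1 - 0.05)^n_pairs

# Hypothetical unadjusted p-values, for illustration only
set.seed(3)
p_raw <- sort(runif(n_pairs, min = 0, max = 0.2))
p_bh <- p.adjust(p_raw, method = "BH")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(n_pairs)
print(round(fwer, 3))
print(c(raw = sum(p_raw < 0.05),
        BH = sum(p_bh < 0.05),
        bonferroni = sum(p_bonf < 0.05)))
```

Under those simplifying assumptions the family-wise error rate is about 0.84. In generate_letters(), the same method names ("BH", "bonferroni", "holm", and so on) can be passed as p.adjust.method, in which case the adjusted column of the result should be used to build the letters.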

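6. Reading the two-factor plot
How to read the letters: if two groups share at least one letter, they do not differ significantly; if they have no letter in common, they do.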
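The reading rule can be tried on a toy case. The sketch below feeds three hypothetical p-values to multcompView::multcompLetters() and then applies the shared-letter rule with a small helper; share_letter() is an illustrative function written for this example, not part of any package.

```r
library(multcompView)

# Hypothetical p-values for three groups; names must have the form "g1-g2"
p_toy <- c("A-B" = 0.01, "A-C" = 0.40, "B-C" = 0.02)

# Pairs with p below the threshold are treated as significantly different
cld <- multcompLetters(p_toy, threshold = 0.05)
print(cld$Letters)

# Illustrative helper: do two groups have at least one letter in common?
share_letter <- function(g1, g2, letters_vec) {
  l1 <- strsplit(letters_vec[[g1]], "")[[1]]
  l2 <- strsplit(letters_vec[[g2]], "")[[1]]
  length(intersect(l1, l2)) > 0
}

print(share_letter("A", "C", cld$Letters))  # not significantly different
print(share_letter("A", "B", cld$Letters))  # significantly different
```

A and C share a letter because their comparison is not significant, while B stands apart from both. Group names must not contain a hyphen, since the hyphen is what separates the two group names in each comparison label.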
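Trends in the simulated data:
- Forest has the highest diversity index in every season, with little seasonal fluctuation (its letter groups change little across seasons)
- Grassland diversity is intermediate, with a fairly clear decline in autumn
- Farmland has the lowest diversity and the largest seasonal fluctuation, which may indicate that habitats under stronger human disturbance buffer seasonal climate variation less well
The single-factor and two-factor plots answer questions at different levels: the single-factor plot asks "do seasons differ significantly?", while the two-factor plot goes on to ask "does the pattern of seasonal differences depend on habitat type?". The two complement rather than duplicate each other.
7. Combining multiple indices and exporting to several formats
Placing the Shannon and Simpson boxplots side by side in one composite figure simplifies journal layout and comparison by readers.
7.1 Single-factor composite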
library(cowplot)
p1 <- plot_single_factor("Shannon", "Shannon index")
p2 <- plot_single_factor("Simpson", "Simpson index")
p_final <- plot_grid(p1, p2, ncol = 2)
tiff("boxplot_publication.tif", width = 3100, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(p_final); dev.off()
png("boxplot_publication.png", width = 3100, height = 1600, res = 280)
print(p_final); dev.off()
graph2ppt(p_final, file = "boxplot_publication.pptx")
7.2 Two-factor composite
q1 <- plot_two_factor("Shannon", "Shannon index")
q2 <- plot_two_factor("Simpson", "Simpson index")
q_final <- plot_grid(q1, q2, ncol = 2)
tiff("boxplot_two_way_comb_publication.tif", width = 3400, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(q_final); dev.off()
png("boxplot_two_way_comb_publication.png", width = 3400, height = 1600,
    res = 280)
print(q_final); dev.off()
graph2ppt(q_final, file = "boxplot_two_way_comb_publication.pptx")

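export::graph2ppt() writes the figure to PowerPoint in vector form, so labels, colors, and sizes can still be edited directly in PowerPoint after export; it is highly recommended.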
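For raster output, ggplot2::ggsave() is an alternative to opening and closing a device by hand. The sketch below converts the pixel dimensions used above into inches at 280 dpi and saves a placeholder plot; the plot, the built-in ToothGrowth dataset, and the temporary file path are stand-ins chosen for the example.

```r
library(ggplot2)

# Pixel dimensions and resolution used in the tiff()/png() calls
width_px <- 3100
height_px <- 1600
dpi <- 280

# Physical size implied by those settings
width_in <- width_px / dpi
height_in <- height_px / dpi
print(round(c(width_in = width_in, height_in = height_in), 2))

# Placeholder plot on a built-in dataset
p_demo <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot()

# ggsave() picks the device from the file extension and closes it afterwards
out_png <- file.path(tempdir(), "boxplot_demo.png")
ggsave(out_png, plot = p_demo, width = width_in, height = height_in,
       units = "in", dpi = dpi)
print(file.exists(out_png))
```

The figure comes out at roughly 11.1 × 5.7 inches. Text size in a ggplot is set in points, so changing the physical size changes how large the labels look relative to the panel; it is worth checking the result against the target journal's figure width.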
8. Complete code
The full code for this article follows. Copy it, change the file name in read.csv() to your own data file, and adjust the variable and group names further down; it should then run.
# ===================== Load packages =====================
library(ggplot2)
library(ggpubr)
library(rstatix)
library(multcompView)
library(cowplot)
library(export)
library(dplyr)
# ===================== Global settings =====================
df <- read.csv("ecology_diversity_data.csv", header = TRUE)
df$Season <- factor(df$Season, levels = c("Spring", "Summer", "Autumn"))
df$Habitat <- factor(df$Habitat, levels = c("Forest", "Grassland", "Farmland"))
season_colors <- c("#00AFBB", "#E7B800", "#FC4E07")
habitat_colors <- c("#1B9E77", "#D95F02", "#7570B3")
theme_clean <- theme(
  legend.position = "none",
  axis.ticks.length = unit(-0.1, "cm"),                 # negative length = ticks point inward
  axis.text.x = element_text(margin = margin(t = 4)),   # gap between x labels and the axis
  axis.text.y = element_text(margin = margin(r = 4))    # gap sits on the right of the y labels
)
groups_season <- list(c("Spring", "Summer"),
                      c("Spring", "Autumn"),
                      c("Summer", "Autumn"))
# ===================== Single-factor plotting function =====================
plot_single_factor <- function(yvar, ylabname) {
  ggboxplot(df, x = "Season", y = yvar,
            color = "Season",
            palette = season_colors,
            xlab = "Season", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    stat_compare_means(comparisons = groups_season, method = "t.test",
                       label = "p.format", hide.ns = TRUE) +
    theme_clean
}
# ===================== Two-factor setup: combined variable =====================
df$Season_Habitat <- factor(paste(df$Season, df$Habitat, sep = "_"),
  levels = c("Spring_Forest", "Spring_Grassland", "Spring_Farmland",
             "Summer_Forest", "Summer_Grassland", "Summer_Farmland",
             "Autumn_Forest", "Autumn_Grassland", "Autumn_Farmland"))
# ===================== Two-factor significance-letter function =====================
generate_letters <- function(varname) {
  # All pairwise t-tests between the 9 groups; p-values are left unadjusted
  pw <- df %>%
    pairwise_t_test(as.formula(paste(varname, "~ Season_Habitat")),
                    p.adjust.method = "none")
  pvals <- pw$p
  # multcompLetters() expects names of the form "group1-group2"
  names(pvals) <- paste(pw$group1, pw$group2, sep = "-")
  cld <- multcompView::multcompLetters(pvals)
  temp <- data.frame(
    Season_Habitat = names(cld$Letters),
    label = cld$Letters,
    stringsAsFactors = FALSE
  )
  temp$Season_Habitat <- factor(temp$Season_Habitat, levels = levels(df$Season_Habitat))
  return(temp)
}
# ===================== Two-factor plotting function =====================
plot_two_factor <- function(varname, ylabname) {
  letters_df <- generate_letters(varname)
  ggboxplot(df, x = "Season_Habitat", y = varname,
            color = "Season_Habitat",
            palette = rep(habitat_colors, 3),
            xlab = "", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    geom_text(data = letters_df,
              aes(x = Season_Habitat, y = max(df[[varname]]) * 1.05, label = label),
              vjust = 0) +
    theme_clean +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
# ===================== Single-factor: Shannon + Simpson composite =====================
p1 <- plot_single_factor("Shannon", "Shannon index")
p2 <- plot_single_factor("Simpson", "Simpson index")
p_final <- plot_grid(p1, p2, ncol = 2,
                     labels = c("(a)", "(b)"), label_x = 0)
tiff("boxplot_publication.tif", width = 3100, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(p_final); dev.off()
png("boxplot_publication.png", width = 3100, height = 1600, res = 280)
print(p_final); dev.off()
graph2ppt(p_final, file = "boxplot_publication.pptx")
# ===================== Two-factor: Shannon + Simpson composite =====================
q1 <- plot_two_factor("Shannon", "Shannon index")
q2 <- plot_two_factor("Simpson", "Simpson index")
q_final <- plot_grid(q1, q2, ncol = 2,
                     labels = c("(a)", "(b)"), label_x = 0)
tiff("boxplot_two_way_comb_publication.tif", width = 3400, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(q_final); dev.off()
png("boxplot_two_way_comb_publication.png", width = 3400, height = 1600,
    res = 280)
print(q_final); dev.off()
graph2ppt(q_final, file = "boxplot_two_way_comb_publication.pptx")
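9. Adapting to your own data
Replace the file name in read.csv() with your data file and change the x, y, and color/fill variable mappings to match:
- Single-factor plot: replace the column names of the grouping and response variables
- Two-factor plot: also update how the combined variable is pasted together and the color scheme
For custom colors, the ggsci package offers ready-made palettes such as pal_npg(), pal_aaas(), and pal_lancet().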
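As a sketch of how a ggsci palette could be swapped in: each pal_*() function returns a generator that, called with n, yields n hex colors. The variable names below are illustrative, and the palette variants ("nrc", "lanonc") are the defaults as far as the ggsci documentation describes them.

```r
library(ggsci)

# Each pal_*() call returns a function; calling it with n gives n hex colors
season_colors_npg <- pal_npg("nrc")(3)
habitat_colors_lancet <- pal_lancet("lanonc")(3)

print(season_colors_npg)
print(habitat_colors_lancet)
```

These vectors can replace season_colors and habitat_colors in the plotting functions. ggpubr's palette argument is also documented to accept ggsci palette names such as "npg" or "lancet" directly, which avoids building the vector by hand.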
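10. Key takeaways
- Separate palettes: Season and Habitat use independent palettes so the two variables are not confused
- Global test first, pairwise comparisons second: Kruskal-Wallis establishes an overall difference, and t-tests locate the specific group differences
- Letters for the two-factor plot: with many groups, significance letters are more readable than brackets and lines
- Jitter overlay: shows detail in the distribution; with very large datasets, adjust the width parameter
- Vector PPT export: graph2ppt() keeps editable vector information, which makes fine-tuning easy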