【生信】R语言在RNA-seq中的应用

最新推荐文章于 2026-04-15 20:55:22 发布

原创

最新推荐文章于 2026-04-15 20:55:22 发布 · 4.2k 阅读

文章详细阐述了R语言在RNA-seq数据分析中的应用，包括生成工作流环境、读取和处理数据、质量过滤和修剪、比对、读段计数、样本间相关性分析、差异表达分析、GO富集分析以及层次聚类和热图绘制等步骤，涵盖了从实验数据处理到差异表达基因鉴定的全过程。

R语言在RNA-seq中的应用

生成工作流环境

之后代码运行可能会有网络问题，通过set_config函数设置即可。

.libPaths("D:/000大三下/R语言/实验/Lab4/lab4packages")
library(httr)
set_config(
  use_proxy(url="127.0.0.1", port=7890)
)
library(systemPipeR)
library(systemPipeRdata)
#####################################################
## 1. Generate workflow environment
#####################################################
setwd(choose.dir())
genWorkenvir(workflow = "rnaseq")
setwd("rnaseq")

读取和处理数据

由targets文件提供实验定义

读取和预处理实验数据。具体步骤如下：

首先，使用system.file函数找到targets.txt文件的路径，这个文件包含了所有的FASTQ文件和样本比较的信息。
然后，使用read.delim函数读取targets.txt文件，忽略以#开头的注释行，并只保留前四列。
最后，打印出targets对象，查看数据的结构和内容。

## 2. Read preprocessing
#####################################################
## 2.1 Experiment definition provided by targets file
## The targets file defines all FASTQ files and sample comparisons 
## of the analysis workflow.
targetspath <- system.file("extdata", "targets.txt", package = "systemPipeR")
targets <- read.delim(targetspath, comment.char = "#")[, 1:4]
targets

对实验数据进行质量过滤和修剪

对实验数据进行质量过滤和修剪。具体步骤如下：

首先，使用loadWorkflow函数从cwl和yml参数文件以及targets文件中构建一个SYSargs2对象，这个对象包含了执行trimLRPatterns函数的所有参数和输入输出路径。
然后，使用renderWF函数根据targets文件中的文件名和样本名替换cwl和yml文件中的占位符，生成一个完整的工作流对象。
接着，打印出trim对象，查看工作流的各个组成部分。
最后，使用output函数查看输出路径中的前两个修剪后的FASTQ文件。

## 2.2 Read quality filtering and trimming
## The function preprocessReads allows to apply predefined or custom read
## preprocessing functions to all FASTQ files referenced in a SYSargs2 container, such
## as quality filtering or adapter trimming routines. The paths to the resulting output
## FASTQ files are stored in the output slot of the SYSargs2 object. The following
## example performs adapter trimming with the trimLRPatterns function from the
## Biostrings package. After the trimming step a new targets file is generated (here
## targets_trim.txt) containing the paths to the trimmed FASTQ files. The new
## targets file can be used for the next workflow step with an updated SYSargs2
## instance, e.g. running the NGS alignments using the trimmed FASTQ files.
## First,we construct SYSargs2 object from cwl and yml param and targets files.  
dir_path <- system.file("extdata/cwl/preprocessReads/trim-se", 
                        package = "systemPipeR")
trim <- loadWorkflow(targets = targetspath, wf_file = "trim-se.cwl", 
                     input_file = "trim-se.yml", dir_path = dir_path)
trim <- renderWF(trim, inputvars = c(FileName = "_FASTQ_PATH1_", 
                                     SampleName = "_SampleName_"))
trim
output(trim)[1:2]

生成FASTQ质量报告

生成FASTQ质量报告。具体步骤如下：

首先，使用seeFastq函数对trim对象中的输入文件进行质量分析，计算每个文件的碱基质量分布、序列长度分布、GC含量分布和k-mer频率分布。
然后，使用pdf函数创建一个PDF文件，用于保存质量报告的图形。
接着，使用seeFastqPlot函数绘制质量报告的图形，包括每个文件的四个子图。
最后，使用dev.off函数关闭PDF设备，完成图形的保存。

## 2.3 FASTQ quality report
fqlist <- seeFastq(fastq = infile1(trim), batchsize = 10000, 
                   klength = 8)
pdf("./results/fastqReport.pdf", height = 18, width = 4 * length(fqlist))
seeFastqPlot(fqlist)
dev.off()

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-8o8Xh8q9-1685677405310)(D:/000%E5%A4%A7%E4%B8%89%E4%B8%8B/R%E8%AF%AD%E8%A8%80/%E5%AE%9E%E9%AA%8C/Lab4/rnaseq/results/fastqReport.png)]

比对

建立HISAT2索引并比对

使用HISAT2进行短读比对的步骤。主要包括以下几个步骤：

构建HISAT2索引：首先，代码中使用loadWorkflow函数加载了一个工作流程对象（idx），并指定了索引构建的相关参数。然后，通过调用renderWF函数渲染工作流程对象，并通过cmdlist函数获取相应的命令列表。最后，使用runCommandline函数运行命令列表来构建HISAT2索引。
进行映射：接下来，代码中使用loadWorkflow函数加载了另一个工作流程对象（args），并指定了映射的相关参数。通过调用renderWF函数渲染工作流程对象，将文件路径和样本名称作为输入变量进行替换。然后，通过调用cmdlist函数获取命令列表，以及通过output函数获取输出结果的相关信息。
运行命令行模式：在代码中使用runCommandline函数来运行命令列表，以进行短读比对。
处理输出文件：在代码中使用output_update函数来修改args对象，以模拟对生成的对齐结果文件进行处理。具体操作是将输出文件的后缀名修改为".sam"和".bam"，并将文件路径中的目录设置为FALSE，以方便后续处理。
检查生成的BAM文件：最后，代码中使用subsetWF函数选择输出结果中的BAM文件路径，并通过file.exists函数检查这些文件是否存在。

## 3.1 Read mapping with HISAT2 
## The following steps will demonstrate how to use the short read aligner Hisat2 (Kim,
## Langmead, and Salzberg 2015) in both interactive job submissions and batch
## submissions to queuing systems of clusters using the systemPipeR's new CWL
## command-line interface.
## First, build HISAT2 index. (Skip this step)
#dir_path <- system.file("extdata/cwl/hisat2/hisat2-idx", package = "systemPipeR")
#idx <- loadWorkflow(targets = NULL, wf_file = "hisat2-index.cwl", 
#                    input_file = "hisat2-index.yml", dir_path = dir_path)
#idx <- renderWF(idx)
#idx
#cmdlist(idx)
#runCommandline(idx, make_bam = FALSE)

## Second, mapping.
dir_path <- system.file("extdata/cwl/hisat2", package = "systemPipeR")
args <- loadWorkflow(targets = targetspath, wf_file = "h

标签