Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network [Translation]

Latest recommended article published on 2026-07-03 17:05:11

This content follows the CC 4.0 BY-SA license.

https://arxiv.org/pdf/1711.08324v1

Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network

Fangzhou Liao, Ming Liang, Zhe Li, Xiaolin Hu*, Senior Member, IEEE and Sen Song*

Abstract: Automatic diagnosis of lung cancer from Computed Tomography (CT) scans involves two steps: detecting all suspicious lesions (pulmonary nodules) and evaluating the whole-lung/pulmonary malignancy. Currently, there are many studies on the first step but few on the second. Since the existence of a nodule does not definitely indicate cancer, and the morphology of a nodule has a complicated relationship with cancer, the diagnosis of lung cancer demands careful investigation of every suspicious nodule and integration of information from all nodules. We propose a 3D deep neural network to solve this problem. The model consists of two modules. The first is a 3D region proposal network for nodule detection, which outputs all suspicious nodules for a subject. The second selects the top five nodules based on detection confidence, evaluates their cancer probabilities, and combines them with a leaky noisy-or gate to obtain the probability of lung cancer for the subject. The two modules share the same backbone network, a modified U-net. The over-fitting caused by the shortage of training data is alleviated by training the two modules alternately. The proposed model won first place in the Data Science Bowl 2017 competition. The code has been made publicly available ${}^{1}$.

Index Terms: Pulmonary nodule detection, nodule malignancy evaluation, deep learning, noisy-or model, 3D convolutional neural network

I. INTRODUCTION

The best solution for lung cancer is early diagnosis and timely treatment. Therefore, regular examinations are necessary. Volumetric thoracic Computed Tomography (CT) is a common imaging tool for lung cancer diagnosis [1]. It visualizes all tissues according to their absorption of X-rays. Lesions in the lung are called pulmonary nodules. A nodule usually has the same absorption level as normal tissue but a distinctive shape: the bronchi and vessels are continuous pipe systems, thick at the root and thin at the branches, whereas nodules are usually spherical and isolated. It usually takes an experienced doctor around 10 minutes to perform a thorough check for a patient, because some nodules are small and hard to find. Moreover, there are many subtypes of nodules, and the cancer probabilities of different subtypes differ. Doctors can evaluate the malignancy of nodules based on their morphology, but the accuracy depends heavily on the doctor's experience, and different doctors may give different predictions [2].

Computer-aided diagnosis (CAD) is suitable for this task because computer vision models can quickly scan everywhere with equal quality and are not affected by fatigue or emotion. Recent advances in deep learning have enabled computer vision models to help doctors diagnose various problems, and in some cases the models have exhibited performance competitive with that of doctors [3, 4, 5, 6, 7].

Automatic lung cancer diagnosis has several difficulties compared with general computer vision problems. First, nodule detection is a 3D object detection problem, which is harder than 2D object detection. Direct generalization of 2D object detection methods to 3D cases faces technical difficulty due to limited GPU memory. Therefore, some methods use 2D region proposal networks (RPN) to extract proposals in individual 2D images and then combine them to generate 3D proposals [8, 9]. More importantly, labeling 3D data is usually much harder than labeling 2D data, which may make deep learning models fail due to over-fitting. Second, the shape of nodules is diverse (Fig. 1), and the difference between nodules and normal tissues is vague. As a consequence, even experienced doctors cannot reach a consensus in some cases [10]. Third, the relationship between nodules and cancer is complicated. The existence of a nodule does not definitely indicate lung cancer. For patients with multiple nodules, all nodules should be considered to infer the cancer probability. In other words, unlike the classical detection task and the classical classification task, in this task a label corresponds to several objects. This is a multiple instance learning (MIL) [11] problem, which is a hard problem in computer vision.

To tackle these difficulties, we take the following strategies. We built a 3D RPN [12] to directly predict the bounding boxes for nodules. The 3D convolutional neural network (CNN) structure enables the network to capture complex features. To deal with the GPU memory problem, a patch-based training and testing strategy is used. The model is trained end-to-end to achieve efficient optimization. Extensive data augmentation is used to combat over-fitting. The threshold for the detector is set low such that all suspicious nodules are included. Then the top five suspicious nodules are selected as input to the classifier. A leaky noisy-or model [13] is introduced in the classifier to combine the scores of top five nodules.

Fangzhou Liao, Zhe Li and Sen Song are with the School of Medicine, Tsinghua University, Beijing 100084, China.

Ming Liang and Xiaolin Hu are with the State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), and Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. (email: xlhu@tsinghua.edu.cn).

This work was supported in part by the National Basic Research Program (973 Program) of China under grant no. 2013CB329403, the National Natural Science Foundation of China under grant nos. 91420201, 61332007, 61621136008 and 61620106010.

* Corresponding authors

${}^{1}$ https://github.com/lfz/DSB2017

Fig. 1: Examples of nodules in the DSB dataset. Top: the whole slice. Bottom: the zoomed image.

The noisy-or model is a local causal probability model commonly used in probabilistic graphical models [13]. It assumes that an event can be caused by different factors, and that the occurrence of any one of these factors can lead to the event with an independent probability. A modified version of the model, the leaky noisy-or model [13], assumes that there is a leakage probability for the event even if none of the factors occurs. The leaky noisy-or model is suitable for this task. First, when multiple nodules are present in a case, all nodules contribute to the final prediction. Second, a highly suspicious nodule would explain away the cancer case, which is desirable. Third, when no nodule can explain a cancer case, cancer can be attributed to a leakage probability.
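
The behaviour of the leaky noisy-or gate can be illustrated with a minimal sketch. It assumes the standard form P = 1 - (1 - p_leak) * prod(1 - p_i); the function name `leaky_noisy_or` and the example probabilities are illustrative and are not taken from the released code.

```python
from math import prod


def leaky_noisy_or(nodule_probs, leak_prob):
    """Combine per-nodule cancer probabilities with a leaky noisy-or gate.

    The subject is cancer-free only if the leak does not fire and no nodule
    causes cancer, so P(cancer) = 1 - (1 - leak_prob) * prod(1 - p_i).
    """
    for p in list(nodule_probs) + [leak_prob]:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_no_cancer = (1.0 - leak_prob) * prod(1.0 - p for p in nodule_probs)
    return 1.0 - p_no_cancer


if __name__ == "__main__":
    # Hypothetical scores for the top five nodules of one subject.
    print(leaky_noisy_or([0.9, 0.2, 0.1, 0.05, 0.05], leak_prob=0.01))
    # With no nodules, only the leakage term remains.
    print(leaky_noisy_or([], leak_prob=0.01))
```

In this form a single highly suspicious nodule dominates the product, while a subject with no convincing nodule still receives the leakage probability.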

The classification network is also a 3D neural network. To prevent over-fitting, we let the classification network share the backbone of the detection network (the parameters of the backbones of the two networks are tied) and train the two networks alternately. Extensive data augmentation is also used.

Our contributions in this work are summarized as follows:

To the best of our knowledge, we propose the first volumetric one-stage end-to-end CNN for 3D object detection.

We propose to integrate the noisy-or gate into neural networks to solve the multi-instance learning task in CAD.

We validated the proposed method on the Data Science Bowl 2017 ${}^{2}$ and won first place among 1972 teams.

The rest of the paper is organized as follows. Section II presents some closely related works. The pipeline of the proposed method is detailed in subsequent sections. It consists of three steps: (1) preprocessing (Section III): segment the lung from other tissues; (2) detection (Section IV): find all suspicious nodules in the lung; (3) classification (Section V): score all nodules and combine their cancer probabilities to get the overall cancer probability of the patient. The first step is accomplished by classical image preprocessing techniques and the other two steps by neural networks. The results are presented in Section VI. Section VII concludes the paper with some discussions.

II. RELATED WORKS

A. General object detection

A number of object detection methods have been proposed and a thorough review is beyond the scope of this paper. Most of these methods are designed for 2D object detection. Some state-of-the-art methods have two stages (e.g., Faster-RCNN [12]), in which some bounding boxes (called proposals) are proposed in the first stage (containing an object or not) and the class decision (which class the object in a proposal belongs to) is made in the second stage. More recent methods have a single stage, in which the bounding boxes and class probabilities are predicted simultaneously (YOLO [14]) or the class probabilities are predicted for default boxes without proposal generation (SSD [15]). In general, single-stage methods are faster but two-stage methods are more accurate. In the case of single class object detection, the second stage in the two-stage methods is no longer needed and the methods degenerate to single-stage methods.

Extension of cutting-edge 2D object detection methods to 3D object detection tasks (e.g., action detection in video and volumetric detection) has been limited. Due to the memory constraints of mainstream GPUs, some studies use a 2D RPN to extract proposals in individual 2D images and then use an extra module to combine the 2D proposals into 3D proposals [8, 9]. Similar strategies have been used for 3D image segmentation [16]. As far as we know, a 3D RPN has not been used to process video or volumetric data.

B. Nodule detection

Nodule detection is a typical volumetric detection task. Due to its great clinical significance, it has drawn more and more attention in recent years. This task is usually divided into two subtasks [17]: making proposals and reducing false positives, and each subtask has attracted much research. The models for the first subtask usually start with a simple and fast 3D descriptor followed by a classifier to give many proposals. The models for the second subtask are usually complex classifiers. In 2010, Van Ginneken et al. [17] gave a comprehensive review of six conventional algorithms and evaluated them on the ANODE09 dataset, which contains 55 scans. During 2011-2015, a much larger dataset, LIDC [18, 19, 20], was developed. Researchers started to adopt CNNs to reduce the number of false positives. Setio et al. [21] adopted a multi-view CNN, and Dou et al. [22] adopted a 3D CNN to solve this problem; both achieved better results than conventional methods. Ding et al. [9] adopted a 2D RPN to make nodule proposals in every slice and a 3D CNN to reduce the number of false-positive samples. A competition called LUng Nodule Analysis 2016 (LUNA16) [23] was held based on a selected subset of LIDC. In the detection track of this competition, most participants used two-stage methods [23].

${}^{2}$ https://www.kaggle.com/c/data-science-bowl-2017

C. Multiple instance learning

In an MIL task, the input is a bag of instances. The bag is labeled positive if any of the instances is labeled positive, and negative if all of the instances are labeled negative.

Many medical image analysis tasks are MIL tasks, so before the rise of deep learning, some earlier works had already proposed MIL frameworks in CAD. Dundar et al. [24] introduced the convex hull to represent multi-instance features and applied it to pulmonary embolism and colon cancer detection. Xu et al. [25] extracted many patches from the tissue-examining image and treated them as multiple instances to solve the colon cancer classification problem.

To incorporate the MIL into deep neural network framework, the key component is a layer that combines the information from different instances together, which is called MIL Pooling Layer (MPL [26]). Some MPL examples are: max-pooling layer [27], mean pooling layer [26], log-sum-exp pooling layer [28], generalized-mean layer [25] and noisy-or layer [29]. If the number of instances is fixed for every sample, it is also feasible to use feature concatenation as an MPL [30]. The MPL can be used to combine different instances in the feature level [27, 28] or output level [29].
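
For intuition, the pooling operators listed above can be sketched on a plain list of per-instance scores. These are common textbook forms rather than the exact layers of the cited works; the function names and the sharpness parameter `r` are illustrative assumptions.

```python
import math


def max_pool(scores):
    """The bag score is the score of the most positive instance."""
    return max(scores)


def mean_pool(scores):
    """The bag score is the average instance score."""
    return sum(scores) / len(scores)


def log_sum_exp_pool(scores, r=5.0):
    """(1/r) * log(mean(exp(r * s))), a smooth value between mean and max.

    The maximum is subtracted before exponentiation for numerical stability.
    """
    m = max(scores)
    total = sum(math.exp(r * (s - m)) for s in scores)
    return m + math.log(total / len(scores)) / r


def generalized_mean_pool(scores, r=5.0):
    """(mean(s ** r)) ** (1 / r) for non-negative scores."""
    return (sum(s ** r for s in scores) / len(scores)) ** (1.0 / r)


def noisy_or_pool(probs):
    """The bag is positive unless every instance independently fails to fire."""
    return 1.0 - math.prod(1.0 - p for p in probs)


if __name__ == "__main__":
    bag = [0.1, 0.2, 0.9]  # hypothetical instance probabilities
    for pool in (max_pool, mean_pool, log_sum_exp_pool,
                 generalized_mean_pool, noisy_or_pool):
        print(pool.__name__, round(pool(bag), 4))
```

Max pooling passes the gradient to one instance only, whereas the other operators let every instance contribute to the bag score.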

D. Noisy-or model

The noisy-or Bayesian model is widely used in inferring the probability of diseases such as liver disorders [31] and asthma [32]. Heckerman [33] built a multi-feature, multi-disease diagnosis system based on the noisy-or gate. Halpern and Sontag [34] proposed an unsupervised learning method based on the noisy-or model and validated it on the Quick Medical Reference model.

All of the studies mentioned above incorporate the noisy-or model into Bayesian models. The integration of the noisy-or model with neural networks, however, is rare. Sun et al. [29] adopted it as an MPL in the deep neural network framework to improve image classification accuracy, and Zhang et al. [35] used it as a boosting method to improve object detection accuracy.

Fig. 2: Distributions of the nodule diameter. (a) Distributions in the DSB and LUNA datasets. (b) Distributions of the maximum nodule diameter for cancer patients and healthy people in the DSB dataset.

III. DATASETS AND PREPROCESSING

A. Datasets

Two lung scan datasets are used to train the model: the LUng Nodule Analysis 2016 dataset (abbreviated as LUNA) and the training set of Data Science Bowl 2017 (abbreviated as DSB). The LUNA dataset includes 1186 nodule labels in 888 patients annotated by radiologists, while the DSB dataset only includes per-subject binary labels indicating whether the subject was diagnosed with lung cancer in the year after the scan. The DSB dataset includes 1397, 198, and 506 persons (cases) in its training, validation, and test sets, respectively. We manually labeled 754 nodules in the training set and 78 nodules in the validation set.

There are some significant differences between LUNA nodules and DSB nodules. The LUNA dataset has many very small annotated nodules, which may be irrelevant to cancer. According to doctors' experience [36], nodules smaller than 6 mm are usually not dangerous. The DSB dataset, however, has many very big nodules (larger than 40 mm) (the fifth sample in Fig. 1). The average nodule diameter is 13.68 mm in the DSB dataset and 8.31 mm in the LUNA dataset (Fig. 2a). In addition, the DSB dataset has many nodules on the main bronchus (third sample in Fig. 1), which are rarely found in the LUNA dataset. If the network is trained on the LUNA dataset only, it will have difficulty detecting the nodules in the DSB dataset. Missing big nodules would lead to incorrect cancer predictions, as the existence of big nodules is a hallmark of cancer patients (Fig. 2b). To cope with these problems, we removed the nodules smaller than 6 mm from the LUNA annotations and manually labeled the nodules in DSB.

The authors have no professional knowledge of lung cancer diagnosis, so the nodule selection and manual annotations may raise considerable noise. The model in the next stage (cancer classification) is designed to be robust to wrong detections, which alleviates the demand for highly reliable nodule labels.

B. Preprocessing

The overall preprocessing procedure is illustrated in Fig. 3. All raw data are first converted into Hounsfield Units (HU), a standard quantitative scale for describing radiodensity. Every tissue has its own specific HU range, and this range is the same for different people (Fig. 3a).

Mask extraction: A CT image contains not only the lung but also other tissues, and some of them may have spherical shapes and look like nodules. To rule out those distractors, the most convenient method is to extract the mask of the lung and ignore all other tissues in the detection stage. For each slice, the 2D image is filtered with a Gaussian filter (standard deviation = 1 pixel) and then binarized using -600 HU as the threshold (Fig. 3b). All 2D connected components smaller than 30 mm² or having eccentricity greater than 0.99 (which correspond to some high-luminance radial imaging noise) are removed. Then all 3D connected components in the resulting binary 3D matrix are calculated, and only those not touching the matrix corner and having a volume between 0.68 L and 7.5 L are kept.
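
A simplified, pure-Python sketch of the thresholding and size filtering described above follows. It omits the Gaussian smoothing, the eccentricity test, and the corner check, works on nested lists rather than image arrays, and assumes that the foreground is the set of pixels below the threshold (air-filled lung has low HU). All function names are hypothetical.

```python
from collections import deque


def binarize(slice_hu, threshold=-600):
    """Mark pixels whose HU value is below the threshold as foreground."""
    return [[value < threshold for value in row] for row in slice_hu]


def label_components(mask):
    """Return the 4-connected foreground components as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def keep_large_components(mask, pixel_area_mm2, min_area_mm2=30.0):
    """Drop 2D components whose physical area is below min_area_mm2."""
    out = [[False] * len(mask[0]) for _ in mask]
    for component in label_components(mask):
        if len(component) * pixel_area_mm2 >= min_area_mm2:
            for y, x in component:
                out[y][x] = True
    return out


def volume_in_range(n_voxels, voxel_volume_mm3, low_l=0.68, high_l=7.5):
    """Check a 3D component's volume in litres (1 L = 1e6 mm^3)."""
    litres = n_voxels * voxel_volume_mm3 / 1e6
    return low_l <= litres <= high_l
```

A practical implementation would use array libraries for speed; the sketch only shows the order of the operations and the unit conversions.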

After this step, usually only one binary component is left, corresponding to the lung, but sometimes there are also some distracting components. Compared with those distracting components, the lung component is always at the center position of the image. For each slice of a component, we calculate the minimum distance from it to the image center (MinDist) and its area. Then we select all slices in the component whose area is greater than 6000 mm² and calculate the average MinDist of these slices. If the average MinDist is greater than 62 mm, this component is removed. The remaining components are then unioned, representing the lung mask (Fig. 3c).
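
The component-selection rule can be sketched as a small function over per-slice statistics. The sketch assumes that the slice-selection comparison is "area greater than 6000 mm²" and that a component with no such slice is kept; the latter is an assumption of this example, since the text does not cover that case. The function and parameter names are hypothetical.

```python
def keep_component(slices, area_threshold_mm2=6000.0, dist_threshold_mm=62.0,
                   keep_if_no_large_slice=True):
    """Decide whether a 3D component is kept as part of the lung mask.

    slices: iterable of (area_mm2, min_dist_mm) pairs, one per 2D slice,
    where min_dist_mm is the minimum distance from the slice's region to
    the image center (MinDist).
    """
    large_slice_dists = [dist for area, dist in slices
                         if area > area_threshold_mm2]
    if not large_slice_dists:
        # Assumption: the text does not say what to do in this case.
        return keep_if_no_large_slice
    average = sum(large_slice_dists) / len(large_slice_dists)
    # The component is removed when its large slices lie far from the center.
    return average <= dist_threshold_mm


if __name__ == "__main__":
    lung_like = [(7000.0, 50.0), (8000.0, 60.0), (100.0, 200.0)]
    off_center = [(7000.0, 80.0), (6500.0, 70.0)]
    print(keep_component(lung_like), keep_component(off_center))
```

Only the large slices vote on the average, so thin slices at the top or bottom of the lung, which may lie far from the center, do not cause the lung component to be discarded.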

The lung in some cases is connected to the outer world on the top slices, which makes the procedure described above fail to separate the lung from the outer world space. Therefore these slices need to be removed first to make the above processing work.

Convex hull & dilation: There are some nodules attached to the outer wall of the lung. They are not included in the mask obtained in the previous step, which is unwanted. To keep them inside the mask, a convenient way is to compute the convex hull of the mask. Yet directly computing the convex hull of the mask would include too many unrelated tissues (like the heart and spine). So the lung mask is first separated into two parts (approximately corresponding to the left and right lungs) before the convex hull computation using the following approach.

Fig. 3: The procedures of preprocessing. Notice the nodule sticking to the outer wall of the lungs. (a) Convert the image to HU, (b) binarize the image by thresholding, (c) select the connected domain corresponding to the lungs, (d) segment the left and right lungs, (e) compute the convex hull of each lung, (f) dilate and combine the two masks, (g) multiply the image with the mask, fill the masked region with tissue luminance, and convert the image to UINT8, (h) crop the image and clip the luminance of bone.

Fig. 4: The same as Fig. 3, but a lower slice is shown. Notice that no convex hull is calculated in step (e).

The mask is eroded iteratively until it is broken into two components (their volumes would be similar), which are the central parts of the left and right lungs. Then the two components are dilated back to original sizes. Their intersections with the raw mask are now masks for the two lungs separately (Fig. 3d). For each mask, most 2D slices are replaced with their convex hulls to include those nodules mentioned above (Fig. 3e). The resultant masks are further dilated by 10 voxels to include some surrounding space. A full mask is obtained by unioning the masks for the two lungs (Fig. 3f).
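
The erode-then-dilate separation can be sketched in 2D on a set of pixel coordinates. This is an illustration under simplifying assumptions: the procedure above operates on 3D masks, the sketch uses a 4-neighbour structuring element, and it does not check that the two parts have similar sizes. All names are hypothetical.

```python
def neighbors(p):
    y, x = p
    return ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))


def erode(pixels):
    """Keep only pixels whose four neighbours are all foreground."""
    return {p for p in pixels if all(n in pixels for n in neighbors(p))}


def dilate(pixels):
    """Add the four neighbours of every foreground pixel."""
    grown = set(pixels)
    for p in pixels:
        grown.update(neighbors(p))
    return grown


def components(pixels):
    """Split a pixel set into 4-connected components."""
    remaining = set(pixels)
    found = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:
            for n in neighbors(stack.pop()):
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    stack.append(n)
        found.append(component)
    return found


def split_two_lungs(mask, max_iter=50):
    """Erode until two components appear, grow them back, clip to the mask.

    Returns a list of two pixel sets, or None if no split was found.
    """
    mask = set(mask)
    eroded = set(mask)
    for n_iter in range(max_iter + 1):
        parts = components(eroded)
        if len(parts) == 2:
            restored = []
            for part in parts:
                for _ in range(n_iter):
                    part = dilate(part)
                restored.append(part & mask)  # intersect with the raw mask
            return restored
        eroded = erode(eroded)
        if not eroded:
            break
    return None
```

Because erosion followed by dilation rounds off corners, the restored parts only approximate the original halves; the later convex hull and dilation steps absorb such small losses.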

However, some 2D slices of the lower part of the lung have crescent shapes (Fig. 4). Their convex hulls may contain too many unwanted tissues. So if the area of the convex hull of a 2D mask is larger than 1.5 times that of the mask itself, the original mask is kept (Fig. 4e).
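
The 1.5x rule can be sketched with a pure-Python convex hull (Andrew's monotone chain) filled on the pixel grid. Areas are measured as pixel counts, which is an assumption of this sketch, and all function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def fill_hull(pixels):
    """Return all grid points inside or on the convex hull of the pixels."""
    hull = convex_hull(pixels)
    if len(hull) < 3:
        return set(pixels)  # degenerate hull: a point or a line segment
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    n = len(hull)
    filled = set()
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0
                   for i in range(n)):
                filled.add((x, y))
    return filled


def mask_with_hull_rule(pixels, max_ratio=1.5):
    """Use the filled hull unless it is more than max_ratio times the mask."""
    hull_pixels = fill_hull(pixels)
    if len(hull_pixels) > max_ratio * len(pixels):
        return set(pixels)  # crescent-like slice: keep the original mask
    return hull_pixels
```

For an L-shaped slice the filled hull is much larger than the mask, so the original mask is returned; for a nearly convex slice with a small notch, the hull replaces the mask and fills the notch.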

Intensity normalization: To prepare the data for deep networks, we transform the image from HU to UINT8. The raw data matrix is first clipped within [-1200, 600] and linearly transformed to [0, 255]. It is then multiplied by the
