MMDetection v2.4.0: Supported Dataset Directory Structures, Converting to Standard Formats, and Customizing Your Own Dataset

Original post · first published 2022-05-13 14:39:25 · last modified 2024-07-17 14:29:25

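This article introduces the public datasets supported by MMDetection, such as COCO, Pascal VOC and Cityscapes. It explains how to set up the additional datasets some models need, such as COCO-stuff and COCO Panoptic (the latter used by PanopticFPN). It then covers converting a new dataset to an existing format, adjusting the config for a custom dataset, and processing the Cityscapes street-scene data.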

Datasets supported by MMDetection

MMDetection supports several public datasets, including COCO, Pascal VOC and Cityscapes. The expected directory layout is:
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
│ ├── cityscapes
│ │ ├── annotations
│ │ ├── leftImg8bit
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── gtFine
│ │ │ ├── train
│ │ │ ├── val
│ ├── VOCdevkit
│ │ ├── VOC2007
│ │ ├── VOC2012
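As a quick sanity check of the layout above, here is a minimal sketch (not part of MMDetection) that lists which expected directories are missing. The function name `missing_dirs` is my own hypothetical choice. The `EXPECTED_DIRS` list is also hypothetical, copied from the tree; trim it to the datasets you actually use. The Cityscapes annotations folder is left out because the converter described later creates it.

```python
from pathlib import Path

# Hypothetical list, copied from the tree above (paths relative to mmdetection/).
EXPECTED_DIRS = [
    "data/coco/annotations",
    "data/coco/train2017",
    "data/coco/val2017",
    "data/coco/test2017",
    "data/cityscapes/leftImg8bit/train",
    "data/cityscapes/leftImg8bit/val",
    "data/cityscapes/gtFine/train",
    "data/cityscapes/gtFine/val",
    "data/VOCdevkit/VOC2007",
    "data/VOCdevkit/VOC2012",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Running `print(missing_dirs("."))` from the mmdetection root prints an empty list when everything is in place.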

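Some models, such as HTC, DetectoRS and SCNet, need the additional COCO-stuff dataset. Download it, unzip it, and move it into the coco folder. The directory should look like this: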
mmdetection
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
│ │ ├── stuffthingmaps
Panoptic segmentation models such as PanopticFPN need the additional COCO Panoptic dataset. Download it, unzip it, and move it into the coco annotations folder. The directory should look like this:
mmdetection
├── data
│ ├── coco
│ │ ├── annotations
│ │ │ ├── panoptic_train2017.json
│ │ │ ├── panoptic_train2017
│ │ │ ├── panoptic_val2017.json
│ │ │ ├── panoptic_val2017
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017

The Cityscapes annotations must be converted to COCO format with tools/dataset_converters/cityscapes.py:

pip install cityscapesscripts

python tools/dataset_converters/cityscapes.py \
    ./data/cityscapes \
    --nproc 8 \
    --out-dir ./data/cityscapes/annotations
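After the conversion, it is worth sanity-checking the generated JSON before training. The sketch below assumes only the standard COCO keys described in the next section (`images`, `annotations`, `categories`). `summarize_coco` is a hypothetical helper, not an MMDetection API. The path you pass is whichever JSON file the converter wrote to `--out-dir`.

```python
import json
from collections import Counter


def summarize_coco(json_path):
    """Count images, annotations, and annotations per category name in a COCO-style file."""
    with open(json_path) as f:
        coco = json.load(f)
    id2name = {c["id"]: c["name"] for c in coco["categories"]}
    # Annotations whose category_id is not declared are counted under "<unknown>".
    per_cat = Counter(id2name.get(a["category_id"], "<unknown>") for a in coco["annotations"])
    return {
        "num_images": len(coco["images"]),
        "num_annotations": len(coco["annotations"]),
        "per_category": dict(per_cat),
    }
```

A `<unknown>` entry in `per_category`, or a category with zero annotations, usually points to a conversion problem.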

Converting datasets to a standard format

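To support a new data format, you can either convert it to an existing format (COCO or PASCAL VOC) or convert it directly to the middle format. You can also choose to convert offline (with a script, before training) or online (by implementing a new dataset class that converts the data during training). In MMDetection we recommend converting the data to COCO format offline: after the conversion, you only need to change the annotation paths and the classes in the config.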

1. Reorganize the new data into an existing format

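The simplest way is to convert your dataset to an existing dataset format (COCO or PASCAL VOC).
A COCO-format annotation JSON file has the following required keys: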

'images': [
    {
        'file_name': 'COCO_val2014_000000001268.jpg',
        'height': 427,
        'width': 640,
        'id': 1268
    },
    ...
],

'annotations': [
    {
        'segmentation': [[192.81,
            247.09,
            ...
            219.03,
            249.06]],  # if you have mask labels
        'area': 1035.749,
        'iscrowd': 0,
        'image_id': 1268,
        'bbox': [192.81, 224.8, 74.73, 33.43],
        'category_id': 16,
        'id': 42986
    },
    ...
],

'categories': [
    {'id': 0, 'name': 'car'},
]
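Here is a minimal sketch of such an offline conversion. It assumes your own labels are already loaded into a simple list of per-image records with corner-format boxes. Both that record layout and the name `to_coco` are hypothetical, for illustration only. The sketch produces the three keys above. It writes no masks, so `segmentation` is omitted, in line with the "if you have mask labels" note. Ids start at 1, following the COCO convention.

```python
def to_coco(records, class_names):
    """Build a COCO-style dict from hypothetical per-image records.

    Each record is assumed to look like:
        {'file_name': 'xxx.jpg', 'height': 427, 'width': 640,
         'boxes': [(x1, y1, x2, y2, 'class_name'), ...]}
    """
    categories = [{"id": i, "name": n} for i, n in enumerate(class_names, start=1)]
    name2id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for img_id, rec in enumerate(records, start=1):
        images.append({"file_name": rec["file_name"], "height": rec["height"],
                       "width": rec["width"], "id": img_id})
        for x1, y1, x2, y2, name in rec["boxes"]:
            w, h = x2 - x1, y2 - y1
            annotations.append({
                "image_id": img_id,
                "bbox": [x1, y1, w, h],  # COCO stores [x, y, width, height]
                "area": w * h,           # box area, since there is no mask here
                "iscrowd": 0,
                "category_id": name2id[name],
                # Avoid annotation id 0: pycocotools' COCOeval uses 0 to mean "unmatched".
                "id": len(annotations) + 1,
            })
    return {"images": images, "annotations": annotations, "categories": categories}
```

The result can be written with `json.dump(to_coco(records, classes), f)`. The file path and `classes` then go into the config as described below.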

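Once the data is preprocessed, training on a customized dataset that is in an existing format (e.g. COCO) takes two steps.
The example below walks through both steps: it trains an existing Cascade Mask R-CNN R50-FPN detector on a customized 5-class dataset in COCO format.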

1. Modify the config file to use the customized dataset.
2. Check the annotations of the customized dataset.

1. Modify the config file to use the custom dataset

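The config file needs changes in two places:

The data field: explicitly add the classes field to data.train, data.val and data.test.
The num_classes fields in the model part: explicitly overwrite every default num_classes value (e.g. 80 for COCO) with your number of classes.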


# the new config inherits the base configs to highlight the necessary modification
_base_ = './cascade_mask_rcnn_r50_fpn_1x_coco.py'

# 1. dataset settings
dataset_type = 'CocoDataset'
classes = ('a', 'b', 'c', 'd', 'e')  # change to your own class names
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/train/annotation_data',  # change to your own data path
        img_prefix='path/to/your/train/image_data'),
    val=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/val/annotation_data',
        img_prefix='path/to/your/val/image_data'),
    test=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/test/annotation_data',
        img_prefix='path/to/your/test/image_data'))

# 2. model settings

# explicitly over-write all the `num_classes` field from default 80 to 5.
model = dict(
    roi_head=dict(
        bbox_head=[
            dict(
                type='Shared2FCBBoxHead',
                # explicitly over-write all the `num_classes` field from default 80 to 5.
                num_classes=5),
            dict(
                type='Shared2FCBBoxHead',
                # explicitly over-write all the `num_classes` field from default 80 to 5.
                num_classes=5),
            dict(
                type='Shared2FCBBoxHead',
                # explicitly over-write all the `num_classes` field from default 80 to 5.
                num_classes=5)],
    # explicitly over-write all the `num_classes` field from default 80 to 5.
    mask_head=dict(num_classes=5)))
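Every head that predicts classes must agree with the `classes` tuple, and a missed override is easy to overlook. The sketch below walks a plain nested dict/list config and reports every `num_classes` that differs from `len(classes)`. The helper names are hypothetical. The check only sees the dict you pass in. To check the fully inherited config, load it first, for example with mmcv's `Config.fromfile`, and pass `cfg.model`.

```python
def find_num_classes(node, path="model"):
    """Recursively yield (path, value) for every `num_classes` key in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "num_classes":
                yield child, value
            else:
                yield from find_num_classes(value, child)
    elif isinstance(node, (list, tuple)):
        for i, item in enumerate(node):
            yield from find_num_classes(item, f"{path}[{i}]")


def check_num_classes(model_cfg, classes):
    """Return the config paths whose num_classes differs from len(classes)."""
    return [p for p, v in find_num_classes(model_cfg) if v != len(classes)]
```

With the `model` dict above and the 5-name `classes` tuple, `check_num_classes(model, classes)` returns an empty list. A leftover `num_classes=80` would show up with its path, e.g. `model.roi_head.bbox_head[0].num_classes`.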

2. Check the annotations of the customized dataset

Assuming your customized dataset is in COCO format, make sure its annotations satisfy the following:

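1. The length of the categories field in the annotations must equal the length of the classes tuple in the config, i.e. the number of classes (5 in this example).
2. The classes field in the config must contain exactly the same names, in the same order, as the categories in the annotations. MMDetection automatically maps the non-contiguous ids in categories to contiguous label indices, so the order of the names in categories determines the order of the label indices. The order of the class names in the config, in turn, determines the label text drawn when visualizing predicted bounding boxes.
3. Every category_id in the annotations field must be valid, i.e. it must be one of the ids listed in categories.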
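The three rules above can be checked mechanically before training. Here is a minimal sketch that works on the already-loaded annotation dict (for example from `json.load`). `check_coco_annotations` is a hypothetical helper, not part of MMDetection. It interprets rule 2 as comparing the config classes with the category names in the order they appear in the file.

```python
def check_coco_annotations(coco, classes):
    """Return a list of problems found against the three rules (empty list means OK)."""
    problems = []
    names = [c["name"] for c in coco["categories"]]
    # Rule 1: as many categories as config classes.
    if len(names) != len(classes):
        problems.append(f"{len(names)} categories in annotations, {len(classes)} classes in config")
    # Rule 2: same names, same order.
    elif tuple(names) != tuple(classes):
        problems.append(f"category names {names} differ from config classes {list(classes)}")
    # Rule 3: every category_id must be a declared category id.
    valid_ids = {c["id"] for c in coco["categories"]}
    bad = sorted({a["category_id"] for a in coco["annotations"]} - valid_ids)
    if bad:
        problems.append(f"unknown category_id values: {bad}")
    return problems
```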


'annotations': [
    {
        'segmentation': [[192.81,
            247.09,
            ...
            219.03,
            249.06]],  # if you have mask labels
        'area': 1035.749,
        'iscrowd': 0,
        'image_id': 1268,
        'bbox': [192.81, 224.8, 74.73, 33.43],
        'category_id': 16,
        'id': 42986
    },
    ...
],

# MMDetection automatically maps the non-contiguous `id` values to contiguous label indices.
'categories': [
    {'id': 1, 'name': 'a'},
    {'id': 3, 'name': 'b'},
    {'id': 4, 'name': 'c'},
    {'id': 16, 'name': 'd'},
    {'id': 17, 'name': 'e'},
]
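To make the id-to-label mapping concrete, here is a sketch of my understanding of what MMDetection 2.x's CocoDataset does. As far as I can tell, it asks pycocotools for the ids of the categories named in classes (`getCatIds(catNms=...)`, wrapped as `get_cat_ids` in MMDetection). That call returns them in file order, and they are numbered 0, 1, 2, and so on. The pure-Python `build_cat2label` below is a hypothetical re-implementation for illustration, not the library code.

```python
def build_cat2label(categories, classes):
    """Map non-contiguous category ids to contiguous label indices 0..N-1."""
    # Keep the categories whose name appears in `classes`, in the order they
    # appear in the annotation file, then number them consecutively.
    cat_ids = [c["id"] for c in categories if c["name"] in classes]
    return {cat_id: label for label, cat_id in enumerate(cat_ids)}
```

With the categories above, `category_id` 16 becomes label 3, and `classes[3]` is `'d'`, the text drawn on the box. If the config listed the names in a different order, the label indices would still follow the file while the drawn text would follow the config. That mismatch is why rule 2 requires the same order.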

3. Cityscapes to COCO

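Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no existing dataset adequately captured the complexity of real-world urban scenes. To address this, its authors introduced Cityscapes, a benchmark suite and large-scale dataset for training and testing approaches to pixel-level and instance-level semantic labeling. Cityscapes consists of a large, diverse set of stereo video sequences recorded in the streets of 50 different cities. 5000 of these images have high-quality pixel-level annotations, and a further 20000 images have coarse annotations to support methods that exploit large amounts of weakly labeled data. According to the authors, the dataset exceeds previous efforts in size, annotation richness, scene variability and complexity. Their accompanying empirical study provides an in-depth analysis of the dataset characteristics and a performance evaluation of several state-of-the-art approaches on the benchmark.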
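The converter below identifies objects from the pixel values of `gtFine_instanceIds.png`. This sketch isolates the decoding rule that `load_img_info` applies; the function name `decode_instance_id` is hypothetical. Values below 24 are stuff classes and are filtered out. Values of 1000 or more encode `label_id * 1000 + instance index`. Values in between carry only the label id and are exported as crowd regions. The real converter also skips labels that have no instances or are ignored in evaluation.

```python
def decode_instance_id(inst_id):
    """Return (label_id, iscrowd) for a gtFine_instanceIds.png pixel value.

    Follows the same rules as load_img_info in the converter below.
    """
    if inst_id < 24:
        # Stuff classes: the converter drops these before decoding.
        raise ValueError(f"{inst_id} is a stuff label id, not an instance")
    if inst_id >= 1000:
        # Individual object: value = label_id * 1000 + per-image instance index.
        return inst_id // 1000, 0
    # Values in [24, 1000): only the label id is stored, so mark it as crowd.
    return inst_id, 1
```

For instance, car has id 26 in the cityscapesscripts label table. A pixel value of 26001 therefore belongs to one individual car, while a value of 26 marks a crowd of cars.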
Cityscapes-to-COCO conversion code (tools/dataset_converters/cityscapes.py):

# Copyright (c) OpenMMLab. All rights reserved.
import argparse
import glob
import os.path as osp

import cityscapesscripts.helpers.labels as CSLabels
import mmcv
import numpy as np
import pycocotools.mask as maskUtils


def collect_files(img_dir, gt_dir):
    suffix = 'leftImg8bit.png'
    files = []
    for img_file in glob.glob(osp.join(img_dir, '**/*.png')):
        assert img_file.endswith(suffix), img_file
        inst_file = gt_dir + img_file[
            len(img_dir):-len(suffix)] + 'gtFine_instanceIds.png'
        # Note that labelIds are not converted to trainId for seg map
        segm_file = gt_dir + img_file[
            len(img_dir):-len(suffix)] + 'gtFine_labelIds.png'
        files.append((img_file, inst_file, segm_file))
    assert len(files), f'No images found in {img_dir}'
    print(f'Loaded {len(files)} images from {img_dir}')

    return files


def collect_annotations(files, nproc=1):
    print('Loading annotation images')
    if nproc > 1:
        images = mmcv.track_parallel_progress(
            load_img_info, files, nproc=nproc)
    else:
        images = mmcv.track_progress(load_img_info, files)

    return images


def load_img_info(files):
    img_file, inst_file, segm_file = files
    inst_img = mmcv.imread(inst_file, 'unchanged')
    # ids < 24 are stuff labels (filtering them first is about 5% faster)
    unique_inst_ids = np.unique(inst_img[inst_img >= 24])
    anno_info = []
    for inst_id in unique_inst_ids:
        # For non-crowd annotations, inst_id // 1000 is the label_id
        # Crowd annotations have <1000 instance ids
        label_id = inst_id // 1000 if inst_id >= 1000 else inst_id
        label = CSLabels.id2label[label_id]
        if not label.hasInstances or label.ignoreInEval:
            continue

        category_id = label.id
        iscrowd = int(inst_id < 1000)
        mask = np.asarray(inst_img == inst_id, dtype=np.uint8, order='F')
        mask_rle = maskUtils.encode(mask[:, :, None])[0]

        area = maskUtils.area(mask_rle)
        # convert to COCO style XYWH format
        bbox = maskUtils.toBbox(mask_rle)

        # for json encoding
        mask_rle['counts'] = mask_rle['counts'].decode()

        anno = dict(
            iscrowd=iscrowd,
            category_id=category_id,
            bbox=bbox.tolist(),
            area=area.tolist(),
            segmentation=mask_rle)
        anno_info.append(anno)
    video_name = osp.basename(osp.dirname(img_file))
    img_info = dict(
        # remove img_prefix for filename
        file_name=osp.join(video_name, osp.basename(img_file)),
        height=inst_img.shape[0],
        width=inst_img.shape[1],
        anno_info=anno_info,
        segm_file=osp.join(video_name, osp.basename(segm_file)))

    return img_info


def cvt_annotations(image_infos, out_json_name):
    out_json = dict()
    img_id = 0
    ann_id = 0
    out_json['images'] = []
    out_json['categories'] = []
    out_json['annotations'] = []
    for image_info