Elegant way to quickly load only a small subset of data in detectron2

Question

I'm looking for an elegant way to load only a small subset of data in detectron2 in order to speed up the training startup for debugging purposes.

I'm building my own instance segmentation model with detectron2 and running it the usual way:

train_net.py --config-file our_training_config.yaml

But it takes several minutes to load everything...

...
[01/25 13:11:48 d2.data.datasets.coco]: Loading datasets/coco/annotations/instances_train2017.json takes 20.74 seconds.
[01/25 13:11:50 d2.data.datasets.coco]: Loaded 118287 images in COCO format from datasets/coco/annotations/instances_train2017.json
...

I was wondering if there is a parameter, trick, or flag that loads only a small subset of examples (say, 100), just to quickly check that all the forward and backward calls work. Right now debugging is a bit annoying, since every bug fix requires another slow run to test whether everything works.

Technically one can just cut instances_train2017.json down in size, but I believe there are less nasty solutions to this problem.
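For reference, the trimming workaround mentioned above can be sketched in a few lines of Python. This is only a minimal sketch, assuming the usual COCO layout with top-level `images` and `annotations` lists; the helper names `subset_coco` and `write_subset` and the output file name are hypothetical.

```python
import json


def subset_coco(coco, num_images):
    """Return a COCO-style dict restricted to the first num_images images."""
    images = coco["images"][:num_images]
    keep_ids = {img["id"] for img in images}
    # Keep only annotations that belong to one of the retained images.
    annotations = [a for a in coco["annotations"] if a["image_id"] in keep_ids]
    # Shallow copy so other top-level keys (categories, info, ...) carry over.
    trimmed = dict(coco)
    trimmed["images"] = images
    trimmed["annotations"] = annotations
    return trimmed


def write_subset(src_path, dst_path, num_images):
    """Read a COCO annotation file and write a trimmed copy next to it."""
    with open(src_path) as f:
        full = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(subset_coco(full, num_images), f)


# Example call (the output path is a made-up name):
# write_subset("datasets/coco/annotations/instances_train2017.json",
#              "instances_train2017_debug.json", 100)
```

The trimmed file still has to be registered or pointed to in the dataset setup, which is the part that makes this approach feel clumsy.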

Did you try `data.Subset` https://pytorch.org/docs/stable/data.html#torch.utils.data.Subset? — Umang Gupta, Jan 25 '22 at 16:44
@UmangGupta that would do the trick, but how to enforce `data.Subset` usage in detectron yaml configs? — Dominik Filipiak, Jan 26 '22 at 09:40

score 1 · Accepted Answer · answered Sep 10 '22 at 11:53

I had the same problem and then found RandomSubsetTrainingSampler, which is built into detectron2. It restricts training to a small fraction of the dataset. You can enable it in the config file like this:

DATALOADER:
    SAMPLER_TRAIN: "RandomSubsetTrainingSampler"
    RANDOM_SUBSET_RATIO: 0.1

Or you can pass a sampler to the train loader directly:

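from detectron2.data import build_detection_train_loader
from detectron2.data.samplers import RandomSubsetTrainingSampler
subset_sampler = RandomSubsetTrainingSampler(len(dataset), 0.1)
train_loader = build_detection_train_loader(cfg, sampler=subset_sampler)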

RANDOM_SUBSET_RATIO is between 0 and 1, so 0.1 means 10% of the training dataset. You can see how this setting is picked up in _train_loader_from_config, which is used when building a train loader from the config.
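To make the meaning of the ratio concrete, here is a plain-Python illustration of drawing a fixed random subset of indices. It is not detectron2's implementation: the function `random_subset_indices` and its `seed` parameter are made up for this sketch, and it assumes the subset size is the dataset size times the ratio, rounded down.

```python
import random


def random_subset_indices(size, subset_ratio, seed=0):
    """Pick int(size * subset_ratio) distinct indices at random from range(size)."""
    if not 0.0 < subset_ratio <= 1.0:
        raise ValueError("subset_ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    indices = list(range(size))
    rng.shuffle(indices)
    return indices[: int(size * subset_ratio)]


subset = random_subset_indices(size=40, subset_ratio=0.25)
print(len(subset))  # 10
```

Training then only ever sees the examples at those indices, which is why a small ratio is handy for a quick smoke test.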

However, there currently seems to be no equally convenient way to load a small part of the validation data through the config file. You can instead pass a subset sampler to build_detection_test_loader in a similar way.
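One possible way to get such a subset for validation is a fixed list of indices, since PyTorch's DataLoader accepts any iterable of indices as its sampler. The helper below is a hypothetical sketch, not part of detectron2, and it assumes that your detectron2 version's build_detection_test_loader forwards its `sampler` argument to a DataLoader.

```python
def evenly_spaced_indices(dataset_size, n):
    """Return up to n indices spread evenly over range(dataset_size)."""
    if n >= dataset_size:
        return list(range(dataset_size))
    step = dataset_size / n
    # step > 1 here, so the truncated positions are all distinct.
    return [int(i * step) for i in range(n)]


val_indices = evenly_spaced_indices(dataset_size=5000, n=100)
print(len(val_indices))  # 100
```

The resulting list would be passed as `sampler=val_indices` to the test loader. Because a plain list is not sharded across processes, this is only meant for a single-process debugging run.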
