A preliminary note:
COCO datasets are primarily JSON files containing paths to images and annotations for those images. So, if you want to split your dataset, you don't need to move the images into separate folders; you only need to split the records contained in the JSON file. Doing this from scratch is not straightforward, because the records in the JSON file have internal dependencies (each annotation references an image and a category by their ids). The good news is that there is a package named COCOHelper that can do this for you with very little effort.
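For context, here is a minimal sketch of the COCO JSON layout (the field names follow the standard COCO format; the concrete values are made up for illustration). It shows why you cannot simply cut the file in half: each annotation points to an image and a category by id, so every subset must keep these references consistent.
# Minimal, made-up example of the COCO annotation layout.
# A valid split must keep, for each selected image, exactly the annotations
# whose image_id refers to it, plus the categories those annotations use.
coco = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "img_0002.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3, "bbox": [10, 20, 50, 80]},
        {"id": 11, "image_id": 2, "category_id": 3, "bbox": [30, 40, 60, 90]},
    ],
    "categories": [
        {"id": 3, "name": "example_category"},
    ],
}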
Quick Solution:
You can split a COCO dataset into subsets, each with its own annotations, using COCOHelper. It is as simple as:
from cocohelper import COCOHelper
from cocohelper.splitters.proportional import ProportionalDataSplitter

ch = COCOHelper.load_json(annotations_file, img_dir=image_dir)
splitter = ProportionalDataSplitter(70, 10, 20)  # split dataset as 70-10-20% of images
ch_train, ch_val, ch_test = splitter.apply(ch)
ch_train.write_annotations_file("train.json")    # same for ch_val and ch_test
A fully working example:
Imports and path setup:
from pathlib import Path
from cocohelper import COCOHelper
from cocohelper.splitters.proportional import ProportionalDataSplitter
root_dir = Path('/data/robotics/oil_line_detection')  # example dataset root
annotations_dir = root_dir / 'annotations'
annotations_file = annotations_dir / 'coco.json'       # COCO JSON with all the annotations
image_dir = ""  # directory containing the images (set this to your image folder)
Create a COCOHelper object, which represents your COCO dataset:
print(f"Loading dataset: {annotations_file}")
ch = COCOHelper.load_json(annotations_file, img_dir=image_dir)
Split the dataset (e.g. using a proportional data splitter, which randomly assigns images to the splits in the given proportions):
splitter = ProportionalDataSplitter(70, 10, 20)
ch_train, ch_val, ch_test = splitter.apply(ch)
dest_dir = Path("./result")  # where to save the JSON files with the annotations of each subset
dest_dir.mkdir(parents=True, exist_ok=True)

for ch_split, ch_name in zip([ch_train, ch_val, ch_test], ["train", "val", "test"]):
    print(f"Saving dataset: '{ch_name}'")
    fname = dest_dir / f"{ch_name}.json"
    ch_split.write_annotations_file(fname)
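As an optional sanity check (a sketch assuming the written files follow the standard COCO layout, i.e. top-level "images" and "annotations" lists), you can reload the saved splits with the standard json module and verify their sizes:
import json

for ch_name in ["train", "val", "test"]:
    with open(dest_dir / f"{ch_name}.json") as f:
        subset = json.load(f)
    # Each split should itself be a valid COCO file: its annotations reference
    # only images listed in the same file.
    print(f"{ch_name}: {len(subset['images'])} images, {len(subset['annotations'])} annotations")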
More examples and details can be found in the COCOHelper documentation.