I am trying to follow the Hugging Face DETR tutorial to fine-tune the model on my own dataset. The tutorial explains that the original DETR applies several data augmentation techniques during training:
Note regarding data augmentation
DETR actually uses several image augmentations during training. One of them is scale augmentation: they set the min_size randomly to be one of [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800] as can be seen here.
However, we are not going to add any of the augmentations that are used in the original implementation during training. It works fine without them.
I would like to add augmentations of my own, such as zoom and HSV variations. The dataset class is defined as follows:
import torchvision
import os

class CocoDetection(torchvision.datasets.CocoDetection):
    def __init__(self, img_folder, processor, train=True):
        ann_file = os.path.join(img_folder, "custom_train.json" if train else "custom_val.json")
        super(CocoDetection, self).__init__(img_folder, ann_file)
        self.processor = processor

    def __getitem__(self, idx):
        # read in PIL image and target in COCO format
        img, target = super(CocoDetection, self).__getitem__(idx)

        # preprocess image and target (converting target to DETR format,
        # resizing + normalization of both image and target)
        image_id = self.ids[idx]
        target = {'image_id': image_id, 'annotations': target}
        encoding = self.processor(images=img, annotations=target, return_tensors="pt")
        pixel_values = encoding["pixel_values"].squeeze()  # remove batch dimension
        target = encoding["labels"][0]  # remove batch dimension

        return pixel_values, target
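One place where such augmentations could plausibly go is inside `__getitem__`, between the `super().__getitem__(idx)` call and the `self.processor(...)` call, because at that point the image is still a PIL image and the target is still a list of COCO annotations. Geometric augmentations (flip, zoom, crop) have to update the boxes together with the image. The sketch below shows only the annotation side of a horizontal flip, in plain Python. The helper name `hflip_coco_annotations` is hypothetical, and it assumes every annotation stores its box as COCO `[x_min, y_min, width, height]` in pixels; segmentation masks are not handled.

```python
import copy


def hflip_coco_annotations(annotations, image_width):
    """Return a copy of COCO annotations mirrored for a horizontally flipped image.

    Hypothetical helper: assumes each annotation has a "bbox" in COCO
    [x_min, y_min, width, height] pixel coordinates.
    """
    # Work on a copy so the caller's annotation dicts are left unchanged.
    flipped = copy.deepcopy(annotations)
    for ann in flipped:
        x_min, y_min, width, height = ann["bbox"]
        # After a flip, the old right edge becomes the new left edge.
        ann["bbox"] = [image_width - (x_min + width), y_min, width, height]
    return flipped


annotations = [{"bbox": [10, 20, 30, 40], "category_id": 1}]
flipped = hflip_coco_annotations(annotations, image_width=100)
print(flipped[0]["bbox"])  # [60, 20, 30, 40]
```

In `__getitem__`, the image would be flipped under the same random decision (for example with `PIL.ImageOps.mirror`) before both are handed to the processor. Colour-only changes such as HSV jitter leave the boxes untouched, so they would only need to be applied to the image.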
The preprocessor_config.json that Hugging Face reads when loading the model contains:
{
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "DetrFeatureExtractor",
  "format": "coco_detection",
  "image_mean": [
    0.485,
    0.456,
    0.406
  ],
  "image_std": [
    0.229,
    0.224,
    0.225
  ],
  "max_size": 1333,
  "size": 800
}
This config already includes the mean and standard deviation used for normalization. As a new Hugging Face user, I do not know whether repeating that normalization in the usual torchvision.transforms.Compose([...]) would affect anything, or whether I should leave it out because the JSON file already takes care of it.
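To make the concern concrete: with `do_normalize` set to true, the processor is expected to apply `(x - mean) / std` per channel using the values from the config. The sketch below, in plain Python with a hypothetical `normalize` helper and a pixel assumed to be already scaled to [0, 1], illustrates that applying the same normalization a second time produces different values, which suggests that an extra normalization step in a transforms pipeline would not be a harmless no-op.

```python
IMAGE_MEAN = [0.485, 0.456, 0.406]  # values from the preprocessor_config.json
IMAGE_STD = [0.229, 0.224, 0.225]


def normalize(pixel, mean=IMAGE_MEAN, std=IMAGE_STD):
    """Per-channel (x - mean) / std; hypothetical helper for illustration only."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]


pixel = [0.5, 0.5, 0.5]   # one RGB pixel, assumed already scaled to [0, 1]
once = normalize(pixel)   # result of a single normalization step
twice = normalize(once)   # result of normalizing the normalized pixel again

print(once)
print(twice)
```

Under these assumptions, the augmentation pipeline would contain only the augmentations themselves, with resizing and normalization left to the processor.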
I didn't find anything similar online. It is probably a basic question, but I don't know how or where to add the augmentations. Could anyone give an example for my situation?
Any help is appreciated.
Thank you so much