
I am trying to use COCO 2014 data for semantic segmentation training in PyTorch. I have a PSPNet model with a cross-entropy loss function that worked perfectly on the PASCAL VOC 2012 dataset. Now I am trying to run the same process on a portion of the COCO images. But COCO stores its annotations as JSON instead of .png images, so I somehow have to convert one to the other. I have noticed that there is annToMask in pycocotools, but I cannot quite figure out how to use that function in my case. This is roughly what my dataloader's pull_item looks like:

def pull_item(self, index):

    # I DON'T KNOW WHAT TO DO HERE

    raw_img = self.transform(raw_img)
    anns_img = self.transform(anns_img)

    return raw_img, anns_img
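For context on what has to happen in the missing step: with pycocotools you look up the annotation ids for the image (getAnnIds), load them (loadAnns), render each annotation to a binary mask with annToMask, multiply by its category_id, and take the pixelwise max to get a single label map. A numpy-only sketch of that merging step, with hand-written arrays standing in for annToMask output:

```python
import numpy as np

# Stand-ins for pycocotools' annToMask output: one binary HxW mask per annotation.
mask_a = np.array([[1, 1, 0],
                   [0, 0, 0]], dtype=np.uint8)  # instance with category_id 3
mask_b = np.array([[0, 1, 1],
                   [0, 1, 0]], dtype=np.uint8)  # instance with category_id 7

anns = [(mask_a, 3), (mask_b, 7)]

# Weight each binary mask by its category id, then take the pixelwise max
# so overlapping instances resolve to a single label per pixel.
label_mask = np.max(np.stack([m * cat for m, cat in anns]), axis=0)
print(label_mask)
```

On overlapping pixels the higher category id wins, which is a simplification; if that matters for your classes, you would need an explicit ordering instead of max.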

Below is what my training function that uses data from the dataloaders looks like:

for images, labels in dataloaders_dict[phase]:

    images = images.to(device)

    labels = torch.squeeze(labels)
    labels = labels.to(device)

    with torch.set_grad_enabled(phase == 'train'):
        outputs = net(images)

        loss = criterion(outputs, labels.long())
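As a sanity check on the loss call: nn.CrossEntropyLoss expects logits of shape (N, C, H, W) and integer labels of shape (N, H, W). A sketch with dummy tensors (the class count and ignore_index here are placeholder values, not from the question); note that torch.squeeze with no dim argument would also drop a batch dimension of size 1, so squeezing only the channel dim is safer:

```python
import torch
import torch.nn as nn

# ignore_index=255 is a common convention for "unlabelled" pixels; adjust to your data.
criterion = nn.CrossEntropyLoss(ignore_index=255)

outputs = torch.randn(4, 21, 64, 64)           # (N, num_classes, H, W) logits
labels = torch.randint(0, 21, (4, 1, 64, 64))  # mask with a channel dim, as the dataloader yields

labels = torch.squeeze(labels, dim=1)          # -> (N, H, W); only the channel dim is removed
loss = criterion(outputs, labels.long())
print(loss.item())
```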

2 Answers

I have worked on creating a data generator for the COCO dataset with pycocotools, and I think my experience can help you out. My post on Medium documents the entire process from start to finish, including the creation of masks.

However, a point to note: I was working with TensorFlow Keras, not PyTorch. But the logic flow should be largely the same, so I am sure you can take something useful from it.

Viraf

Thanks to the above answer, I was able to create this:

from pathlib import Path
from typing import Callable, List, Optional, Tuple

import numpy as np
import torch
from pycocotools.coco import COCO
from torch.utils.data import Dataset
from torchvision import io


class ImageData(Dataset):
    def __init__(
        self, 
        annotations: COCO, 
        img_ids: List[int], 
        cat_ids: List[int], 
        root_path: Path, 
        transform: Optional[Callable]=None
    ) -> None:
        super().__init__()
        self.annotations = annotations
        self.img_data = annotations.loadImgs(img_ids)
        self.cat_ids = cat_ids
        self.files = [str(root_path / img["file_name"]) for img in self.img_data]
        self.transform = transform
        
    def __len__(self) -> int:
        return len(self.files)
    
    def __getitem__(self, i: int) -> Tuple[torch.Tensor, torch.LongTensor]:
        ann_ids = self.annotations.getAnnIds(
            imgIds=self.img_data[i]['id'], 
            catIds=self.cat_ids, 
            iscrowd=None
        )
        anns = self.annotations.loadAnns(ann_ids)
        # Merge per-instance binary masks into one label map; on overlapping
        # pixels the higher category id wins because of the elementwise max.
        mask = torch.LongTensor(np.max(np.stack([self.annotations.annToMask(ann) * ann["category_id"] 
                                                 for ann in anns]), axis=0)).unsqueeze(0)
        
        img = io.read_image(self.files[i])
        if img.shape[0] == 1:
            # Replicate grayscale images to 3 channels so shapes match RGB inputs.
            img = torch.cat([img] * 3)
        
        if self.transform is not None:
            return self.transform(img, mask)
        
        return img, mask

Full post can be found in this kaggle kernel.
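One thing to watch in the class above: self.transform is called with both img and mask, so a standard single-input torchvision transform won't drop in directly; you need a paired transform that applies the same operation to both. A minimal sketch (paired_resize is an illustrative name, not a library function), using nearest-neighbour interpolation for the mask so label ids are preserved:

```python
import torch
import torch.nn.functional as F

def paired_resize(img: torch.Tensor, mask: torch.Tensor, size=(256, 256)):
    # Bilinear for the image; nearest for the mask so label ids stay intact.
    img = F.interpolate(img.unsqueeze(0).float(), size=size, mode="bilinear",
                        align_corners=False).squeeze(0)
    mask = F.interpolate(mask.unsqueeze(0).float(), size=size,
                         mode="nearest").squeeze(0).long()
    return img, mask

img = torch.randint(0, 256, (3, 480, 640), dtype=torch.uint8)  # dummy RGB image
mask = torch.randint(0, 21, (1, 480, 640))                     # dummy label mask
img_t, mask_t = paired_resize(img, mask)
print(img_t.shape, mask_t.shape)
```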

sachinruk