I managed to retrain the object detection module on my own dataset by converting its annotations to the PASCAL VOC format shown below.

This format is bounding-box oriented, and from peeking into the API's TFRecord creation scripts, they expect a good number of these ground-truth values in order to generate the corresponding TFRecords.

The problem with bounding boxes is that they only approximate the object's shape, and annotating rotated objects with them can be rather challenging.

After looking around, I came across labelme, which lets you draw shape (point-to-point) annotations instead of just bounding boxes. Below is a short version of the annotation it produces, along with the image showing the resulting shape.

My questions are:

  1. Concentrating on the contents of <polygon></polygon>, does the Object Detection API support point to point annotations?

  2. If yes to 1, how do I go about creating the TFRecords for it? What other changes need to be made to accommodate this?

Pascal VOC Format

<annotation verified="no">
  <folder>VOC2012</folder>
  <filename>pic.jpg</filename>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>214</width>
    <height>300</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>sample</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>32</xmin>
      <ymin>37</ymin>
      <xmax>180</xmax>
      <ymax>268</ymax>
    </bndbox>
  </object>
</annotation>

Snapshot of point-to-point annotation

Here's the full annotation file and the corresponding image

<annotation>
    <filename>ipad.jpg</filename>
    <folder>sample</folder>
    <source>
    <submittedBy>username</submittedBy>
    </source>
    <imagesize>
        <nrows>450</nrows>
        <ncols>800</ncols>
    </imagesize>
    <object>
        <name>ipad</name>
        <deleted>0</deleted><verified>0</verified><occluded>no</occluded>
        <attributes></attributes>
        <parts>
            <hasparts></hasparts>
            <ispartof></ispartof>
        </parts>
        <date>12-Jul-2017 19:20:22</date><id>0</id>
        <polygon>
            <username>anonymous</username>
            <pt><x>40</x><y>76</y></pt>
            <pt><x>435</x><y>11</y></pt>
            <pt><x>472</x><y>311</y></pt>
            <pt><x>94</x><y>418</y></pt>
        </polygon>
    </object>
    <object>
        <name>screen</name>
        <deleted>0</deleted>
        <verified>0</verified>
        <occluded>no</occluded>
        <attributes></attributes>
        <parts>
            <hasparts></hasparts>
            <ispartof></ispartof>
        </parts>
        <date>12-Jul-2017 19:20:48</date><id>1</id>
        <polygon>
            <username>anonymous</username>
            <pt><x>75</x><y>89</y></pt>
            <pt><x>118</x><y>397</y></pt>
            <pt><x>447</x><y>308</y></pt>
            <pt><x>421</x><y>30</y></pt>
        </polygon>
    </object>
</annotation>
eshirima

1 Answer

The TensorFlow Object Detection API only supports bounding box annotations.

Derek Chow
  • Thanks for the response. How would you recommend handling rotations then? And uh is point-to-point annotation training even feasible for object localization in ML? – eshirima Jul 12 '17 at 21:06
  • The current state only allows bounding boxes. When you have polygons or other shapes, you must convert them to bounding boxes before training. Facebook's [Multipathnet](https://github.com/facebookresearch/multipathnet) can be trained on polygons. – burny Jul 18 '17 at 13:44
  • How do I convert polygons to bounding box format? Is there a known way to do it, or should I write a script that takes the extreme corners of the polygon? – Deepank Verma Sep 28 '17 at 07:21
  • As Derek mentioned, the API used to support only bounding box annotations. It now also supports masks via Mask R-CNN: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md – dfresh22 Mar 13 '18 at 04:27
  • This answer is wrong. You can pass images without bboxes to tf od by passing an empty list to the tfrecord producer – denisb411 Jan 19 '21 at 21:25
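
On the polygon-to-bounding-box question raised in the comments: one possible approach is to take the minimum and maximum x and y coordinates of each polygon. Below is a minimal sketch, assuming the labelme XML layout shown in the question (`<object>` elements containing `<pt><x>…</x><y>…</y></pt>` points). The function names `polygon_to_bndbox` and `labelme_boxes` are made up for illustration and are not part of labelme or the Object Detection API.

```python
import xml.etree.ElementTree as ET


def polygon_to_bndbox(points):
    """Return (xmin, ymin, xmax, ymax) of the axis-aligned box enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def labelme_boxes(xml_text):
    """Yield (name, bndbox) for every <object> that has polygon points.

    Assumes the labelme-style layout from the question; other exports may differ.
    """
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.findtext("name")
        points = [
            (int(pt.findtext("x")), int(pt.findtext("y")))
            for pt in obj.iter("pt")
        ]
        if points:
            yield name, polygon_to_bndbox(points)


# Trimmed copy of the annotation from the question (polygon points only).
SAMPLE = """
<annotation>
  <object>
    <name>ipad</name>
    <polygon>
      <pt><x>40</x><y>76</y></pt>
      <pt><x>435</x><y>11</y></pt>
      <pt><x>472</x><y>311</y></pt>
      <pt><x>94</x><y>418</y></pt>
    </polygon>
  </object>
  <object>
    <name>screen</name>
    <polygon>
      <pt><x>75</x><y>89</y></pt>
      <pt><x>118</x><y>397</y></pt>
      <pt><x>447</x><y>308</y></pt>
      <pt><x>421</x><y>30</y></pt>
    </polygon>
  </object>
</annotation>
"""

if __name__ == "__main__":
    for name, box in labelme_boxes(SAMPLE):
        print(name, box)
```

The resulting values map directly onto the `<xmin>`, `<ymin>`, `<xmax>`, `<ymax>` fields of the PASCAL VOC `<bndbox>` element. Note that the box is axis-aligned, so the rotation information carried by the polygon is lost in the conversion.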