Ground-truth data collection and evaluation for computer vision

Question

I am starting to develop a computer vision application that involves tracking humans. I want to build ground-truth metadata for the videos that will be recorded in this project. The metadata will probably need to be hand-labeled and will mainly consist of the locations of the humans in the image. I would like to use this metadata to evaluate the performance of my algorithms.

I could of course build a labeling tool using, e.g., Qt and/or OpenCV, but I was wondering whether there is some kind of de facto standard for this. I came across ViPER, but it seems dead and it is not as easy to use as I had hoped. Other than that, I haven't found much.

Does anybody here have recommendations on which software / standard / method to use, both for the labeling and for the evaluation? My main preference is for something C++-oriented, but this is not a hard constraint.

Kind regards and thanks in advance! Tom

I am also interested in creating some ground-truth data and had more or less resigned myself to writing a basic program myself. Have you had any more luck finding an existing labeling application? I have the feeling that there really should be some around... — Chris, May 31 '12 at 08:09
No, unfortunately I have not. I'm still interested though. I don't mind making some ad-hoc piece of software but I think it would be more useful if there was something more standard. Have you found anything yet? — Goosebumps, Jun 12 '12 at 11:42

Goosebumps · Answer 1 · answered Jun 14 '12 at 14:50

I've had another look at VATIC and got it to work. It is an online video annotation tool designed for crowdsourcing via a commercial service (Amazon Mechanical Turk), and it runs on Linux. However, it also has an offline mode, in which the crowdsourcing service is not required and the software runs standalone.

The installation is described in detail in the enclosed README file. It involves, among other things, setting up an Apache and a MySQL server, some Python packages, and ffmpeg. It is not that difficult if you follow the README. (The proxy issues I mentioned earlier were unrelated to this software package.)

You can try the online demo. The default output looks like this:

0 302 113 319 183 0 1 0 0 "person"
0 300 112 318 182 1 1 0 1 "person"
0 298 111 318 182 2 1 0 1 "person"
0 296 110 318 181 3 1 0 1 "person"
0 294 110 318 181 4 1 0 1 "person"
0 292 109 318 180 5 1 0 1 "person"
0 290 108 318 180 6 1 0 1 "person"
0 288 108 318 179 7 1 0 1 "person"
0 286 107 317 179 8 1 0 1 "person"
0 284 106 317 178 9 1 0 1 "person"

Each line contains 10 or more space-separated columns, defined as follows:

1   Track ID. All rows with the same ID belong to the same path.
2   xmin. The top left x-coordinate of the bounding box.
3   ymin. The top left y-coordinate of the bounding box.
4   xmax. The bottom right x-coordinate of the bounding box.
5   ymax. The bottom right y-coordinate of the bounding box.
6   frame. The frame that this annotation represents.
7   lost. If 1, the annotation is outside of the view screen.
8   occluded. If 1, the annotation is occluded.
9   generated. If 1, the annotation was automatically interpolated.
10  label. The label for this annotation, enclosed in quotation marks.
11+ attributes. Each column after this is an attribute.
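To feed these annotations into an evaluation script, the text format above can be parsed in a few lines. This is a minimal Python sketch, assuming the default space-separated output with the label (and any attributes) in double quotes; `Annotation`, `parse_vatic_line` and `group_by_track` are hypothetical names, not part of VATIC.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Annotation:
    """One row of VATIC's default text output (hypothetical helper type)."""
    track_id: int
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    frame: int
    lost: bool       # 1 = outside of the view screen
    occluded: bool   # 1 = occluded
    generated: bool  # 1 = automatically interpolated
    label: str
    attributes: List[str] = field(default_factory=list)


def parse_vatic_line(line: str) -> Annotation:
    # shlex.split respects the double quotes around the label and attributes,
    # so a quoted multi-word value stays in one column and the quotes are stripped.
    cols = shlex.split(line)
    if len(cols) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(cols)}: {line!r}")
    track_id, xmin, ymin, xmax, ymax, frame = (int(c) for c in cols[:6])
    lost, occluded, generated = (c == "1" for c in cols[6:9])
    return Annotation(track_id, xmin, ymin, xmax, ymax, frame,
                      lost, occluded, generated, cols[9], cols[10:])


def group_by_track(lines: Iterable[str]) -> Dict[int, List[Annotation]]:
    """Collect annotations per track ID; accepts a list of lines or an open file."""
    tracks: Dict[int, List[Annotation]] = {}
    for line in lines:
        if line.strip():  # skip blank lines
            ann = parse_vatic_line(line)
            tracks.setdefault(ann.track_id, []).append(ann)
    return tracks
```

Usage would be something like `with open("output.txt") as f: tracks = group_by_track(f)`, where `output.txt` is a placeholder for whatever file the export step writes. Each track is then a list of per-frame boxes that can be compared against a tracker's output.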

VATIC can also export to XML, JSON, pickle, LabelMe, and PASCAL VOC formats.

So, all in all, this does pretty much what I wanted, and it is also rather easy to use. I am still interested in other options, though!

Hey, I'm the author of VATIC. It's great to hear that you found it useful. If you run into trouble, feel free to shoot me a message or ask around on here. I'm always happy to help! — carl, Jun 19 '12 at 04:49

score 3 · Answer 2 · answered Jun 14 '12 at 15:07

LabelMe is another open annotation tool. I think it is less suitable for my particular case, but it is still worth mentioning. It seems to be oriented toward blob labeling.

Goosebumps

How does LabelMe compare with VATIC? It appears to let the user specify the bounding shape rather than use rectangles as in VATIC. Is this the main difference, or are there other points? What makes it less suitable? I'm still in the process of installing VATIC, so I haven't tried either yet, but I will add LabelMe to my list. – Chris Jun 15 '12 at 11:37
Hey, I'm the author of VATIC and also work closely with the LabelMe folks. LabelMe itself is designed for images, but there's also a video version of LabelMe. The primary difference between VATIC and LabelMe is that LabelMe supports polygon annotations and doesn't have a Mechanical Turk infrastructure. However, I've found in user studies that labeling polygons is more time-consuming than labeling bounding boxes. In any case, if you run into trouble with either, shoot me an email, and I'm happy to answer questions / put you in touch with the right people. – carl Jun 19 '12 at 04:55
Actually, I do have a question. Since the title of the question is GT data collection and evaluation, how would you suggest evaluating the bounding boxes? I came across the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index), which looks suitable. (Maybe I should open a separate thread for this...) – Goosebumps Jun 21 '12 at 11:23
@Goosebumps: If you are evaluating a tracking algorithm, common metrics are time-to-failure (how many frames pass before the tracker loses the object), the percentage of boxes it gets correct, or a precision-recall curve. To decide whether a predicted box matches the ground truth, computer vision researchers typically require at least 50% overlap, which is essentially the Jaccard index: if the Jaccard index between the prediction and the ground truth is 0.5 or greater, the prediction is correct; otherwise it is wrong. – carl Jul 11 '12 at 04:13
Thank you Carl, that is useful info. I did not know the convention for percentage overlap yet. I will also consider the other metrics you mention. – Goosebumps Jul 19 '12 at 06:28
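The matching rule carl describes, plus two of the metrics he lists (percentage of correct boxes and time-to-failure), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: boxes are `(xmin, ymin, xmax, ymax)` tuples as in VATIC's output, ground truth and tracker output are dicts mapping frame number to box for a single object, and a frame with no predicted box counts as a failure. Coordinates are treated as continuous; some benchmarks treat integer pixel coordinates as inclusive and add 1 to widths and heights, which this sketch does not do. All function names are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Jaccard index (intersection over union) of two axis-aligned boxes."""
    # Overlap along each axis, clamped to 0 when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(gt: Box, pred: Box, threshold: float = 0.5) -> bool:
    # Convention from the comment above: correct if IoU >= 0.5.
    return iou(gt, pred) >= threshold


def percent_correct(gt: Dict[int, Box], pred: Dict[int, Box],
                    threshold: float = 0.5) -> float:
    """Percentage of ground-truth frames where the predicted box is correct.

    A frame missing from pred counts as wrong.
    """
    if not gt:
        return 0.0
    hits = sum(1 for f, box in gt.items()
               if f in pred and is_correct(box, pred[f], threshold))
    return 100.0 * hits / len(gt)


def time_to_failure(gt: Dict[int, Box], pred: Dict[int, Box],
                    threshold: float = 0.5) -> int:
    """Number of consecutive correct frames before the tracker first fails."""
    count = 0
    for f in sorted(gt):
        if f not in pred or not is_correct(gt[f], pred[f], threshold):
            break
        count += 1
    return count
```

For example, with a ground-truth box `(0, 0, 10, 10)` in frames 0 to 3 and predictions `(0, 0, 10, 10)`, `(2, 0, 12, 10)`, `(20, 20, 30, 30)`, `(0, 0, 10, 10)`, frame 1 still passes (IoU 80/120) but frame 2 does not overlap at all, so `percent_correct` gives 75.0 and `time_to_failure` gives 2. A precision-recall curve would additionally need per-box confidence scores, which this sketch leaves out.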

score 2 · Answer 3 · answered May 31 '12 at 08:27

This is a problem that all computer vision practitioners face. If you're serious about it, there's a company that does it for you through crowdsourcing. I don't know whether I should post a link to it on this site, though.

killogre

It is not that I need someone to do it for me. But searching for crowdsourced computer vision annotation led me to this: [link](http://mit.edu/vondrick/vatic/), which, at first glance, looks usable. – Goosebumps Jun 12 '12 at 11:47
That does look quite useful, will have to investigate further. – Chris Jun 12 '12 at 12:55
I had a go at it, but I haven't been successful yet due to some proxy issues. I would be interested to know whether you succeeded and, if so, whether it is possible to use this software without the crowdsourcing part. – Goosebumps Jun 12 '12 at 14:13
Thank you. I'm not yet eligible to upvote your answer; I will do so when I can. – Goosebumps Jun 14 '12 at 14:27

score 1 · Answer 4 · answered Aug 24 '16 at 22:55

I've had the same problem looking for an image annotation tool to build a ground-truth dataset for training image analysis models.

LabelMe is a solid option if you need polygonal outlines for your annotations. I've worked with it before; it does the job well and has some useful extra features for 3D feature extraction. In addition to LabelMe, I also made an open-source tool called LabelD. If you're still looking for an annotation tool, check it out!
