
I am developing a Ruby on Rails application in which I want to count the number of physical objects (bottles and food packets) in an image.

I explored the Google Vision API (https://cloud.google.com/vision/) to check whether this is possible. I uploaded a photo containing several cool drink bottles and got the following response:

{
  "responses" : [
    {
      "labelAnnotations" : [
        {
          "mid" : "\/m\/01jwgf",
          "score" : 0.77698487,
          "description" : "product"
        },
        {
          "mid" : "\/m\/0271t",
          "score" : 0.72027034,
          "description" : "drink"
        },
        {
          "mid" : "\/m\/02jnhm",
          "score" : 0.51373237,
          "description" : "tin can"
        }
      ]
    }
  ]
}

My concern is that it does not return the number of cool drink bottles in the image; it only returns the types of objects present in the photo.
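To make the concern concrete, the response above can be inspected with Ruby's standard `json` library. This is a minimal sketch that simply parses the JSON shown in the question: each `labelAnnotations` entry carries only a `mid`, a `score` and a `description` for the image as a whole, so there is no field from which a count or a location could be read.

```ruby
require "json"

# The label-detection response from the question, as a Ruby heredoc.
response_json = <<~JSON
  {
    "responses": [
      {
        "labelAnnotations": [
          { "mid": "/m/01jwgf", "score": 0.77698487, "description": "product" },
          { "mid": "/m/0271t",  "score": 0.72027034, "description": "drink" },
          { "mid": "/m/02jnhm", "score": 0.51373237, "description": "tin can" }
        ]
      }
    ]
  }
JSON

labels = JSON.parse(response_json)["responses"].first["labelAnnotations"]

# Each label is an image-level category with a confidence score.
labels.each { |l| puts format("%-8s %.2f", l["description"], l["score"]) }

# The only keys present: nothing says how many instances appear, or where.
keys = labels.flat_map(&:keys).uniq # => ["mid", "score", "description"]
```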

Is this possible with the Google Vision API, or is there another solution for this?

Any help would be much appreciated.

Jayaprakash
  • I won't mark your question as a duplicate, but as too broad or off-topic. Please read "How to Ask". Break it down into smaller problems and be very specific about the restrictions: 2D or 3D? Objects known or unknown? Environment known or unknown? Processing time, speed? Online or offline? As you ask it right now, I would say: impossible in 2016. – Piglet Oct 14 '16 at 08:19
  • Thanks for your comments. I have tried to include all the information I had. I thought people would understand my description. Anyway, thanks. – Jayaprakash Oct 14 '16 at 08:41
  • Aside from being too broad, software/library recommendations are **explicitly off-topic**. – Miki Oct 14 '16 at 08:45

2 Answers


I've made a simple command-line program that detects faces and replaces them with emojis, using OpenCV through JRuby. It's an absolute pain to set up, but once it is done it is a beauty to write in. I also made a small script that creates OpenCV JRuby projects, which can then be run from a shell script with the required command-line arguments; this alleviates most, if not all, of the setup pain.

Later, when I'm at my computer, I'll upload both the project and the script to GitHub and link them here if you want me to, but for now I can direct you to this project as an example.

EDIT

Here are the links to the JRuby OpenCV project and script:

JRuby OpenCV Project

Project Creation Script

Elijah Schutz

Unfortunately, this is not a fully solved problem. You can use object detection algorithms such as Faster R-CNN and YOLO. They return each detected object together with its bounding box, provided the object's class is in the dataset the model was trained on (e.g. ImageNet); of course, you can also train them on your own classes. Counting the boxes of a given class then gives the number of objects. I recommend YOLO, which is really easy to use and well documented.
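Turning detector output into counts is a small post-processing step. Below is a minimal Ruby sketch, assuming a hypothetical detection format (a label, a confidence score, and an `[x1, y1, x2, y2]` pixel box). Real YOLO or Faster R-CNN wrappers return their own structures and would need mapping into it. The sketch drops low-confidence detections, applies greedy non-maximum suppression (NMS) per class so duplicate boxes on the same object are counted once, and returns a count per label. The 0.5 thresholds are illustrative, not tuned values.

```ruby
# Hypothetical detection format: label, confidence score, [x1, y1, x2, y2] box in pixels.
Detection = Struct.new(:label, :score, :box)

# Intersection-over-union of two [x1, y1, x2, y2] boxes.
def iou(a, b)
  ix1 = [a[0], b[0]].max
  iy1 = [a[1], b[1]].max
  ix2 = [a[2], b[2]].min
  iy2 = [a[3], b[3]].min
  inter = [ix2 - ix1, 0].max * [iy2 - iy1, 0].max
  area_a = (a[2] - a[0]) * (a[3] - a[1])
  area_b = (b[2] - b[0]) * (b[3] - b[1])
  union = area_a + area_b - inter
  union.zero? ? 0.0 : inter.to_f / union
end

# Greedy NMS: visit boxes from highest to lowest score and keep a box only
# if it does not overlap an already-kept box by more than iou_threshold.
def nms(detections, iou_threshold)
  kept = []
  detections.sort_by { |d| -d.score }.each do |d|
    kept << d if kept.none? { |k| iou(k.box, d.box) > iou_threshold }
  end
  kept
end

# Count objects per label after score filtering and per-class NMS.
def count_objects(detections, min_score: 0.5, iou_threshold: 0.5)
  confident = detections.select { |d| d.score >= min_score }
  confident.group_by(&:label).transform_values { |ds| nms(ds, iou_threshold).size }
end

detections = [
  Detection.new("bottle", 0.91, [10, 10, 60, 200]),
  Detection.new("bottle", 0.85, [12, 12, 62, 198]),      # near-duplicate of the first box
  Detection.new("bottle", 0.88, [100, 15, 150, 205]),
  Detection.new("food packet", 0.40, [200, 50, 300, 120]) # below min_score, ignored
]

counts = count_objects(detections) # counts["bottle"] is 2
```

Many detector implementations already run NMS internally, in which case only the score filter is needed before counting. Occluded or overlapping objects (e.g. tightly packed bottles) can still be merged or missed, so the count is an estimate.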

You can also deploy an NVIDIA DIGITS server for object detection, which includes Faster R-CNN. It gives you a really nice user interface for working with those models.

cagatayodabasi