I am developing a Ruby on Rails application where I want to detect the number of physical objects (bottles and food packets) in an image.
I tried the Google Vision API (https://cloud.google.com/vision/) to check whether this is possible. I uploaded a photo containing some cool drink bottles and got the response below:
    {
      "responses" : [
        {
          "labelAnnotations" : [
            {
              "mid" : "\/m\/01jwgf",
              "score" : 0.77698487,
              "description" : "product"
            },
            {
              "mid" : "\/m\/0271t",
              "score" : 0.72027034,
              "description" : "drink"
            },
            {
              "mid" : "\/m\/02jnhm",
              "score" : 0.51373237,
              "description" : "tin can"
            }
          ]
        }
      ]
    }
My concern is that the response does not give the number of cool drink bottles in the image; it only returns the types of objects detected in the photo.
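To make the problem concrete, here is a minimal Ruby sketch of reading that response. It assumes the JSON is available as a string (the copy below is taken from the response above, with the escaped slashes written plainly), and `label_summary` is a hypothetical helper name, not part of any Google client library. It shows that each label appears once for the whole image with a confidence score, so there is nothing in the output to count.

```ruby
require "json"

# Copy of the label detection response shown above.
RESPONSE_JSON = <<~JSON
  {
    "responses": [
      {
        "labelAnnotations": [
          { "mid": "/m/01jwgf", "score": 0.77698487, "description": "product" },
          { "mid": "/m/0271t",  "score": 0.72027034, "description": "drink" },
          { "mid": "/m/02jnhm", "score": 0.51373237, "description": "tin can" }
        ]
      }
    ]
  }
JSON

# Hypothetical helper: maps each label description to its confidence score.
# Labels describe the image as a whole, so each description appears once
# no matter how many matching objects are in the photo.
def label_summary(json_text)
  response = JSON.parse(json_text).fetch("responses").first
  response.fetch("labelAnnotations", []).each_with_object({}) do |label, summary|
    summary[label["description"]] = label["score"]
  end
end

summary = label_summary(RESPONSE_JSON)
summary.each { |description, score| puts "#{description}: #{score}" }
```

The result is one entry per label ("product", "drink", "tin can"), with no per-object entries or positions that could be tallied into a bottle count.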
Is this possible with the Google Vision API, or is there another solution for this?
Any help would be much appreciated.