
I am working with the Google Cloud Vision API using Python
(https://googlecloudplatform.github.io/google-cloud-python/stable/vision-usage.html)

But I cannot understand why the annotation result for a single image is a list of annotations.
The documentation says:

>>> from google.cloud import vision
>>> from google.cloud.vision.feature import Feature
>>> from google.cloud.vision.feature import FeatureTypes
>>> client = vision.Client()
>>> image = client.image(source_uri='gs://my-test-bucket/image.jpg')
>>> features = [Feature(FeatureTypes.FACE_DETECTION, 5),
...             Feature(FeatureTypes.LOGO_DETECTION, 3)]
>>> annotations = image.detect(features)
>>> len(annotations)
2
>>> for face in annotations[0].faces:
...     print(face.joy)
Likelihood.VERY_LIKELY
Likelihood.VERY_LIKELY
Likelihood.VERY_LIKELY
>>> for logo in annotations[0].logos:
...     print(logo.description)
'google'
'github'

Why does image.detect return multiple annotations for a single image?
It seems unnecessary, because the detection results are already contained in the attributes of each annotation (annotations[0].faces, annotations[0].logos, etc.).

And when I try the API with my own image, it returns a list of annotations of length 1.

So my question is:

  • Why does the Python Vision API client return multiple annotations for a single image?
  • Do I need to parse every annotation in the list annotations?
keisuke

1 Answer


The Google Cloud Vision API currently provides 10 different annotation types that you can apply to any image. For instance, among the 10 available, you can detect:

  • Whether any recognizable logos exist anywhere within the image.
  • Whether any faces exist in the image, along with details about each face.
  • You can read more about all the available annotations starting here.

So, to answer your questions:

  • The Vision API returns whichever annotations you request. If you request several annotation types, it returns several; if you request only 1 of the 10, it returns just that one. The example Python code you quoted requests two annotation types, FACE_DETECTION and LOGO_DETECTION, so those two and only those two will be returned.
  • You only need to parse the annotations you want to parse. Since each annotation has a cost (see this page for prices), I'd recommend requesting only the annotations whose results you actually want; otherwise it could get expensive.

Note that when running the Python example you quoted, len(annotations) will return 1 no matter how many annotation types you choose to detect.

Also note that any annotation type you request that finds nothing, for example when no logos are found in the image, will return nothing.
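
If you would rather not rely on the list always having length 1, a minimal sketch like the following loops over whatever comes back and tolerates empty results. Note that collect_results is a hypothetical helper of my own, not part of the client library, and the SimpleNamespace objects are stand-ins I made up to mimic the shape of what image.detect(features) returns in the quoted example (a list whose entries have .faces and .logos), so that the snippet runs without credentials:

```python
from types import SimpleNamespace


def collect_results(annotations):
    """Gather faces and logos from every entry in the annotations list.

    Hypothetical helper: it assumes each entry exposes `.faces` and `.logos`
    as in the quoted documentation, and treats a missing or empty attribute
    as "nothing detected".
    """
    faces, logos = [], []
    for entry in annotations:
        faces.extend(getattr(entry, "faces", None) or [])
        logos.extend(getattr(entry, "logos", None) or [])
    return faces, logos


# Stand-in data (an assumption, not real API output): one entry with a
# single detected face and no logos.
fake_annotations = [
    SimpleNamespace(faces=[SimpleNamespace(joy="VERY_LIKELY")], logos=[]),
]

faces, logos = collect_results(fake_annotations)
for face in faces:
    print(face.joy)
print(len(logos))
```

With the real client you would pass the list returned by image.detect(features) instead of fake_annotations. The loop works the same whether the list has one entry or several, and a feature that found nothing simply contributes no items.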