
I'm trying to web scrape all of the images from a web page, load the jpg files into Google Cloud Storage, and then run each image through the Vision API.

Any suggestions on how to do this in a scalable, automated way? It sounds like a Cloud Function could work. Any recommendations and sample code to get started would be appreciated.

Thank you!

RE Wolfe
  • I would create a Cloud Function that triggers when a new image is uploaded to your Google Cloud Storage bucket and sends the image to the Vision API for analysis using [the client library](https://cloud.google.com/vision/docs/libraries#using_the_client_library). I've also found [this tutorial](https://towardsdatascience.com/google-vision-api-for-image-analysis-with-python-d3a2e45913d4), which may be helpful (see the first sketch below the comments). Note that these links will not help with the web-scraping part. – BittorH May 27 '21 at 13:28
  • Thank you! Any ideas on how to set up a pipeline to scrape the images and bring them into GCS in the first place? – RE Wolfe May 31 '21 at 22:38
  • You could try doing it with [Scrapy](https://docs.scrapy.org/en/latest/topics/media-pipeline.html). [This post](https://stackoverflow.com/questions/55768694/getting-spider-on-scrapy-cloud-to-store-files-on-google-cloud-storage-using-gcsf) also has an example that could be useful (see the second sketch below). – Sergi Jun 03 '21 at 15:29
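
To illustrate BittorH's suggestion, here is a minimal sketch of a background Cloud Function that fires on the `google.storage.object.finalize` event and runs Vision API label detection on the uploaded file. The function name `analyze_image` is a placeholder, and label detection is just one of several Vision feature types you could call:

```python
from google.cloud import vision

def analyze_image(event, context):
    """Triggered by google.storage.object.finalize on the upload bucket."""
    name = event["name"]
    # Only process JPEG files, per the question.
    if not name.lower().endswith((".jpg", ".jpeg")):
        return
    client = vision.ImageAnnotatorClient()
    # The Vision API can read the image directly from GCS by URI,
    # so the function never has to download the bytes itself.
    image = vision.Image(
        source=vision.ImageSource(image_uri=f"gs://{event['bucket']}/{name}")
    )
    # label_detection is one feature; swap in text_detection,
    # object_localization, etc. as needed.
    response = client.label_detection(image=image)
    for label in response.label_annotations:
        print(f"{name}: {label.description} ({label.score:.2f})")
```

Deployed with something like `gcloud functions deploy analyze_image --runtime python39 --trigger-resource YOUR_BUCKET --trigger-event google.storage.object.finalize` (bucket name is a placeholder), the function runs automatically for every image the scraper drops into the bucket.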

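And to illustrate Sergi's Scrapy suggestion, here is a sketch of a spider plus the settings that point the built-in `ImagesPipeline` at a GCS bucket. The bucket name, project ID, and start URL are placeholders, and the pipeline needs the `Pillow` and `google-cloud-storage` packages installed:

```python
# settings.py -- point the built-in ImagesPipeline at a GCS bucket
ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}
IMAGES_STORE = "gs://your-bucket/images/"  # hypothetical bucket
GCS_PROJECT_ID = "your-project-id"         # hypothetical project ID

# spider.py -- collect every <img> URL on the page
import scrapy

class ImageSpider(scrapy.Spider):
    name = "images"
    start_urls = ["https://example.com/gallery"]  # hypothetical target page

    def parse(self, response):
        # The ImagesPipeline downloads each URL listed under "image_urls"
        # and writes the files into IMAGES_STORE (the GCS bucket above).
        urls = [response.urljoin(u) for u in response.css("img::attr(src)").getall()]
        yield {"image_urls": urls}
```

Run with `scrapy crawl images`; each file landing in the bucket then triggers the Cloud Function sketched above, completing the scrape → GCS → Vision pipeline.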
0 Answers