
I have an app that uploads photos regularly to a GCS bucket. When those photos are uploaded, I need to add thumbnails and do some analysis. How do I set up notifications for the bucket?

Brandon Yarbrough
  • I configured everything, yet my script just waits, saying: Listening for messages on projects/bold-proton-236611/subscriptions/projects/bold-proton-236611/subscriptions/subtestbucketthhh – syed irfan Jul 10 '19 at 05:52

3 Answers


The way to do this is to create a Cloud Pub/Sub topic for new objects and to configure your GCS bucket to publish messages to that topic when new objects are created.

First, let's create a bucket PHOTOBUCKET:

$ gsutil mb gs://PHOTOBUCKET

Now, make sure you've activated the Cloud Pub/Sub API.

Next, let's create a Cloud Pub/Sub topic and wire it to our GCS bucket with gsutil:

$ gsutil notification create \
    -t uploadedphotos -f json \
    -e OBJECT_FINALIZE gs://PHOTOBUCKET

The -t specifies the Pub/Sub topic. If the topic doesn't already exist, gsutil will create it for you.

The -e specifies that you're only interested in OBJECT_FINALIZE messages (objects being created). Otherwise you'll get every kind of message in your topic.

The -f specifies that you want the payload of the messages to be the object metadata for the JSON API.

Note that this requires a recent version of gsutil, so be sure to update to the latest version of gcloud, or run gsutil update if you use a standalone gsutil.
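To get a feel for what the -f json payload looks like before wiring anything up, here's a small sketch. The values below are illustrative, not from a real bucket: a notification message carries the object's JSON API metadata as its body, plus a set of attributes identifying the event.

```python
import json

# Illustrative values only -- not from a real bucket.
sample_attributes = {
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "PHOTOBUCKET",
    "objectId": "photos/2017/cat.jpg",
    "payloadFormat": "JSON_API_V1",
}

# With -f json, the message body is the object's JSON API metadata,
# serialized as a JSON string.
sample_data = json.dumps({
    "name": "photos/2017/cat.jpg",
    "bucket": "PHOTOBUCKET",
    "contentType": "image/jpeg",
    "size": "1048576",
})

metadata = json.loads(sample_data)
print(metadata["name"], sample_attributes["eventType"])
# → photos/2017/cat.jpg OBJECT_FINALIZE
```

Most handlers only need the attributes; parse the body when you want the full object metadata (content type, size, generation, and so on).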

Now we have notifications configured and pumping, but we'll want to see them. Let's create a Pub/Sub subscription:

$ gcloud beta pubsub subscriptions create processphotos --topic=uploadedphotos

Now we just need to read these messages. Here's a Python example using the google-cloud-pubsub client library; the relevant bits:

from google.cloud import pubsub  # legacy (pre-0.27) google-cloud-pubsub API

def poll_notifications(subscription_id):
    client = pubsub.Client()
    subscription = pubsub.subscription.Subscription(
        subscription_id, client=client)
    while True:
        pulled = subscription.pull(max_messages=100)
        for ack_id, message in pulled:
            print('Received message {0}:\n{1}'.format(
                message.message_id, summarize(message)))
            subscription.acknowledge([ack_id])

def summarize(message):
    data = message.data  # serialized object metadata (JSON), unused here
    # The interesting fields are delivered as message attributes.
    attributes = message.attributes

    event_type = attributes['eventType']
    bucket_id = attributes['bucketId']
    object_id = attributes['objectId']
    return "%s: a user uploaded %s to %s; we should do something here." % (
        event_type, object_id, bucket_id)

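If you're on a newer release of the google-cloud-pubsub library (0.27+), the polling API above no longer exists. A rough equivalent using the streaming-pull SubscriberClient might look like this; the project and subscription names are placeholders you'd fill in:

```python
def handle_message(message):
    # Works with any object exposing .attributes (a dict of the GCS
    # notification attributes) and .ack(); the real Pub/Sub Message does.
    attrs = message.attributes
    summary = "A user uploaded %s to %s." % (
        attrs.get("objectId"), attrs.get("bucketId"))
    message.ack()
    return summary

def listen(project_id, subscription_id):
    # Imported here so handle_message stays testable without the library.
    from google.cloud import pubsub_v1
    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)
    # subscribe() returns a StreamingPullFuture; result() blocks while
    # messages are dispatched to handle_message on background threads.
    subscriber.subscribe(path, callback=handle_message).result()
```

The callback style replaces the explicit while/pull loop: the library manages the stream and threading, and your function just processes and acks each message.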
Here is some more reading on how this system works:

https://cloud.google.com/storage/docs/reporting-changes
https://cloud.google.com/storage/docs/pubsub-notifications

Brandon Yarbrough
  • how can I authorize my subscriber client with service credentials ? there is literally no sample for that anywhere – Ace McCloud Dec 12 '17 at 16:33
  • 1
    You can grant a service account permission to poll from the subscription manually from the UI: https://cloud.google.com/pubsub/docs/access_control#console. – Brandon Yarbrough Dec 12 '17 at 19:22
  • Okay, so if I deploy my subscriber client app in App Engine, all I have to do is make sure the App Engine service account key has access to the subscription through the UI link above? – Ace McCloud Dec 12 '17 at 20:34
  • 1
    The service account itself (it's got an email address) needs authorization through the UI, yes. Then, if you're using the Python client library in app engine, the library will handle the authentication work automatically. – Brandon Yarbrough Dec 12 '17 at 23:25
  • Hi Brandon, using the approach documented in your answer, what I have is a listener script that is always running, continuously subscribed and listening for changes on a bucket. When some event occurs, it learns which file changed and goes to work on it. – Ace McCloud Jan 03 '18 at 17:18
  • When I went to deploy this, i.e. have this listener continuously running on App Engine or similar, it looks like I have to create a Python cron job with an endpoint, and this script's logic goes into that endpoint. – Ace McCloud Jan 03 '18 at 17:19
  • As opposed to this, I can directly have GCS send a request to an App Engine endpoint using https://cloud.google.com/storage/docs/object-change-notification . This seems much better than using Pub/Sub in this case; however, Pub/Sub is the recommended way to track changes to GCS buckets. – Ace McCloud Jan 03 '18 at 17:21
  • So, while Pub/Sub may be the recommended way, from a deployment perspective it makes more sense to use the object notifications sent to an App Engine endpoint than to use Pub/Sub. – Ace McCloud Jan 03 '18 at 17:28
  • 1
    Cloud Pub/Sub subscriptions can be configured either for polling (as in my above example) or for pushing (which would work great for your app engine case). https://cloud.google.com/pubsub/docs/subscriber#push_pull – Brandon Yarbrough Jan 05 '18 at 01:40
  • exactly what i ended up doing :) – Ace McCloud Jan 05 '18 at 03:02
  • So, from Pub/Sub I push a notification to an App Engine endpoint that tells me the name of the file that was put into the bucket. The endpoint basically reads the contents of the file and extracts some information to process. I ran this endpoint locally and it all works great irrespective of file size. The moment I deploy to App Engine and try to upload a file >32MB to the bucket, my endpoint is not able to read the file. It gives me an error: – Ace McCloud Jan 05 '18 at 22:47
  • DataCorruption: Checksum mismatch while downloading: https://www.googleapis.com/download/storage/v1/b//o/?alt=media The X-Goog-Hash header indicated an MD5 checksum of: 4yMpMREOntqejJqIQHyBNA== but the actual MD5 checksum of the downloaded contents was: XC4KVgtr82bKWHD7P5+JwA== – Ace McCloud Jan 05 '18 at 22:58
  • I am not sure if this is a bug, coincides with the 32MB file limit, or is independent :( – Ace McCloud Jan 05 '18 at 22:59
  • 1
    There is a 32 MB limit, so reading files bigger than that from app engine won't work. – Brandon Yarbrough Jan 05 '18 at 23:54
  • Agreed, but the error message doesn't say that; it says checksum error. And is there a way to get around this? Blobstore? Or the built-in cloudstorage library in App Engine as opposed to google.cloud.storage? Will these two circumvent the 32MB limit? – Ace McCloud Jan 06 '18 at 21:46

GCP also offers an earlier mechanism for Cloud Storage change notifications, called Object Change Notification. This feature POSTs directly to your desired endpoint(s) when an object in the bucket changes. Google recommends the Pub/Sub approach instead.

https://cloud.google.com/storage/docs/object-change-notification
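As a rough sketch of what an endpoint receiving these POSTs has to handle: per the OCN docs, a sync message arrives once when the notification channel is created, and subsequent events deliver the changed object's metadata in the request body. The header names below are from those docs; everything else is illustrative.

```python
import json

def handle_ocn_request(headers, body):
    # The event kind arrives in the X-Goog-Resource-State header.
    state = headers.get("X-Goog-Resource-State")
    if state == "sync":
        # Sent once when the notification channel is created.
        return "sync received, channel is live"
    # For "exists"/"not_exists" events the body is the object's metadata.
    metadata = json.loads(body)
    return "object %s was %s" % (metadata.get("name"), state)
```

In a real App Engine handler you'd wire this into your web framework's request object and verify the channel ID before trusting the payload.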

Aidan Hoolachan
  • Object Change Notifications are not in beta. They've been available for years. However, Cloud Pub/Sub support is superior in pretty much every way: it's cheaper, more powerful, and simpler to use. I would strongly recommend choosing either Cloud Pub/Sub notifications or Google Cloud Functions over Object Change Notifications. – Brandon Yarbrough Apr 04 '17 at 17:22
  • 1
    @BrandonYarbrough -- Yep, you're right, I updated my comment to be more accurate. I'm just working through these services and the naming conventions are quite confusing. I am, however, finding cloud pub/sub to be difficult to work with. I'm receiving a "CommandException: Invalid subcommand "create" for the notification command" when trying to configure cloud pub/sub using "gsutil notification create ..." – Aidan Hoolachan Apr 05 '17 at 05:07
  • The notification commands are quite new. To use them, you need gsutil version 4.24. If you use the gcloud SDK, please run "gcloud components update", or if you are using a gsutil, please run "gsutil update". – Brandon Yarbrough Apr 05 '17 at 16:39

While using this example, keep in mind two things: 1) the sample code has been upgraded to Python 3.6 and pubsub_v1, so it may not run on Python 2.7; 2) when calling poll_notifications(project_id, subscription_name), pass your GCP project ID (e.g. bold-idad) and your subscription name (e.g. asrtopic).

syed irfan