
I am having trouble creating a DataflowRunner job that connects a Pub/Sub source to a BigQuery sink by plugging these two:

apache_beam.io.gcp.pubsub.PubSubSource
apache_beam.io.gcp.bigquery.BigQuerySink

into lines 59 and 74 respectively of the beam/sdks/python/apache_beam/examples/streaming_wordcount.py example on GitHub (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/streaming_wordcount.py). After removing lines 61-70 and specifying the correct Pub/Sub and BigQuery arguments, the script runs without errors, but the pipeline is never actually built or launched.

Side note: the script's comments say streaming pipeline support isn't yet available in Python. However, the Beam docs say apache_beam.io.gcp.pubsub.PubSubSource is only available for streaming pipelines (first sentence under the "apache_beam.io.gcp.pubsub module" heading: https://beam.apache.org/documentation/sdks/pydoc/2.0.0/apache_beam.io.gcp.html#module-apache_beam.io.gcp.pubsub). So the source I need appears to require exactly the mode the example says is unavailable.

Evan

1 Answer


You can't stream on Python Dataflow - for now.

Monitor this changelog to find out the day it does:

(soon!)

Felipe Hoffa
  • can't wait for when it does :)! it'll be an awesome feature – Willian Fuks Jun 29 '17 at 23:06
  • @FelipeHoffa, is it possible to batch process into BigQuery in Python? – Evan Jul 04 '17 at 17:10
  • @Evan, you certainly can batch process messages from Pub/Sub into BigQuery using Python; see example that Google makes available here - https://github.com/GoogleCloudPlatform/kubernetes-bigquery-python/blob/master/pubsub/pubsub-pipe-image/pubsub-to-bigquery.py – andre622 Jul 11 '17 at 14:19
  • Am I right in assuming that streaming still isn't supported with the python sdk? – jimmy Jul 17 '17 at 14:35
  • @FelipeHoffa can't wait it coming – Norio Akagi Aug 07 '17 at 03:08
  • You can contact Google to add your project to their whitelist, and then you can implement streaming with Python – Sam Aug 30 '17 at 05:03