Questions tagged [google-dataflow]

49 questions
0 votes, 2 answers

Google Dataflow job that reads from Pub/Sub and writes to GCS is very slow (WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards takes too long)

Currently we have a Dataflow job that reads from Pub/Sub and writes Avro files to GCS using FileIO.writeDynamic. When we test with, say, 10,000 events/sec, it is not able to process any faster, as WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards is…
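
The step named in the title is usually shard-bound: GroupIntoShards keys every element by (destination, shard), so its parallelism is capped at numShards per destination and window. A minimal sketch of the usual mitigation, with a hypothetical subscription, bucket, and destination key, and TextIO.sink() standing in for the question's Avro sink:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.io.FileIO;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.joda.time.Duration;

    public class ShardedGcsWrite {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        p.apply(PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-sub")) // hypothetical
            // Bound each write by a window so temp files can be finalized.
            .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
            .apply(FileIO.<String, String>writeDynamic()
                .by(line -> line.isEmpty() ? "empty" : line.substring(0, 1)) // hypothetical key
                .via(TextIO.sink())
                .to("gs://my-bucket/output/") // hypothetical bucket
                .withNaming(key -> FileIO.Write.defaultNaming("events-" + key, ".txt"))
                .withDestinationCoder(StringUtf8Coder.of())
                // GroupIntoShards parallelism is capped at numShards per
                // destination and window; too few shards serializes the
                // write, too many produces a flood of tiny files.
                .withNumShards(32));
        p.run();
      }
    }

Raising withNumShards toward the worker count (and enabling Streaming Engine) is typically worth trying before restructuring the pipeline.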
0 votes, 1 answer

How to create a tar.gz file using Apache Beam

I used the code below to create a tar.gz file; a .gz file was created, but the tar file was not. How can I achieve this? PCollection<String> lines = pipeline.apply("To read from file", TextIO.read().from(
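
Writing with GZIP compression only compresses the raw byte stream, which is why a .gz appears but no tar archive ever does. One way to produce a genuine tar.gz is a custom FileIO.Sink layering commons-compress streams; a sketch, assuming the hypothetical name TarGzSink and a one-entry-per-element archive layout:

    import java.io.IOException;
    import java.nio.channels.Channels;
    import java.nio.channels.WritableByteChannel;
    import java.nio.charset.StandardCharsets;
    import org.apache.beam.sdk.io.FileIO;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

    // Writes each element as its own entry inside one tar.gz per output file.
    public class TarGzSink implements FileIO.Sink<String> {
      private transient GzipCompressorOutputStream gzip;
      private transient TarArchiveOutputStream tar;
      private int entryIndex;

      @Override
      public void open(WritableByteChannel channel) throws IOException {
        gzip = new GzipCompressorOutputStream(Channels.newOutputStream(channel));
        tar = new TarArchiveOutputStream(gzip);
      }

      @Override
      public void write(String element) throws IOException {
        byte[] bytes = element.getBytes(StandardCharsets.UTF_8);
        TarArchiveEntry entry = new TarArchiveEntry("line-" + entryIndex++ + ".txt");
        entry.setSize(bytes.length);
        tar.putArchiveEntry(entry);
        tar.write(bytes);
        tar.closeArchiveEntry();
      }

      @Override
      public void flush() throws IOException {
        // Finish (not close) both layers; Beam closes the channel itself.
        tar.finish();
        gzip.finish();
      }
    }

Wiring it in would look like lines.apply(FileIO.<String>write().via(new TarGzSink()).to("gs://my-bucket/archives").withSuffix(".tar.gz")).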
-1 votes, 1 answer

Google charging for Dataflow job logs generated by the pipeline itself

Google is charging for Dataflow job logs generated by the pipeline itself, even though we are not emitting any logs from the job. We run the Dataflow jobs via Composer using the Apache Beam Python SDK. It doesn't make any sense, as we are not generating…
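
For context on why this happens: Dataflow workers emit service and harness logs even when user code logs nothing, and that ingestion is what Cloud Logging bills. The question's pipeline is Python, but the Java SDK's knob for trimming the volume looks roughly like the sketch below (option names vary by SDK version, so treat them as an assumption to verify):

    import org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class QuietPipeline {
      public static void main(String[] args) {
        DataflowWorkerLoggingOptions options = PipelineOptionsFactory.fromArgs(args)
            .create()
            .as(DataflowWorkerLoggingOptions.class);
        // Ship only warnings and errors to Cloud Logging; everything at
        // INFO and below stays out of the billed ingestion volume.
        options.setDefaultWorkerLogLevel(DataflowWorkerLoggingOptions.Level.WARN);
        Pipeline p = Pipeline.create(options);
        // ... build and run the pipeline as usual ...
        p.run();
      }
    }

A runner-independent alternative is a Cloud Logging exclusion filter on the Dataflow log names, which drops entries before they are ingested and billed.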
-1 votes, 1 answer

Google Dataflow pipeline for varying schema

I have a product for defining and configuring business workflows. Part of this product is a form builder that lets users set up different forms. All of this form data is stored in MongoDB with the following structure: form_schemas { "_id" :…
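
Since each form can carry a different schema, one workable pattern is to skip mapping documents onto a fixed class and instead flatten every document into key/value pairs early in the pipeline. A minimal sketch, assuming Beam's MongoDbIO, a hypothetical connection URI, and a hypothetical database name (form_schemas is the collection named in the question):

    import java.util.Map;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.mongodb.MongoDbIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.bson.Document;

    public class VaryingSchemaRead {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        p.apply(MongoDbIO.read()
                .withUri("mongodb://localhost:27017") // hypothetical connection
                .withDatabase("workflows")            // hypothetical database
                .withCollection("form_schemas"))
            // Flatten each document of unknown shape into ("_id.field", value)
            // pairs so downstream steps never depend on a fixed schema.
            .apply(ParDo.of(new DoFn<Document, KV<String, String>>() {
              @ProcessElement
              public void process(@Element Document doc,
                  OutputReceiver<KV<String, String>> out) {
                String id = String.valueOf(doc.get("_id"));
                for (Map.Entry<String, Object> field : doc.entrySet()) {
                  out.output(KV.of(id + "." + field.getKey(),
                      String.valueOf(field.getValue())));
                }
              }
            }));
        p.run();
      }
    }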