Questions tagged [google-dataflow]
49 questions
0 votes, 2 answers
Google Dataflow job that reads from Pub/Sub and writes to GCS is very slow: WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards takes too long
Currently we have a Dataflow job that reads from Pub/Sub and writes Avro files to GCS using FileIO.writeDynamic. When we test with, say, 10,000 events/sec, it is not able to process faster, as WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards is…

user2313227
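A likely bottleneck behind GroupIntoShards is the shard count: every element is shuffled onto one of numShards keys, so that stage cannot scale past it. Below is a minimal sketch of the write stage, assuming the events are already parsed into GenericRecords; the window size, the "eventType" destination key, the bucket path, and the shard count of 50 are all hypothetical:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

static void writeAvro(PCollection<GenericRecord> events, Schema schema) {
    events
        // Streaming file writes need a window so bundles can be finalized.
        .apply("Window", Window.<GenericRecord>into(
            FixedWindows.of(Duration.standardMinutes(5))))
        .apply("WriteAvro", FileIO.<String, GenericRecord>writeDynamic()
            .by(r -> String.valueOf(r.get("eventType")))  // hypothetical destination key
            .via(AvroIO.sink(schema))
            .to("gs://my-bucket/avro/")                   // hypothetical output path
            .withNaming(key -> FileIO.Write.defaultNaming("events-" + key, ".avro"))
            .withDestinationCoder(StringUtf8Coder.of())
            // More shards -> more parallel writers feeding GroupIntoShards,
            // at the price of more, smaller files per window.
            .withNumShards(50));
}

Raising withNumShards fans the shuffle out across more workers; the right value is a trade-off between write parallelism and the number of output files.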
0 votes, 1 answer
How to create a tar.gz file using Apache Beam
I used the code below to create a tar.gz file; a .gz file was created, but the tar archive was not. How can I achieve this?
PCollection<String> lines = pipeline.apply("To read from file",
        TextIO.read().from("/path/to/input.txt")); // path truncated in the excerpt; placeholder

sathiya raj
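Compression.GZIP in TextIO gzips each output shard on its own, which is why a .gz file appears but no tar archive: none of Beam's built-in sinks write tar entries. A minimal sketch of the gzip write, with a hypothetical output path:

import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TextIO;

// lines: the PCollection<String> read in the question.
// Each shard is gzip-compressed individually -> plain .gz, not .tar.gz.
lines.apply("WriteGz", TextIO.write()
    .to("gs://my-bucket/output/lines")  // hypothetical path
    .withSuffix(".gz")
    .withCompression(Compression.GZIP));

Producing a genuine .tar.gz would take a custom FileIO.Sink that writes entries through Apache Commons Compress (a TarArchiveOutputStream wrapped in a gzip stream).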
-1 votes, 1 answer
Google charging for Dataflow job logs generated by Dataflow itself
Google is charging for Dataflow job logs generated by the pipeline itself, even though we are not emitting any logs from the job; we run Dataflow jobs via Composer using the Python Apache Beam SDK.
It doesn't make any sense, as we are not generating…

Chethan R
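Cloud Logging bills for log ingestion, and Dataflow workers emit their own INFO-level logs even when user code logs nothing, which is the likely source of the charge. The question uses the Python SDK; as a sketch only, the Java SDK exposes a worker log-level option that reduces what is ingested (the option is specific to the Dataflow runner and may be deprecated in newer Beam versions):

import org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public static void main(String[] args) {
    DataflowWorkerLoggingOptions options = PipelineOptionsFactory.fromArgs(args)
        .as(DataflowWorkerLoggingOptions.class);
    // Drop routine worker chatter: only WARN and above reaches Cloud Logging.
    options.setDefaultWorkerLogLevel(DataflowWorkerLoggingOptions.Level.WARN);
    // ... build and run the pipeline with these options ...
}

An alternative is a Cloud Logging exclusion filter on the Dataflow worker logs, which drops entries before they are billed for ingestion.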
-1 votes, 1 answer
Google Dataflow pipeline for a varying schema
I have a product for defining and configuring business workflows. Part of this product is a form builder which enables users to set up different forms.
All of this form data is backed by MongoDB in the following structure:
- form_schemas
{
"_id" :…

Sharath Chandra
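When the fields vary per form, one common pattern is to keep each document schema-free inside the pipeline (as a JSON string or a Map) and bind a concrete schema only where a specific form's fields are needed. A minimal sketch using MongoDbIO, Beam's MongoDB connector; the URI, database, and collection names are placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.mongodb.MongoDbIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.bson.Document;

static PCollection<String> readFormSchemas(Pipeline pipeline) {
    return pipeline
        .apply("ReadFormSchemas", MongoDbIO.read()
            .withUri("mongodb://localhost:27017")  // hypothetical URI
            .withDatabase("workflows")             // hypothetical database
            .withCollection("form_schemas"))
        // Keep each document schema-free as JSON; parse fields only where needed.
        .apply("ToJson", MapElements
            .into(TypeDescriptors.strings())
            .via(Document::toJson));
}

Downstream, a DoFn can branch on a discriminator such as the form id and parse only the fields that form defines.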