Questions tagged [google-dataflow]
49 questions
0 votes · 1 answer
How to put nested JSON data into a BigQuery table with Google Cloud Platform Dataflow's Pub/Sub Topic -> BigQuery template
I am trying to store messages sent from an IoT device in a BigQuery table.
The cloud architecture is as follows:
Local Device -> json_message -> mqtt_client -> GC IoT device -> Device Registry -> Pub/Sub Topic -> Dataflow with Pub/Sub Topic to…

dda2120 · 41 · 1 · 5
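When the stock Pub/Sub Topic to BigQuery template cannot map nested fields, a common fallback is a small custom Beam pipeline that parses the JSON itself and writes with an explicit nested schema. This is only a sketch; the project, topic, table, and field names below are placeholders, not taken from the question.

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Target schema with a nested RECORD field; adjust to match the device messages.
SCHEMA = {
    "fields": [
        {"name": "device_id", "type": "STRING", "mode": "NULLABLE"},
        {"name": "payload", "type": "RECORD", "mode": "NULLABLE", "fields": [
            {"name": "temperature", "type": "FLOAT", "mode": "NULLABLE"},
            {"name": "humidity", "type": "FLOAT", "mode": "NULLABLE"},
        ]},
    ]
}

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (p
     | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/iot-events")
     | "Parse" >> beam.Map(json.loads)  # each message becomes a nested dict
     | "Write" >> beam.io.WriteToBigQuery(
         "my-project:iot.events",
         schema=SCHEMA,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
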
0 votes · 1 answer
Query in Firebase Realtime Database
What would the indexing and query for this look like in the Firebase Realtime Database?
0 votes · 0 answers
Pub/Sub unacked messages are not showing up when using Dataflow as the subscriber
I have a Dataflow pipeline which reads messages from a subscription. It works fine when messages arrive in the correct format, but when a message is malformed it throws an error. I decided to use a dead-letter topic when there is…

alrou · 51 · 8
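A pattern that usually addresses this is routing malformed messages to a dead-letter Pub/Sub topic with a tagged side output instead of letting the DoFn raise. The subscription and topic names below are illustrative assumptions.

import json
import apache_beam as beam
from apache_beam import pvalue
from apache_beam.options.pipeline_options import PipelineOptions

class ParseOrDeadLetter(beam.DoFn):
    def process(self, message):
        try:
            yield json.loads(message.decode("utf-8"))
        except Exception:
            # Malformed payloads go to the dead-letter output instead of raising.
            yield pvalue.TaggedOutput("dead_letter", message)

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    results = (p
               | beam.io.ReadFromPubSub(subscription="projects/my-project/subscriptions/my-sub")
               | beam.ParDo(ParseOrDeadLetter()).with_outputs("dead_letter", main="parsed"))

    results.parsed | "Process" >> beam.Map(print)  # replace with the real transforms
    results.dead_letter | "ToDeadLetter" >> beam.io.WriteToPubSub(
        topic="projects/my-project/topics/my-sub-dead-letter")
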
0 votes · 1 answer
Convert the date format from DD-MM-YYYY to YYYY-MM-DD in BigQuery by using the 'Text Files on Cloud Storage to BigQuery' Dataflow template (GCP)
I am new to GCP and am requesting some help to solve my issue.
I am creating a CSV file, a JSON file, and a JavaScript file and uploading them into a GCP bucket.
I am creating the 'Text Files on Cloud Storage to BigQuery' Dataflow template to populate the data into…
0 votes · 1 answer
Backfill Beam pipeline with historical data
I have a Google Cloud Dataflow pipeline (written with the Apache Beam SDK) that, in its normal mode of operation, handles event data published to Cloud Pub/Sub.
In order to bring the pipeline state up to date, and to create the correct outputs,…

Raman · 17,606 · 5 · 95 · 112
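One way to backfill is a separate batch run of the same pipeline code that reads the archived events from GCS and re-assigns each element's event timestamp, so the existing windowing logic behaves as it does on the live Pub/Sub path. The paths and field names are assumptions for illustration.

import json
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

def with_event_time(record):
    # Use the timestamp recorded in the historical event, not processing time.
    return TimestampedValue(record, record["event_time_unix_seconds"])

with beam.Pipeline() as p:
    (p
     | "ReadHistory" >> beam.io.ReadFromText("gs://my-bucket/history/*.json")
     | "Parse" >> beam.Map(json.loads)
     | "EventTime" >> beam.Map(with_event_time)
     | "Window" >> beam.WindowInto(FixedWindows(60))
     | "Process" >> beam.Map(print))  # replace with the pipeline's real transforms
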
0 votes · 1 answer
How to process two batch files simultaneously with Dataflow on GCP
I want to process two files from GCS in Dataflow at the same time.
I think it would be possible if the second file came in as a side input.
However, in that case, I think it would be processed every time, not just once.
e.g.) How to read and…

Quack · 680 · 1 · 8 · 22
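If both files contribute rows of the same kind, reading each one and merging with Flatten processes both exactly once in a single batch job, avoiding the repeated evaluation a side input would imply; CoGroupByKey is the alternative when the files must be joined. The file paths below are placeholders.

import apache_beam as beam

with beam.Pipeline() as p:
    first = p | "ReadA" >> beam.io.ReadFromText("gs://my-bucket/input/file_a.csv")
    second = p | "ReadB" >> beam.io.ReadFromText("gs://my-bucket/input/file_b.csv")

    # Both sources are read in the same job and merged into one PCollection,
    # so each file is processed once rather than per main-input element.
    merged = (first, second) | beam.Flatten()
    merged | "Process" >> beam.Map(print)  # replace with the real parsing/transform
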
0 votes · 1 answer
Migrating from Google App Engine Mapreduce to Apache Beam
I have been a long-time user of Google App Engine's Mapreduce library for processing data in the Google Datastore. Google no longer supports it and it doesn't work at all in Python 3. I'm trying to migrate our older Mapreduce jobs to Google's…

speedplane · 15,673 · 16 · 86 · 138
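In the Beam Python SDK, the rough equivalent of an App Engine MapReduce mapper over Datastore is ReadFromDatastore followed by ordinary transforms. A minimal sketch, with the kind, project, and property names assumed for illustration:

import apache_beam as beam
from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore, WriteToDatastore
from apache_beam.io.gcp.datastore.v1new.types import Query

def update_entity(entity):
    # The "map" step: mutate each entity the way the old MapReduce job did.
    entity.properties["migrated"] = True
    return entity

with beam.Pipeline() as p:
    (p
     | "Read" >> ReadFromDatastore(Query(kind="MyKind", project="my-project"))
     | "Map" >> beam.Map(update_entity)
     | "Write" >> WriteToDatastore("my-project"))
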
0 votes · 1 answer
beam.Create() with list of dicts is extremely slow compared to a list of strings
I am using Dataflow to process a Shapefile with about 4 million features (about 2GB total) and load the geometries into BigQuery, so before my pipeline starts, I extract the shapefile features into a list, and initialize the pipeline using…

Travis Webb · 14,688 · 7 · 55 · 109
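One workaround that is often reported to help is handing beam.Create a list of compact JSON strings and deferring dict construction to a Map step inside the pipeline, since encoding millions of Python dicts into the seed PCollection can dominate startup time. A sketch with made-up feature data:

import json
import apache_beam as beam

# Hypothetical feature dicts; in practice these come from the shapefile reader.
features = [{"id": i, "geometry": "POINT(0 0)"} for i in range(1000)]

with beam.Pipeline() as p:
    (p
     # Creating from strings keeps the seed PCollection cheap to encode.
     | beam.Create([json.dumps(f) for f in features])
     | "ToDict" >> beam.Map(json.loads)
     | "Load" >> beam.Map(print))  # replace with the BigQuery write
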
0 votes · 1 answer
Error 401 from Cloud Scheduler when passing a Dataflow template URL via a POST request
I have created a custom template for Dataflow batch jobs. Now I need to run it every 5 minutes using Cloud Scheduler.
The template is stored in Cloud Storage, but I'm getting a 401 error whenever I pass the URI of the template in my POST request from…

anagha s · 323 · 1 · 4 · 15
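A 401 here usually means the POST reached the Dataflow API without an OAuth token, which Cloud Scheduler only attaches if the scheduler job is configured with a service account. As a way to verify the request itself, this sketch makes the same templates:launch call with application-default credentials; the bucket, template path, and parameters are assumptions.

from google.auth import default
from google.auth.transport.requests import AuthorizedSession

# Application-default credentials supply the bearer token that a bare POST lacks.
credentials, project = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

url = (f"https://dataflow.googleapis.com/v1b3/projects/{project}/"
       "templates:launch?gcsPath=gs://my-bucket/templates/my_template")
body = {
    "jobName": "scheduled-batch-job",
    "parameters": {"input": "gs://my-bucket/input/*.csv"},
}

resp = session.post(url, json=body)
resp.raise_for_status()
print(resp.json())
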
0 votes · 2 answers
Running a Dataflow batch job using flexRSGoal
I found this article about running a Dataflow batch job on preemptible machines.
I tried to use this feature with this script:
gcloud beta dataflow jobs run $JOB_NAME \
--gcs-location gs://.../Datastore_to_Datastore_Delete \
…

No1Lives4Ever · 6,430 · 19 · 77 · 140
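For comparison, when the job is launched from the Beam Python SDK instead of gcloud, the same FlexRS behavior is requested through the flexrs_goal pipeline option; the project, region, and paths below are placeholders.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--flexrs_goal=COST_OPTIMIZED",  # schedule the batch job on FlexRS (preemptible) capacity
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromText("gs://my-bucket/input/*.txt")
     | beam.io.WriteToText("gs://my-bucket/output/result"))
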
0 votes · 1 answer
Apache Beam - BigQuery upsert
I have a Dataflow job which splits up a single file into x number of records (tables). These flow into BigQuery no problem.
What I found, though, was that there was no way to then execute another stage in the pipeline following the results.
For example
#…

YetiBoy · 51 · 1 · 4
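Because WriteToBigQuery in the Python SDK does not naturally feed a downstream upsert stage, one common workaround is to load into a staging table and then run a MERGE with the BigQuery client once the pipeline has finished. The table, schema, and key names below are illustrative assumptions, not taken from the question.

import apache_beam as beam
from google.cloud import bigquery

with beam.Pipeline() as p:
    (p
     | beam.io.ReadFromText("gs://my-bucket/input/data.csv")
     | "Parse" >> beam.Map(lambda line: dict(zip(["id", "value"], line.split(","))))
     | beam.io.WriteToBigQuery(
         "my_project:my_dataset.staging_table",
         schema="id:STRING,value:STRING",
         write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))

# The with-block waits for the batch job, so the staging load is complete here.
client = bigquery.Client()
client.query("""
    MERGE `my_project.my_dataset.target_table` AS t
    USING `my_project.my_dataset.staging_table` AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.value = s.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
""").result()
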
0 votes · 1 answer
Error when creating Google Dataflow template file
I'm trying to schedule a Dataflow job that ends after a set amount of time using a template. I'm able to do this successfully from the command line, but when I try to do it with Google Cloud Scheduler I run into an error when I create my…

Mark Martinez · 3 · 3
0 votes · 0 answers
Time limit possible for Google's Dataflow?
I've managed to use Google Cloud Scheduler to schedule a Dataflow pipeline run, but I also want the pipeline to run for at most an hour. Is it possible to schedule an end time for Dataflow?
Edit: I've created a pipeline that would wait a certain…

Mark Martinez · 3 · 3
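If the process that launches the job can stay alive, the Beam Python SDK can enforce a rough time limit on its own: wait_until_finish takes a duration in milliseconds, after which the job can be cancelled. This is a sketch of that idea rather than a Dataflow-side scheduled end time, and the project, region, and paths are placeholders.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
])

p = beam.Pipeline(options=options)
(p
 | beam.io.ReadFromText("gs://my-bucket/input/*.txt")
 | beam.io.WriteToText("gs://my-bucket/output/result"))

result = p.run()
# Block for at most one hour (the argument is in milliseconds), then cancel if still running.
result.wait_until_finish(duration=60 * 60 * 1000)
if result.state not in ("DONE", "CANCELLED", "FAILED"):
    result.cancel()
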
0 votes · 0 answers
Error in SQL Launcher (java.lang.NullPointerException) in Google Dataflow SQL
I am trying to read the data from a Pub/Sub topic using Google Dataflow SQL and am getting a "NullPointerException" error. Could anyone guide me on what I am doing wrong?
Below is the SQL query. I tried selecting a few columns as well; the same error is…

Sriraam Venkataraman · 73 · 7
0 votes · 1 answer
Using the schema update option in beam.io.WriteToBigQuery
I am loading a bunch of log files into BigQuery using an Apache Beam Dataflow pipeline. The file format can change over time as new columns are added to the files. I see the schema update option ALLOW_FIELD_ADDITION.
Does anyone know how to use it? This is how my…

rens · 43 · 6
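In the Python SDK this option is typically passed through WriteToBigQuery's additional_bq_parameters, which is merged into the load job configuration. A minimal sketch, with the table, schema, and sample row assumed for illustration:

import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.Create([{"timestamp": "2020-01-01T00:00:00", "message": "hello"}])
     | beam.io.WriteToBigQuery(
         "my-project:logs.events",
         schema="timestamp:TIMESTAMP,message:STRING",
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
         # Let load jobs add columns that appear in newer files.
         additional_bq_parameters={"schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"]}))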