Questions tagged [apache-beam-io]

Apache Beam is a unified SDK for batch and stream processing. This tag should be used for questions related to reading data into an Apache Beam pipeline, or writing the output of a pipeline to a destination.

Apache Beam is an open source, unified model for defining and executing both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runtime-specific Runners for executing them.

Apache Beam I/O refers to the process of loading data into a Beam pipeline, or writing the output of a pipeline to a destination.

539 questions
0 votes · 1 answer

How to create a Beam template with the current date as an input (updated daily) [create from GET request]

I am trying to create a Dataflow job that runs daily with Cloud Scheduler. I need to get the data from an external API using GET requests, so I need the current date as an input. However, when I export the Dataflow job as a template for scheduling, the…
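The core of this question is that anything computed at template-creation time is frozen into the template. The usual workaround is to evaluate "today" at run time instead, e.g. inside a DoFn. The date logic itself can be sketched in plain Python (hypothetical helper, not a Beam API):

```python
from datetime import date, timedelta

def request_date(days_back: int = 0) -> str:
    # Evaluate "today" when this function is *called* (i.e. at job run time),
    # not when the template is built, so a template launched daily by
    # Cloud Scheduler always sees the current date for its GET request.
    return (date.today() - timedelta(days=days_back)).isoformat()
```

In a Beam pipeline this call would live inside the DoFn's `process` method (or a `start_bundle`), so each daily launch recomputes it.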
0 votes · 1 answer

GCP Apache Beam Dataflow JDBC IO Connection Error

Problem When trying to deploy an Apache Beam pipeline on the Google Cloud Platform Dataflow service, which connects to an Oracle 11gR2 (11.2.0.4) database to retrieve rows, I received the following error when using the Apache Beam JdbcIO transform: Error…
WtzqCy · 189 · 4
0 votes · 1 answer

Apache Beam KinesisIO Java processing pipeline - application state, error handling & fault-tolerance?

I'm working on my first Apache Beam pipeline to process data streams from AWS Kinesis. I'm familiar with Kafka's concepts of how it handles consumer offsets/state and have experience implementing Apache Storm/Spark processing. After…
Neel · 1
0 votes · 0 answers

Apache Beam not properly receiving pub/sub messages from google-cloud-storage

I've been struggling with this problem for a while and can't quite find a fix. I'm building a pipeline that takes data from a public Google Cloud bucket and does some transformations on it. The thing I'm struggling with right now is getting Apache…
0 votes · 1 answer

Apache Beam Spark/Flink runner not getting executed in EMR (access files from GCS)

I have an Apache Beam pipeline to index some data to Elasticsearch. I was trying to use the Spark or Flink runner to run the job in AWS EMR. When I tried to run the job on stand-alone Spark in a local setup, the pipeline works with source files in the local…
joss · 695 · 1 · 5 · 16
0 votes · 2 answers

Is it possible to execute some code like logging and writing result metrics to GCS at the end of a batch Dataflow job?

I am using Apache Beam 2.22.0 (Java SDK) and want to log metrics and write them to a GCS bucket after a batch pipeline finishes execution. I have tried using result.waitUntilFinish() followed by the intended code: DirectRunner- GCS object is…
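The pattern asked about here is: block on the pipeline result, then run follow-up code in the launcher process. A minimal sketch of that control flow, with a stand-in for the pipeline result (hypothetical stub, not the Beam `PipelineResult` class):

```python
class FakeResult:
    """Stand-in for a pipeline result object (illustration only)."""
    def __init__(self):
        self.state = "RUNNING"

    def wait_until_finish(self):
        # Blocks until the job is done in the real API; here we just flip state.
        self.state = "DONE"
        return self.state

def run_and_report(run_pipeline, report):
    # Launch the job, block until it finishes, then execute follow-up code
    # (logging, writing metrics to GCS) on the launcher. This only works
    # when the launching process stays alive until the job completes.
    result = run_pipeline()
    result.wait_until_finish()
    report(result)
    return result
```

The caveat behind the question is that on Dataflow the launcher may exit after submission; the follow-up code runs only if the launcher actually waits.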
0 votes · 0 answers

Apache Beam CassandraIO conversion into a PCollection of Rows

I am trying to read data from a Cassandra DB using Apache Beam's CassandraIO; my requirement is creating a PCollection of Rows from the Cassandra DB. Currently my code looks like this: PTransform> transform = CassandraIO.read() …
bforblack · 65 · 5
0 votes · 1 answer

Does ElasticsearchIO for Apache Beam Java support templating and ValueProvider arguments? Error while invoking templates

I was trying to create a template for Apache Beam to index data to Elasticsearch. The template is getting created, but while invoking the template the pipeline failed with a "No protocol" error. It looks very odd, as the error is related to the URL…
0 votes · 0 answers

How does the concept of checkpointing/fault tolerance work in Apache Beam?

I am working on an Apache Beam streaming pipeline with a Kafka producer as input and a consumer for the output. Can anyone help me out with checkpointing in Apache Beam?
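The idea underlying checkpointing in Kafka-style sources can be shown with a toy model (illustration only, not the Beam API): an offset is committed only after the record's side effects are durable, so a restart replays uncommitted work instead of losing it.

```python
class OffsetCheckpoint:
    """Toy model of the consumer-offset idea behind checkpointing."""
    def __init__(self):
        self.committed = -1  # highest offset whose work is durably done

    def process(self, offset, record, sink):
        sink.append(record)      # side effect first ...
        self.committed = offset  # ... then advance the checkpoint

    def resume_from(self):
        # After a crash, processing restarts at the first uncommitted offset,
        # giving at-least-once semantics.
        return self.committed + 1
```

In Beam the equivalent bookkeeping lives in the source's checkpoint mark and is managed by the runner, not by user code.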
0 votes · 1 answer

How to use Runner v2 for an Apache Beam Dataflow job?

My Python code for the Dataflow job looks like the one below: import apache_beam as beam from apache_beam.io.external.kafka import ReadFromKafka from apache_beam.options.pipeline_options import…
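Runner v2 is enabled on Dataflow via the `use_runner_v2` experiment flag, which cross-language transforms such as `ReadFromKafka` in the Python SDK require. A sketch of the launch arguments (the checker function below is a hypothetical helper for this illustration only):

```python
# Typical pipeline arguments for a Dataflow job on Runner v2.
pipeline_args = [
    "--runner=DataflowRunner",
    "--experiments=use_runner_v2",
]

def has_runner_v2(args):
    # Tiny checker: is the Runner v2 experiment present in the arg list?
    return any(a.endswith("use_runner_v2") for a in args)
```

These arguments would normally be passed through `PipelineOptions` when constructing the pipeline.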
0 votes · 1 answer

Get worker ID in an Apache Beam job

Is it possible to get the worker ID from an Apache Beam job? Or any unique identifier that can tell about the current worker? I want to use it as a label for my metric. Thank you.
Xitrum · 7,765 · 26 · 90 · 126
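Beam does not expose a public "worker ID" API; a common workaround is to build a per-worker label from information the process does have. A sketch using only the standard library (hypothetical helper, not a Beam API):

```python
import os
import socket
import uuid

def worker_label() -> str:
    # Combine hostname and pid (stable per worker process) with a random
    # suffix, to produce a label usable on metrics emitted from each worker.
    return f"{socket.gethostname()}-{os.getpid()}-{uuid.uuid4().hex[:8]}"
```

Computed once per worker (e.g. in a DoFn's `setup`), this gives a stable per-process identifier; the random suffix only distinguishes restarts on the same host.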
0 votes · 1 answer

Does GCP Dataflow support Kafka IO in Python?

I am trying to read data from a Kafka topic using the kafka.ReadFromKafka() method in Python code. My code looks like below: from apache_beam.io.external import kafka import apache_beam as beam options = PipelineOptions() with…
0 votes · 2 answers

How to infer an Avro schema from a Kafka topic in Apache Beam KafkaIO

I'm using Apache Beam's KafkaIO to read from a topic that has an Avro schema in the Confluent schema registry. I'm able to deserialize the messages and write to files. But ultimately I want to write to BigQuery. My pipeline isn't able to infer the…
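When inference fails, one approach is to translate the Avro record schema into a BigQuery schema explicitly. A minimal sketch covering top-level primitive fields only (hypothetical helper; nested and union types are not handled):

```python
# Mapping from primitive Avro types to BigQuery standard-SQL types.
AVRO_TO_BQ = {
    "string": "STRING", "long": "INT64", "int": "INT64",
    "float": "FLOAT64", "double": "FLOAT64",
    "boolean": "BOOL", "bytes": "BYTES",
}

def avro_fields_to_bq_schema(avro_schema: dict) -> str:
    # Produce the "name:TYPE,name:TYPE" string form that WriteToBigQuery
    # accepts for its schema argument.
    return ",".join(
        f"{field['name']}:{AVRO_TO_BQ[field['type']]}"
        for field in avro_schema["fields"]
    )
```

The resulting string could be passed as the `schema=` argument of the BigQuery write, sidestepping inference entirely.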
0 votes · 1 answer

How to write to BigQuery with BigQuery IO in Apache Beam?

I'm trying to set up an Apache Beam pipeline that reads from Kafka and writes to BigQuery using Apache Beam. I'm using the logic from here to filter out some coordinates:…
0 votes · 2 answers

Apache Beam: Refreshing a side input which I am reading from MongoDB using MongoDbIO.read() - Part 2

Not sure how this GenerateSequence works for me, as I have to read values from Mongo periodically on an hourly or daily basis. I created a ParDo that reads MongoDB and also added a Window into GlobalWindows with a trigger (the trigger I will update as…