Questions tagged [apache-beam-io]

Apache Beam is a unified SDK for batch and stream processing. This tag should be used for questions related to reading data into an Apache Beam pipeline or writing the output of a pipeline to a destination.

Apache Beam is an open source, unified model for defining and executing both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runtime-specific Runners for executing them.

Apache Beam I/O refers to the process of loading data into a Beam pipeline, or writing the output of a pipeline to a destination.

539 questions
3 votes, 1 answer

Kafka: exactly-once semantics configuration using Apache Beam

I'm trying to configure exactly-once semantics in Kafka (Apache Beam). Here are the changes I'm going to introduce: Producer: enable.idempotence = true, transactional.id = uniqueTransactionalId. Consumer: set enable.auto.commit = false //…
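For the Beam side of this setup, here is a minimal sketch assuming a recent Beam Java SDK; broker address, topic names, shard count, and the sink group id are placeholders. KafkaIO manages idempotence and transactional.id itself when its exactly-once sink is enabled, so the reader only needs to consume committed records:

    import java.util.Collections;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KafkaEosSketch {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        p.apply(KafkaIO.<String, String>read()
                .withBootstrapServers("broker:9092")              // placeholder
                .withTopic("input-topic")                         // placeholder
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                // Consume only committed records; Beam tracks offsets itself,
                // so enable.auto.commit is left off.
                .withConsumerConfigUpdates(
                    Collections.singletonMap("isolation.level", "read_committed"))
                .withoutMetadata())
         .apply(KafkaIO.<String, String>write()
                .withBootstrapServers("broker:9092")
                .withTopic("output-topic")                        // placeholder
                .withKeySerializer(StringSerializer.class)
                .withValueSerializer(StringSerializer.class)
                // Exactly-once sink: uses Kafka transactions internally; only
                // supported on runners that implement it (e.g. Dataflow, Flink).
                .withEOS(5, "eos-sink-group"));                   // placeholders
        p.run();
      }
    }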
3 votes, 1 answer

Use Dataflow failed insert WriteResult to handle table not found exception

Hi, I want to dynamically create a table on the fly in a Dataflow pipeline. First, I capture the BigQueryIO WriteResult, then use it to create the table: WriteResult writeResult = incomingRecords.apply( "WriteToBigQuery", …
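A hedged sketch of the pattern with the Java SDK (tableSpec and the downstream handling are placeholders): with streaming inserts, a retry policy that gives up on non-transient errors, and extended error info, a missing table surfaces in getFailedInsertsWithErr(), where it can be acted on:

    WriteResult writeResult = incomingRecords.apply("WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to(tableSpec)                                        // placeholder
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            // Without a retry policy, "table not found" is treated as
            // transient and retried indefinitely instead of failing.
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
            .withExtendedErrorInfo());

    writeResult.getFailedInsertsWithErr()
        .apply("HandleFailedInserts", ParDo.of(new DoFn<BigQueryInsertError, Void>() {
          @ProcessElement
          public void process(@Element BigQueryInsertError err) {
            // err.getError() carries the insert error details; a "notFound"
            // reason means the target table is missing, so the table could
            // be created here and err.getRow() re-inserted.
          }
        }));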
3 votes, 2 answers

How to speed up bulk importing into Google Cloud Datastore with multiple workers?

I have an apache-beam based Dataflow job that reads from a single text file (stored in Google Cloud Storage) using the vcf source, transforms the text lines into Datastore entities, and writes them to the Datastore sink. The workflow works fine, but the cons I…
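The usual cause here is fusion: everything downstream of a single-file read gets fused onto the one worker that read the file. A minimal sketch of breaking fusion with a reshuffle before the Datastore write (the path, project id, and parsing DoFn are placeholders):

    p.apply("ReadVcf", TextIO.read().from("gs://my-bucket/input.vcf"))   // placeholder
     // Redistribute lines across workers so the parse/write steps are not
     // fused onto the single worker that performed the read.
     .apply("BreakFusion", Reshuffle.viaRandomKey())
     .apply("ToEntity", ParDo.of(new LineToEntityFn()))                  // hypothetical DoFn
     .apply("WriteToDatastore", DatastoreIO.v1().write().withProjectId("my-project"));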
3 votes, 1 answer

'module' object has no attribute 'WriteToBigQuery' when running Apache Beam on Google App Engine Flex

I have a Google App Engine app triggering a Cloud Dataflow pipeline. This pipeline is supposed to write the final PCollection to Google BigQuery, but I can't find a way to install the right apache_beam.io dependency. I'm running Apache Beam version…
3 votes, 3 answers

Reading bulk data from a database using Apache Beam

I would like to know how JdbcIO would execute a query in parallel if my query returns millions of rows. I have referred to https://issues.apache.org/jira/browse/BEAM-2803 and the related pull requests, but I couldn't understand it completely. ReadAll…
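The rough idea behind ReadAll is that parallelism comes from the input PCollection: you split the key space yourself, and each range becomes one query. A sketch under that assumption (driver, URL, table, the range split, and the MyRow type are placeholders):

    // Each KV is one (inclusive, exclusive) id range; JdbcIO.readAll() runs the
    // parameterized query once per range, so the ranges execute in parallel.
    PCollection<KV<Integer, Integer>> ranges = p.apply(Create.of(
        KV.of(0, 1_000_000), KV.of(1_000_000, 2_000_000)));        // placeholder split

    PCollection<MyRow> rows = ranges.apply(JdbcIO.<KV<Integer, Integer>, MyRow>readAll()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "com.mysql.cj.jdbc.Driver", "jdbc:mysql://host/db"))   // placeholders
        .withQuery("SELECT id, name FROM t WHERE id >= ? AND id < ?")
        .withParameterSetter((range, stmt) -> {
          stmt.setInt(1, range.getKey());
          stmt.setInt(2, range.getValue());
        })
        .withRowMapper(rs -> new MyRow(rs.getInt("id"), rs.getString("name")))
        .withCoder(SerializableCoder.of(MyRow.class)));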
3 votes, 1 answer

Apache Beam: Transform an object holding a list of objects into multiple TableRows to write to BigQuery

I am working on a Beam pipeline to process a JSON and write it to BigQuery. The JSON is like this: { "message": [{ "name": "abc", "itemId": "2123", "itemName": "test" }, { "name": "vfg", "itemId": "56457", "itemName":…
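One way to do this, sketched with hypothetical Message and Item POJOs matching the JSON above: a DoFn that emits one TableRow per element of the inner list, so a single input object fans out into multiple BigQuery rows:

    static class ExplodeItemsFn extends DoFn<Message, TableRow> {
      @ProcessElement
      public void process(@Element Message msg, OutputReceiver<TableRow> out) {
        // One output row per entry in the "message" array.
        for (Item item : msg.getMessage()) {
          out.output(new TableRow()
              .set("name", item.getName())
              .set("itemId", item.getItemId())
              .set("itemName", item.getItemName()));
        }
      }
    }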
3 votes, 2 answers

Using CoGroupByKey with a custom type results in a Coder error

I want to join two PCollections (each from a different input) by following the steps described in the "Joins with CoGroupByKey" section here: https://cloud.google.com/dataflow/model/group-by-key. In my case, I want to join GeoIP's…
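The error usually means Beam cannot infer a coder for the custom value type used in the join. Two hedged options, shown with a hypothetical GeoIpEntry class:

    // Option 1: let Avro derive a coder from the class definition.
    @DefaultCoder(AvroCoder.class)
    public class GeoIpEntry implements Serializable {
      // fields...
    }

    // Option 2: register a coder for the class explicitly on the pipeline.
    pipeline.getCoderRegistry().registerCoderForClass(
        GeoIpEntry.class, SerializableCoder.of(GeoIpEntry.class));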
3 votes, 1 answer

Apache Beam - org.apache.beam.sdk.util.UserCodeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Method not supported)

I am trying to connect to a Hive instance installed in a cloud instance using Apache Beam on Dataflow. When I run this, I get the exception below. It happens whenever I access this DB using Apache Beam. I have seen many related questions which…
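The exception typically originates in the connection pool rather than Beam itself: dbcp validates new connections with JDBC calls (for example Connection#isValid) that Hive's driver leaves unimplemented, hence "Method not supported", so the driver and pooling layer need to agree on what is supported. A minimal hedged read sketch, assuming hive-jdbc is on the classpath; the host and query are placeholders:

    PCollection<String> names = p.apply(JdbcIO.<String>read()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "org.apache.hive.jdbc.HiveDriver",
            "jdbc:hive2://1.2.3.4:10000/default"))     // placeholder host/db
        .withQuery("SELECT name FROM employee")        // placeholder query
        .withRowMapper((JdbcIO.RowMapper<String>) rs -> rs.getString(1))
        .withCoder(StringUtf8Coder.of()));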
3 votes, 2 answers

Programmatically generating BigQuery schema in Beam pipeline

I have a collection of homogeneous dicts; how do I write them to BigQuery without knowing the schema? The BigQuerySink requires that I specify the schema when I construct it. But I don't know the schema: it's defined by the keys of the dicts I'm…
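The question targets the Python SDK's BigQuerySink; as an illustration of the same idea in the Java SDK used elsewhere on this page, a TableSchema can be assembled at pipeline-construction time from whatever field names are discovered. The field map below is a hard-coded stand-in for keys derived from the data:

    Map<String, String> fields = Map.of("name", "STRING", "itemId", "INTEGER");  // stand-in
    List<TableFieldSchema> fieldSchemas = new ArrayList<>();
    fields.forEach((name, type) ->
        fieldSchemas.add(new TableFieldSchema().setName(name).setType(type)));
    TableSchema schema = new TableSchema().setFields(fieldSchemas);
    // ... BigQueryIO.writeTableRows().withSchema(schema) ...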
3 votes, 1 answer

Simple Apache Beam manipulations run very slowly

I'm very new to Apache Beam and my Java skills are quite low, but I'd like to understand why my simple record manipulations run so slowly with Apache Beam. What I'm trying to do is the following: I have a CSV file with 1 million records…
3 votes, 1 answer

Apache Beam Bigtable Iterable mutation

I'm migrating my Google Dataflow Java 1.9 pipeline to Beam 2.0 and trying to use BigtableIO.Write: .... .apply("", BigtableIO.write() .withBigtableOptions(bigtableOptions) .withTableId("twoSecondVitals")); In the…
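BigtableIO.write() consumes KV<ByteString, Iterable<Mutation>>, so each element has to be converted into a row key plus a list of mutations before the sink. A sketch with a hypothetical Vital input type and a placeholder column family:

    static class ToBigtableMutationsFn
        extends DoFn<Vital, KV<ByteString, Iterable<Mutation>>> {
      @ProcessElement
      public void process(@Element Vital v,
          OutputReceiver<KV<ByteString, Iterable<Mutation>>> out) {
        Mutation setCell = Mutation.newBuilder()
            .setSetCell(Mutation.SetCell.newBuilder()
                .setFamilyName("stats")                               // placeholder family
                .setColumnQualifier(ByteString.copyFromUtf8("value"))
                .setTimestampMicros(System.currentTimeMillis() * 1000L)
                .setValue(ByteString.copyFromUtf8(v.getValue())))
            .build();
        // The sink takes an Iterable so several mutations can target one row.
        out.output(KV.of(ByteString.copyFromUtf8(v.getKey()),
            ImmutableList.of(setCell)));
      }
    }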
3 votes, 1 answer

Sharing schema definition between BigQuery Client Libraries and Beam IO

Background: We are using the Cloud Dataflow runner in Beam 2.0 to ETL our data into our warehouse in BigQuery. We would like to use the BigQuery Client Libraries (Beta) to create the schema of our data warehouse prior to the Beam pipelines populating…
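There is no built-in bridge between the two schema types, so one hedged approach is a small converter from the client library's Schema to the TableSchema that BigQueryIO expects; the schema definition then lives in one place. Accessor names may vary with the client-library version:

    static TableSchema toTableSchema(com.google.cloud.bigquery.Schema schema) {
      List<TableFieldSchema> fields = new ArrayList<>();
      for (com.google.cloud.bigquery.Field f : schema.getFields()) {
        fields.add(new TableFieldSchema()
            .setName(f.getName())
            .setType(f.getType().name()));   // e.g. "STRING", "INTEGER"
      }
      return new TableSchema().setFields(fields);
    }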
3 votes, 2 answers

Why are increments not supported in the Dataflow-Bigtable connector?

We have a use case in streaming mode where we want to keep track of a counter in Bigtable from the pipeline (e.g. the number of items that have finished processing), for which we need the increment operation. From looking at…
2 votes, 0 answers

How to create an unbounded input for Beam in Go?

I'm trying to use the Go Beam SDK to create a pipeline processing Pub/Sub messages. github.com/apache/beam/sdks/v2/go/pkg/beam I understand that the pubsubio connector makes external calls and works only on the Dataflow runner. What if I want to test my…
2 votes, 1 answer

Same Apache Beam code works with the DirectRunner but not the Dataflow runner

I have a piece of Apache Beam pipeline code that reads from a file in a GCS bucket and prints it. It works perfectly with the DirectRunner and prints the file output, but with the Dataflow runner it prints nothing and raises no errors either. Do…
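A common explanation: the DirectRunner executes DoFns in the local JVM, so System.out.println appears in your console, while on Dataflow the code runs on remote workers and stdout never reaches you. A sketch that logs through SLF4J instead, so the messages show up in the worker logs in Cloud Logging:

    static class PrintLineFn extends DoFn<String, String> {
      private static final Logger LOG = LoggerFactory.getLogger(PrintLineFn.class);

      @ProcessElement
      public void process(@Element String line, OutputReceiver<String> out) {
        LOG.info("Read line: {}", line);   // visible in Dataflow worker logs
        out.output(line);
      }
    }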