Questions tagged [beam]

This tag should be used for questions about the BEAM, the Erlang virtual machine.

The BEAM (Bogdan/Björn's Erlang Abstract Machine) is the Erlang virtual machine. Besides Erlang, other languages can also target the BEAM virtual machine, such as Joxa, Elixir, LFE, and others.

Disambiguation

  • Use [apache-beam] for questions related to Apache Beam, an SDK for batch and stream processing.
  • Use [android-beam] for questions related to Android Beam, the NFC peer-to-peer mode NDEF message exchange mechanism in Android.
  • Use [beam-search] for questions related to the heuristic search algorithm beam search.
106 questions
1
vote
3 answers

Reloading/Recompiling/Refreshing .beam files inside a terminal

I use Eclipse and Erlide to develop in Erlang. To run the software, I enter the ebin/ directory in my terminal, since I don't like the console Eclipse provides. However, after each change I have to exit and re-enter erl in the terminal to reload the…
danihodovic
  • 1,151
  • 3
  • 18
  • 28
1
vote
2 answers

How can I get beam size for Erlang?

I have a legacy Erlang program that needs optimization. This piece of code uses up to 20 GB of memory at run time. I'm wondering if there is a way to get the memory size of the Erlang BEAM process itself at run time. If that is possible then I can do something…
Jian Wang
  • 11
  • 1
1
vote
2 answers

tsung ts_config_server Can't start newbeam on host (reason: timeout) Aborting

I am currently in the midst of doing distributed load testing on Amazon's EC2 services and have diligently followed all documentation/forum/support on how to get things to work, but unfortunately find myself stuck at this point. No one in any of the…
ikosuave
  • 46
  • 6
0
votes
0 answers

Spark Task Data loss after worker dies in Java

I have a Java program in which I use Spark as the runner for a Beam pipeline. There is a Spark task that collects some data. It finished correctly but, after that, its worker died and the task got assigned to another worker. Why doesn't it recover…
0
votes
0 answers

Dataflow - process single input to multiple outputs using a ptransform

I read data from Pub/Sub, and the messages are of different types. I would provide a runtime argument that determines which outputs to create (using branching). The idea is that I would get multiple PCollections after the PTransform, but should I…
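
The branching described in this question can be illustrated outside of Beam. The following is a minimal plain-Python sketch, not Beam code: the function `route_by_type`, the `"type"` field, and the `"other"` output name are all hypothetical and chosen only for illustration. It shows one input being split into one output per requested type, with everything else collected in a catch-all output. In the Beam Python SDK the same idea is usually expressed with tagged outputs (`with_outputs` and `TaggedOutput`).

```python
from typing import Callable, Dict, Iterable, List


def route_by_type(
    messages: Iterable[dict],
    wanted_types: Iterable[str],
    type_of: Callable[[dict], str] = lambda m: m.get("type", "unknown"),
) -> Dict[str, List[dict]]:
    """Split one input into one output list per requested type.

    Messages whose type was not requested go to the "other" output
    (a hypothetical catch-all name), so nothing is silently dropped.
    """
    # Create every requested output up front, even if it stays empty.
    outputs: Dict[str, List[dict]] = {tag: [] for tag in wanted_types}
    outputs["other"] = []
    for message in messages:
        tag = type_of(message)
        if tag in outputs and tag != "other":
            outputs[tag].append(message)
        else:
            outputs["other"].append(message)
    return outputs


if __name__ == "__main__":
    sample = [{"type": "click"}, {"type": "view"}, {"type": "purchase"}]
    # The requested types stand in for the runtime argument.
    for name, items in route_by_type(sample, ["click", "view"]).items():
        print(name, items)
```

Creating the outputs before reading any data mirrors a constraint that applies to pipelines in general: the set of branches has to be known when the pipeline is built, even if some branches end up empty.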
0
votes
0 answers

Apache Beam dataflow combine per key

I have a problem with my pipeline. My goal is to read around 4k Parquet files as NumPy arrays and then make some aggregations, e.g. one file can produce 100 keys, and each key has some amount of data. Then I have combine-per-key logic, and my goal…
Dawid
  • 11
  • 1
0
votes
1 answer

Java - Apache Beam - Control number of connections when writing in MongoDB

I'm currently working on a streaming pipeline in Apache Beam (v2.43) to insert data into MongoDB. It runs on Dataflow quite fine, but I'm not able to control the number of connections: in case of an input peak (Pub/Sub), Dataflow scales up and…
0
votes
0 answers

Does GroupIntoBatches guarantee that code is run only once per batch?

Reading this article https://cloud.google.com/blog/products/data-analytics/after-lambda-exactly-once-processing-in-google-cloud-dataflow-part-1 The side effects section says Cloud Dataflow does not guarantee that this code is run only once per…
knoeh
  • 1
  • 1
0
votes
1 answer

Apache Beam: java.lang.IllegalStateException when reading from MSSQL table

I have a Beam pipeline that reads from an MSSQL table using a simple query: return "SELECT " + "U.ID as userid, " + "U.firstname as firstname, " + "U.lastname as lastname, " + "email as email, " + "U.IP…
ah_ben
  • 85
  • 7
0
votes
1 answer

How to use a PCollection as a side input in Beam?

I am working on a Beam (Dataflow) pipeline, where the task is to read messages from Pub/Sub and then perform some transformations. If any of these transformations fails, I want to send the message to the dead letter…
0
votes
1 answer

How to cancel a GCP Dataflow job programmatically using Beam and just the job ID

We have a GCP Dataflow project which requires us to cancel a running Dataflow job. All we have at the time of cancellation is the job ID. From other posts on Stack Overflow, I learned we can cancel a job using something like this: PipelineResult…
ZZZ
  • 645
  • 4
  • 17
0
votes
0 answers

Apache Beam: write Kafka records to an Avro file

I would like to read a couple of rows from a Kafka topic and create an Avro file. I have partial code working which reads from the Kafka topic and prints to the console. What I would like to know is how to use AvroIO to write the generic record…
developer2015
  • 399
  • 8
  • 25
0
votes
1 answer

How does Apache Beam give an exactly-once guarantee and do stateful calculation without checkpointing or fault tolerance?

Things like group-by or combine need an exactly-once guarantee even for trivial calculations like sum. But Apache Beam seems not to have checkpointing baked into the library. Does it rely on Flink or Spark to manage fault tolerance and consistency of state?
olaf
  • 239
  • 1
  • 8
0
votes
1 answer

How to limit throughput on an Apache Beam pipeline in Go?

I wrote a basic pipeline in Go running on Google Dataflow. Basically, it transforms Pub/Sub events into Elastic documents and then updates the Elastic documents in bulk. I need to find a way to limit the number of bulk requests per second, because when my…
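
One common way to cap requests per second on the client side is a token bucket. The sketch below is written in Python for brevity rather than in Go, and it is only an illustration of the technique, not Beam or Elasticsearch code: the class `RateLimiter` and its parameters `rate`, `burst`, `clock`, and `sleep` are hypothetical names introduced here. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable


class RateLimiter:
    """Token bucket: allows at most `rate` calls per second on average."""

    def __init__(
        self,
        rate: float,
        burst: int = 1,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate
        self.capacity = float(burst)  # maximum number of stored tokens
        self.tokens = float(burst)    # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> float:
        """Block until one token is available; return the seconds waited."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        waited = 0.0
        if self.tokens < 1.0:
            # Sleep exactly long enough for the missing fraction to refill.
            waited = (1.0 - self.tokens) / self.rate
            self.sleep(waited)
            self.last = self.clock()
            self.tokens = 1.0
        self.tokens -= 1.0
        return waited


if __name__ == "__main__":
    limiter = RateLimiter(rate=5.0)  # about five calls per second
    for i in range(3):
        limiter.acquire()
        print("bulk request", i)  # placeholder for the real bulk call
```

Calling `acquire()` before each bulk request bounds the rate of a single worker only. On a distributed runner each worker would hold its own limiter, so the overall rate is roughly the per-worker rate multiplied by the number of workers; a hard global bound would also require limiting the number of workers or using a shared limiter.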
0
votes
0 answers

Apache Beam ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

I'm running into an issue using the ElasticsearchIO.read() to handle more than one instance of a query. My queries are being dynamically built as a PCollection based on an incoming group of values. I'm trying to see how to load the .withQuery()…