Questions tagged [dataflow]

Dataflow programming is a programming paradigm in which computations are modeled through directed graphs: nodes are instructions and data flows through the connections between them.

Dataflow programming models programs as directed graphs, and calculation proceeds in a way similar to signals moving through an electrical circuit. More precisely:

  • nodes are instructions that take one or more inputs, perform a calculation on them, and present the result(s) as output;
  • edges connect the inputs and outputs of instructions, so the output of one instruction can be fed directly to the input of another node and trigger another calculation;
  • data "travels" along the directed edges and triggers the instructions as it passes through the nodes.
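
The three points above can be illustrated with a minimal sketch in Python. It does not reflect how any particular dataflow language is implemented; the `Node` class, the `run` function, and the token format are hypothetical names chosen for this example. Each node fires as soon as all of its input ports hold a value, and its result is sent along its outgoing edges.

```python
from collections import deque


class Node:
    """Hypothetical instruction node: fires once every input port holds a value."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.inputs = [None] * n_inputs
        self.filled = [False] * n_inputs
        self.targets = []  # outgoing edges as (target node, target input port)
        self.output = None

    def connect(self, target, port):
        """Add a directed edge from this node's output to a target input port."""
        self.targets.append((target, port))

    def receive(self, port, value):
        """Store an arriving value; return True when all inputs are present."""
        self.inputs[port] = value
        self.filled[port] = True
        return all(self.filled)

    def fire(self):
        """Run the instruction on its inputs and clear the ports for reuse."""
        self.output = self.func(*self.inputs)
        self.filled = [False] * len(self.inputs)
        return self.output


def run(initial_tokens):
    """Propagate (node, port, value) tokens until no data is left in flight."""
    queue = deque(initial_tokens)
    while queue:
        node, port, value = queue.popleft()
        if node.receive(port, value):
            result = node.fire()
            for target, target_port in node.targets:
                queue.append((target, target_port, result))


# Graph computing (a + b) * (a - b)
add = Node("add", lambda x, y: x + y, 2)
sub = Node("sub", lambda x, y: x - y, 2)
mul = Node("mul", lambda x, y: x * y, 2)
add.connect(mul, 0)
sub.connect(mul, 1)

a, b = 7, 3
run([(add, 0, a), (add, 1, b), (sub, 0, a), (sub, 1, b)])
print(mul.output)  # 40
```

Note that nothing in the program states the order in which `add`, `sub`, and `mul` run; the order follows from the arrival of data, which is the point of the paradigm.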

Dataflow programming languages are often visual; the most prominent example is LabVIEW.

1152 questions
4
votes
1 answer

How do I convert table row PCollections to key,value PCollections in Python?

There is NO documentation regarding how to convert PCollections into the PCollections necessary for input into .CoGroupByKey(). Context: essentially, I have two large PCollections and I need to be able to find differences between the two, for type II…
4
votes
1 answer

TPL dataflow that receives a collection and calls its linked block for each element

Sorry if there is already a similar question; I can't find it. I have the following situation: I have to do some processing on images, and TPL Dataflow fits in nicely here because it allows me to easily do different parts of my workflow in…
paperplane
4
votes
1 answer

How to reduce the initialisation and termination time of a Google Dataflow job?

I'm currently working on a POC and primarily focusing on Dataflow for ETL processing. I have created the pipeline using Dataflow 2.1 Java Beam API, and it takes about 3-4 minutes just to initialise, and also it takes about 1-2 minutes for…
Vijin Paulraj
4
votes
1 answer

Troubleshooting Apache Beam pipeline import errors [BoundedSource objects is larger than the allowable limit]

I have a bunch of text files (~1M) stored on Google Cloud Storage. When I read these files into a Google Cloud Dataflow pipeline for processing, I always get the following error: Total size of the BoundedSource objects returned by…
4
votes
1 answer

How do I perform a "diff" on two Sources given a key using Apache Beam Python SDK?

I posed the question generically, because maybe it is a generic answer. But a specific example is comparing 2 BigQuery tables with the same schema, but potentially different data. I want a diff, i.e. what was added, deleted, modified, with respect…
successhawk
4
votes
0 answers

Link input to an output in React

I'm new to React and I'd like to tinker a little bit with dataflow when using an input and some kind of output. The idea is to type something in an input bar and have it shown below the bar as the user is typing. I wish to do this without stuff like Flux or…
Lehren
4
votes
0 answers

How to add record numbers to TextIO file sources in Apache Beam or Dataflow

I am using Dataflow (and now Beam) to process legacy text files to replicate the transformations of an existing ETL tool. The current process adds a record number (the record number for each row within each file) and the filename. The reason they…
4
votes
1 answer

Dataflow TransformManyBlock throttling

How can I throttle a TransformManyBlock in a Dataflow mesh? I specified a BoundedCapacity, but it looks like it only affects the input queue. So my block keeps processing input and the output queue keeps growing. The following blocks also have a…
dou bret
4
votes
1 answer

ReactFX compared to Sodium

This book about Sodium is a good and clear intro to FRP. I expect that - because the book on Sodium is easy to understand - by comparing the two libraries (Sodium and ReactFX) people can leverage what they learn from the book and use that knowledge…
jhegedus
4
votes
4 answers

What is a good motivating example for dataflow concurrency?

I understand the basics of dataflow programming and have encountered it a bit in Clojure APIs, talks from Jonas Boner, GPars in Groovy, etc. I know it's prevalent in languages like Io (although I have not studied Io). What I am missing is a…
Alex Miller
4
votes
1 answer

How can I obtain a data flow graph along with the c-use and p-use variables of C code?

Is there any online tool or software (open source preferred) that builds a data flow graph of C code and also reports the p-use and c-use variables in it?
4
votes
3 answers

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control-flow of a program cannot possibly have any effect on its output. For example: global var_1 global var_2 start program hello(var_3, var_4) if (var_2 < 0) then …
Raul Bertone
4
votes
1 answer

TPL Dataflow Blocks Running On UI Thread

I am building a dataflow pipeline to do various processing (mostly I/O, but some CPU processing) that is in a naturally occurring flow. The flow is currently in this basic pattern: Load Data from File Parse Record using Transform Block Serialize &…
JNYRanger
4
votes
2 answers

Dataflow programming vs Actor model

How can the difference between 'Dataflow Programming' and the 'Actor model' be described? As far as I understand, they are related but not the same. Is DF a wider concept, whose gist is the distinction from the Control Flow model, while the…
pavel.baravik
4
votes
0 answers

Findbugs dataflow analysis

I'm using Findbugs to find the definition of a variable. The use case is that if a detector finds a bug, I need to know where the variable related to the bug is defined. Is there an API for this, or do I need to implement it myself?
Paul