Questions tagged [dataflow]

Dataflow programming is a programming paradigm in which computations are modeled through directed graphs: nodes are instructions and data flows through the connections between them.

Dataflow programming is a programming paradigm that models programs as directed graphs, in which calculation proceeds in a way similar to an electrical circuit. More precisely:

  • nodes are instructions that take one or more inputs, perform a calculation on them, and present the result(s) as output;
  • edges connect the inputs and outputs of the instructions, so the output of one instruction can be fed directly to the input of another node to trigger another calculation;
  • data "travels" along the directed edges and triggers the instructions as it passes through the nodes.
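As a rough illustration of the three points above, here is a minimal sketch in Python. The names `Node`, `Graph`, `connect`, and `send` are hypothetical and chosen only for this example; they do not come from any particular dataflow library. The sketch assumes each node fires once all of its input ports hold a value, then pushes its result along its outgoing edges.

```python
from collections import defaultdict


class Node:
    """An instruction that fires once every input port holds a value."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.n_inputs = n_inputs
        self.pending = {}    # input port index -> value received so far
        self.result = None   # most recently computed output


class Graph:
    """Directed edges from a node's output to another node's input port."""

    def __init__(self):
        # source node -> list of (target node, target input port)
        self.edges = defaultdict(list)

    def connect(self, source, target, port):
        self.edges[source].append((target, port))

    def send(self, node, port, value):
        """Deliver a value to an input port; fire the node if all ports are filled."""
        node.pending[port] = value
        if len(node.pending) == node.n_inputs:
            args = [node.pending[i] for i in range(node.n_inputs)]
            node.pending = {}
            node.result = node.func(*args)
            # The output travels along every outgoing edge and may trigger
            # further calculations downstream.
            for target, target_port in self.edges[node]:
                self.send(target, target_port, node.result)


# Compute (a + b) * c as a two-node graph.
add = Node("add", lambda x, y: x + y, 2)
mul = Node("mul", lambda x, y: x * y, 2)

g = Graph()
g.connect(add, mul, 0)   # output of add feeds input port 0 of mul

g.send(mul, 1, 4)        # c = 4 arrives first; mul waits for its other input
g.send(add, 0, 2)        # a = 2
g.send(add, 1, 3)        # b = 3; add fires, 5 flows to mul, mul fires
print(mul.result)        # prints 20
```

Note that the order in which values arrive does not matter here: `mul` simply waits until both of its inputs are present, which is the property that distinguishes this style from ordinary sequential control flow.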

Dataflow programming languages are often visual; the most prominent example is LabVIEW.


1152 questions
10 votes, 3 answers

More efficiently compute the transitive closure of each dependent while incrementally building the directed graph

I need to answer the question: given a node in a dependency graph, group its dependents by their own transitive dependents which would be impacted by a particular start node. In other words, given a node in a dependency graph, find the set of sets…
10 votes, 2 answers

ClassNotFound exception when attempting to use DataflowRunner

I'm trying to launch a Dataflow job on GCP using Apache Beam 0.6.0. I am compiling an uber jar using the shade plugin because I cannot launch the job using "mvn exec:java". I'm including this dependency:
9 votes, 1 answer

At what stage does Dataflow/Apache Beam ack a pub/sub message?

I have a Dataflow streaming job with a Pub/Sub subscription as an unbounded source. I want to know at what stage Dataflow acks the incoming Pub/Sub message. It appears to me that the message is lost if an exception is thrown during any stage of…
Kakaji
9 votes, 1 answer

Using Clojure DataFlow programming idioms

Can someone explain why and how I would use the Clojure Dataflow programming API? I can't seem to find much about it on the internet.
yazz.com
9 votes, 3 answers

TPL Dataflow block consumes all available memory

I have a TransformManyBlock with the following design. Input: path to a file. Output: IEnumerable of the file's contents, one line at a time. I am running this block on a huge file (61GB), which is too large to fit into RAM. In order to avoid…
Brian Berns
9 votes, 4 answers

Dataflow Programming API for Java?

I am looking for a Dataflow / Concurrent Programming API for Java. I know there's DataRush, but it's not free. What I'm interested in specifically is multicore data processing, and not distributed, which rules out MapReduce or Hadoop. Any…
Rollo Tomazzi
8 votes, 0 answers

Python TPL dataflow analog

Is there an analogue of the .NET TPL Dataflow in the Python world? Dataflow is a library where you can connect "blocks" to each other in order to create a pipeline (or graph). There are different types of blocks that provide different functionality and…
Evgen
8 votes, 0 answers

grpc StatusRuntimeException on Dataflow

I have a Dataflow pipeline in which I consume Pub/Sub messages, process them, and then publish to Pub/Sub. Whenever I have too many calculations (i.e., I increase the amount of processing for each message), I get an exception:…
8 votes, 3 answers

How to Monitor/inspect data/attribute flow in Java code

I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during the data processing I convert one POJO to another, perform some more processing, and then…
M.J.
8 votes, 2 answers

Throttling a step in beam application

I'm using Python Beam on Google Dataflow. My pipeline looks like this: Read image URLs from file >> Download images >> Process images. The problem is that I can't let the Download images step scale as much as it needs because my application can get…
Xitrum
8 votes, 2 answers

What's the crucial difference between Angular 2 Data Flow and Flux?

Hi, I am studying Angular 2 and React + Redux right now, and I have a question about the difference in data flow between those two choices. Angular 2 uses uni-directional data flow by default. Redux is a Flux implementation, which (also)…
sangyongjung
8 votes, 1 answer

Dataflow processing

I have a class of computations that seems to naturally take a graph structure. The graph is far from linear, as there are multiple inputs as well as nodes that fan out and nodes that require the result of several other nodes. In all of these…
em70
7 votes, 3 answers

How to get apache beam for dataflow GCP on Python 3.x

I'm very new to GCP and Dataflow. However, I would like to start testing and deploying a few flows using Dataflow on GCP. According to the documentation and everything else around Dataflow, it is imperative to use the Apache Beam project. Therefore and…
7 votes, 1 answer

How to create groups of N elements from a PCollection Apache Beam Python

I am trying to accomplish something like this: "Batch PCollection in Beam/Dataflow". The answer in the above link is in Java, whereas the language I'm working with is Python. Thus, I require some help getting a similar construction. Specifically I have…
7 votes, 2 answers

How to skip last row in the SSIS data flow

I am using FlatFile Source Manager --> Script Component as Trans --> OLEDB destination in my data flow. The source reads all the rows from the flat file and I want to skip the last row (trailer record) when updating the database. Since it contains the NULL…
VHK