Questions tagged [dataflow]

Dataflow programming is a programming paradigm in which computations are modeled as directed graphs: nodes are instructions, and data flows along the connections between them.

Dataflow programming is a programming paradigm that models programs as directed graphs, with calculation proceeding much like current through an electrical circuit. More precisely:

  • nodes are instructions that take one or more inputs, perform a calculation on them, and present the result(s) as output;
  • edges connect the inputs and outputs of instructions -- this way, the output of one instruction can be fed directly into the input of another node to trigger a further calculation;
  • data "travels" along the directed edges and triggers the instructions as it passes through the nodes.

Dataflow programming languages are often visual, the most prominent example being LabVIEW.
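A minimal sketch of the execution model in plain Python follows; the Node class and the (a + b) * c graph are purely illustrative, not any particular library's API:

```python
# Toy dataflow graph: a node fires as soon as all of its input slots are filled.
class Node:
    def __init__(self, func, num_inputs):
        self.func = func
        self.num_inputs = num_inputs
        self.pending = {}    # input slot -> value received so far
        self.edges = []      # outgoing edges: (target_node, target_slot)

    def connect(self, target, slot):
        """Directed edge from this node's output to an input slot of target."""
        self.edges.append((target, slot))

    def receive(self, slot, value):
        """Data arriving on an edge; fire the instruction once inputs are complete."""
        self.pending[slot] = value
        if len(self.pending) == self.num_inputs:
            result = self.func(*(self.pending[i] for i in range(self.num_inputs)))
            self.pending = {}
            for target, target_slot in self.edges:
                target.receive(target_slot, result)

# Build the graph for (a + b) * c and let the data trigger the computation.
add = Node(lambda a, b: a + b, 2)
mul = Node(lambda x, y: x * y, 2)
out = Node(print, 1)
add.connect(mul, 0)
mul.connect(out, 0)

add.receive(0, 2)  # a = 2
add.receive(1, 3)  # b = 3 -> add fires; 5 flows along the edge into mul
mul.receive(1, 4)  # c = 4 -> mul fires; prints 20
```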

1152 questions
4 votes • 1 answer

Issues with throttling a TPL Dataflow with SemaphoreSlim

Scope: I want to process a large file (1 GB+) by splitting it into smaller, manageable chunks (partitions), persisting them on some storage infrastructure (local disk, blob, network, etc.), and processing them one by one, in memory. I want to achieve…
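The question is about .NET's TPL Dataflow, but the underlying pattern, capping how many partitions are in flight with a semaphore, can be sketched with Python's asyncio as an illustrative analogue:

```python
import asyncio

MAX_IN_FLIGHT = 4  # process at most 4 partitions concurrently

async def process_chunk(chunk_id: int, semaphore: asyncio.Semaphore) -> None:
    async with semaphore:            # wait for a free slot before starting
        await asyncio.sleep(0.1)     # stand-in for real, in-memory chunk processing
        print(f"processed chunk {chunk_id}")
    # the slot is released on exit, letting the next queued chunk start

async def main() -> None:
    semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)
    await asyncio.gather(*(process_chunk(i, semaphore) for i in range(20)))

asyncio.run(main())
```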
4 votes • 1 answer

Examples of monadic effects inside a rewrite function in Hoopl?

The type of (forward) rewriting functions in Hoopl is given by the mkFRewrite function: mkFRewrite :: (FuelMonad m) => (forall e x. n e x -> f -> m (Maybe (hoopl-3.8.6.1:Compiler.Hoopl.Dataflow.Graph n e x))) -> FwdRewrite m…
Justin Bailey • 1,487 • 11 • 15
4 votes • 2 answers

Dataflow Flex template job is Queued

I am trying to reproduce this tutorial to run a Flex Template on Dataflow. When I submit the job, I can see it in the console, but it is not started and is marked as Queued. Does this mean the job was submitted in FlexRS mode? How can I start…
farhawa • 10,120 • 16 • 49 • 91
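For context: FlexRS jobs are delay-tolerant and sit in the Queued state until Dataflow schedules them. In the Python SDK, FlexRS is opted into via the flexrs_goal pipeline option; a hedged sketch, with placeholder project and bucket names:

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Setting flexrs_goal makes the job delay-tolerant, so "Queued" is expected;
# leave the option out if the job should start immediately.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                 # placeholder
    region="us-central1",                 # placeholder
    temp_location="gs://my-bucket/tmp",   # placeholder
    flexrs_goal="COST_OPTIMIZED",
)
```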
4 votes • 3 answers

What are schemas for in Apache Beam?

I was reading the docs about schemas in Apache Beam, but I cannot understand what their purpose is, or how, why, and in which cases I should use them. What is the difference between using schemas and using a class that extends the Serializable…
Sergio Fonseca • 326 • 2 • 11
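As one hedged illustration in the Python SDK: a PCollection acquires a schema when its elements are beam.Row values, which is what lets transforms refer to fields by name rather than by position:

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | beam.Create([("alice", 3), ("bob", 5), ("alice", 2)])
        # Wrapping elements in beam.Row attaches named, typed fields (a schema)...
        | beam.Map(lambda kv: beam.Row(user=kv[0], amount=kv[1]))
        # ...so schema-aware transforms can address those fields by name.
        | beam.GroupBy("user").aggregate_field("amount", sum, "total")
        | beam.Map(print)
    )
```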
4 votes • 2 answers

Dataflow fails when I add requirements.txt [Python]

When I try to run Dataflow with the DataflowRunner and include a requirements.txt that looks like this (google-cloud-storage==1.28.1, pandas==1.0.3, smart-open==2.0.0), it fails every time on this line…
Alex Fragotsis • 1,248 • 18 • 36
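For reference, the file is handed to the runner with the requirements_file pipeline option; a minimal hedged sketch with placeholder names. Dataflow stages the listed packages from PyPI at submission time, which is where failures like this usually surface:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # placeholder
    temp_location="gs://my-bucket/tmp",    # placeholder
    requirements_file="requirements.txt",  # the pinned dependencies to stage
)

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)
```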
4 votes • 1 answer

Static dataflow graph generator for Python?

I've been struggling for quite some time to find a static dataflow graph generator for Python. This is my ideal: given a small Python 3 script example.py, return some representation of the dataflow graph. I was able to achieve…
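Absent an off-the-shelf tool, Python's own ast module gives a crude starting point: record which names each simple assignment reads from. This is a def-use edge extractor, far short of a full dataflow analysis:

```python
import ast

def dataflow_edges(source: str):
    """Yield (read_var, assigned_var) edges for simple assignments."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            reads = [n.id for n in ast.walk(node.value)
                     if isinstance(n, ast.Name)]
            for target in targets:
                for read in reads:
                    yield (read, target)

example = """
a = 1
b = a + 2
c = a * b
"""
print(list(dataflow_edges(example)))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```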
4 votes • 2 answers

Cloud SQL to BigQuery incrementally

I need some suggestions for one of the use cases I am working on. Use case: we have data in Cloud SQL, around 5-10 tables, some treated as lookup and others transactional. We need to get this into BigQuery in a way that makes 3-4 tables (flattened,…
4 votes • 1 answer

Side inputs vs normal constructor parameters in Apache Beam

I have a general question on side inputs and broadcasting in the context of Apache Beam. Do any additional variables, lists, or maps that are needed for computation during processElement need to be passed as side inputs? Is it OK if they are passed as…
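A hedged illustration of the distinction in the Python SDK: values fixed at pipeline-construction time can simply be captured by the function, while values computed by the pipeline itself must arrive as side inputs:

```python
import apache_beam as beam
from apache_beam.pvalue import AsList

with beam.Pipeline() as p:
    words = p | beam.Create(["a", "bb", "ccc"])
    lengths = words | beam.Map(len)

    # Known before the pipeline runs: a plain closure capture is fine.
    threshold = 2
    long_words = words | beam.Filter(lambda w: len(w) >= threshold)

    # Computed by the pipeline itself: must be broadcast as a side input.
    with_max = words | beam.Map(
        lambda w, all_lengths: (w, max(all_lengths)),
        AsList(lengths),
    )
    with_max | beam.Map(print)
```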
4 votes • 1 answer

Beam / Dataflow Custom Python job - Cloud Storage to PubSub

I need to perform a very simple transformation on some data (extract a string from JSON), then write it to PubSub -- I'm attempting to use a custom Python Dataflow job to do so. I've written a job which successfully writes back to Cloud Storage, but…
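A hedged sketch of that pipeline shape; the bucket, topic, and "message" field are placeholders, and WriteToPubSub expects bytes:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # Pub/Sub writes on Dataflow need streaming mode

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.io.ReadFromText("gs://my-bucket/input/*.json")           # placeholder path
        | beam.Map(lambda line: json.loads(line)["message"])            # hypothetical field
        | beam.Map(lambda s: s.encode("utf-8"))                         # WriteToPubSub takes bytes
        | beam.io.WriteToPubSub("projects/my-project/topics/my-topic")  # placeholder topic
    )
```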
4 votes • 1 answer

BigQueryIO Read vs fromQuery

Say that in a Dataflow/Apache Beam program I am trying to read a table whose data is growing exponentially. I want to improve the performance of the read. BigQueryIO.Read.from("projectid:dataset.tablename") or BigQueryIO.Read.fromQuery("SELECT A,…
Roshan Fernando • 493 • 11 • 31
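The excerpt refers to the Java SDK; the same trade-off can be sketched with the Python SDK's ReadFromBigQuery. A full table read exports the table as-is, while a query lets BigQuery project and filter first, so less data reaches the pipeline (table names below reuse the question's placeholders):

```python
import apache_beam as beam

with beam.Pipeline() as p:
    # Direct table read: straight export, every column and row.
    full = p | "table" >> beam.io.ReadFromBigQuery(
        table="projectid:dataset.tablename")

    # Query read: BigQuery runs the query first, so the pipeline only
    # receives the columns/rows it actually needs.
    subset = p | "query" >> beam.io.ReadFromBigQuery(
        query="SELECT a, b FROM `projectid.dataset.tablename`",
        use_standard_sql=True)
```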
4 votes • 2 answers

Apache Beam: ReadFromText versus ReadAllFromText

I'm running an Apache Beam pipeline that reads text files from Google Cloud Storage, performs some parsing on those files, and then writes the parsed data to BigQuery. Ignoring the parsing and google_cloud_options here for the sake of keeping it…
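For reference: ReadFromText takes one file pattern fixed at construction time, whereas ReadAllFromText consumes a PCollection of patterns, so the set of files can itself be computed by the pipeline. A hedged sketch with placeholder paths:

```python
import apache_beam as beam
from apache_beam.io.textio import ReadAllFromText

with beam.Pipeline() as p:
    # Pattern known when the pipeline is built:
    fixed = p | beam.io.ReadFromText("gs://my-bucket/logs/2020-01-*.txt")

    # Patterns determined at runtime (e.g. read from another source):
    patterns = p | beam.Create([
        "gs://my-bucket/logs/2020-01-*.txt",
        "gs://my-bucket/logs/2020-02-*.txt",
    ])
    dynamic = patterns | ReadAllFromText()
```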
4 votes • 3 answers

SIGNAL vs Esterel vs Lustre

I'm very interested in dataflow- and concurrency-focused languages. I've read up on the subject, and I repeatedly see SIGNAL, Esterel, and Lustre mentioned, so I take it they're prominent players in those fields. However, many of their links in the…
4 votes • 2 answers

Easiest way to convert a TableRow to JSON-formatted String, in dataflow 2.x?

Short of writing my own function to do it, what is the easiest way to convert a TableRow object, inside a dataflow 2.x pipeline, to a JSON-formatted String? I thought the code below would work, but it isn't correctly inserting quotes in between…
Max • 808 • 11 • 25
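The question concerns the Java SDK's TableRow type; for comparison only, in the Python SDK BigQuery rows surface as plain dicts, so the standard json module handles the quoting that naive string conversion gets wrong:

```python
import json
import apache_beam as beam

with beam.Pipeline() as p:
    rows = p | beam.Create([{"name": "alice", "score": 3}])  # stand-in for BigQuery rows
    rows | beam.Map(json.dumps) | beam.Map(print)            # proper quotes and escaping
```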
4 votes • 2 answers

Apache Beam, NoSuchMethodError on BigQueryIO.WriteTableRows()?

I've recently upgraded an existing pipeline from dataflow 1.x to dataflow 2.x, and I'm seeing an error that doesn't make sense to me. I'll put the relevant code below, then include the error I'm seeing. // This is essentially the final step in our…
Max • 808 • 11 • 25
4 votes • 1 answer

Apache Beam Error - AsList object is not iterable

I'm trying to make a side input from a PCollection in Apache Beam with Python. This is my code: from apache_beam.pvalue import AsList locations_dim = p | beam.io.Read(beam.io.BigQuerySource( query='SELECT a, b, c, d FROM test.testing_table')) |…
SaadK • 256 • 2 • 10
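The usual cause of that error is treating AsList(...) as the list itself; it is a deferred value that only materializes inside the transform it is passed to as a side input. A hedged sketch of the working shape, with Create standing in for the BigQuery read:

```python
import apache_beam as beam
from apache_beam.pvalue import AsList

with beam.Pipeline() as p:
    locations = p | "locations" >> beam.Create([("a", 1), ("b", 2)])
    events = p | "events" >> beam.Create(["a", "b", "a"])

    # Pass AsList(locations) as a side input; Beam hands the realized
    # list to the lambda at execution time.
    joined = events | beam.Map(
        lambda key, locs: (key, dict(locs)[key]),
        AsList(locations),
    )
    joined | beam.Map(print)
```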