Questions tagged [dataflow]

Dataflow programming is a programming paradigm in which computations are modeled through directed graphs: nodes are instructions and data flows through the connections between them.

Dataflow programming is a programming paradigm that models programs as directed graphs, in which calculation proceeds much as signals propagate through an electrical circuit. More precisely:

  • nodes are instructions that take one or more inputs, perform a calculation on them and present the result(s) as output;
  • edges connect the inputs and outputs of instructions, so the output of one instruction can be fed directly to the input of another node to trigger another calculation;
  • data "travels" along the directed edges and triggers the instructions as it passes through the nodes.
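
As a rough illustration of the three points above, here is a minimal Python sketch of a dataflow graph. It is not taken from any particular dataflow language or library: the `Node` class, its `connect` and `receive` methods, and the rule that a node fires once every input port holds a value are all assumptions made for this example.

```python
class Node:
    """Hypothetical instruction node: fires once every input port has a value."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func            # the calculation this node performs
        self.n_inputs = n_inputs    # number of input ports
        self.inputs = {}            # port index -> pending value
        self.edges = []             # outgoing edges: (target node, target port)
        self.output = None          # last result produced by this node

    def connect(self, target, port):
        """Add a directed edge from this node's output to a target input port."""
        self.edges.append((target, port))

    def receive(self, port, value):
        """Accept a value on a port; fire the instruction when all ports are filled."""
        self.inputs[port] = value
        if len(self.inputs) == self.n_inputs:
            args = [self.inputs[i] for i in range(self.n_inputs)]
            self.inputs = {}                      # consume the inputs
            self.output = self.func(*args)
            for target, target_port in self.edges:
                target.receive(target_port, self.output)  # data travels along edges


# Graph for (a + b) * c: the output of "add" feeds port 0 of "mul".
add = Node("add", lambda x, y: x + y, 2)
mul = Node("mul", lambda x, y: x * y, 2)
add.connect(mul, 0)

add.receive(0, 2)
add.receive(1, 3)   # "add" now has both inputs, fires, and pushes its result to "mul"
mul.receive(1, 4)   # "mul" now has both inputs and fires
print(mul.output)
```

Note that nothing calls `mul` explicitly: it runs only because data arrived on both of its input ports, which is the defining trait of the paradigm.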

Dataflow programming languages are often visual; the most prominent example is LabVIEW.

1152 questions
6
votes
1 answer

How to construct a TransformManyBlock with a delegate

I'm new to C# TPL and DataFlow and I'm struggling to work out how to implement the TPL DataFlow TransformManyBlock. I'm translating some other code into DataFlow. My (simplified) original code was something like this: private IEnumerable
Matt L
  • 83
  • 5
6
votes
1 answer

How do I generate a data flow graph with clang or other tools?

With clang and graphviz I can generate the call graph for some C/C++ code as explained in this answer. Now I need a data flow diagram computed on a really large codebase (it's C for the most part); this codebase is software where cmake is…
algl
  • 61
  • 1
  • 5
6
votes
4 answers

Get the level of a hierarchy

I have an array of objects, where each object has an id and a ParentId property (so they can be arranged in trees). They are in no particular order. Please note that the ids and parentIds will not be integers; they will be strings (just wanted to…
adardesign
  • 33,973
  • 15
  • 62
  • 84
6
votes
2 answers

DD anomaly, and cleaning up database resources: is there a clean solution?

Here's a piece of code we've all written: public CustomerTO getCustomerByCustDel(final String cust, final int del) throws SQLException { final PreparedStatement query = getFetchByCustDel(); ResultSet records = null; …
Simon Brooke
  • 386
  • 4
  • 7
5
votes
2 answers

Dataflow between Android BroadcastReceiver, ContentProvider, and Activity?

I've developed an application that receives a Broadcast and then launches an Activity, where that Activity queries a ContentProvider which pulls information out of the DNS in real-time. I'd like to be able to shuffle this so that instead of…
5
votes
6 answers

UML for multithreading dataflow

I want to draw a diagram where you can see the dataflow of a Java program, and whether one or multiple threads are handling the data. Sequence charts don't show multithreading and get very confusing when you have more than 5 different…
Franz Kafka
  • 10,623
  • 20
  • 93
  • 149
5
votes
2 answers

Dataflow - Error: Message: Required 'compute.subnetworks.get' permission

Scenario: running Dataflow jobs on project A using a shared VPC to use the region and subnetwork of host project B. On the service account, I have the following permissions on both project A and B: Compute Admin, Compute Network User, Dataflow Admin, Cloud…
5
votes
1 answer

Source and Sink data from/to Azure Data Lake Store gen1 with Azure data factory's (ADF) Data Flow (DF)

I have an Azure Data Lake Store gen1 (ADLS-1) and an Azure Data Factory (ADF) (V2) with Data Flow (DF). When I create a new DF in ADF and select in the Source and/or Sink node a dataset from ADLS-1, I get the following validation error (in…
Michael H.
  • 535
  • 6
  • 11
5
votes
1 answer

Slowly Changing Lookup Cache from BigQuery - Dataflow Python Streaming SDK

I am trying to follow the design pattern for Slowly Changing Lookup Cache (https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1) for a streaming pipeline using the Python SDK for Apache Beam on…
HulaHoof
  • 367
  • 2
  • 15
5
votes
1 answer

Apache Beam pipeline running on Dataflow failed to read from KafkaIO: SSL handshake failed

I'm building an Apache Beam pipeline to read from Kafka as an unbounded source. I was able to run it locally using the direct runner. However, the pipeline fails with the attached exception stack trace when run using the Google Cloud Dataflow runner…
Jianxin Gao
  • 2,717
  • 2
  • 19
  • 32
5
votes
3 answers

Unable to get application default credentials. run on locally

I'm trying this example to retrieve data from GCP Pub/Sub in Dataflow. import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; import java.time.Instant; import java.util.ArrayList; import…
Matthew
  • 125
  • 2
  • 11
5
votes
2 answers

What does reshuffling, in the context of exactly-once processing in BigQuery sink, mean?

I'm reading an article on exactly-once processing implemented by some Dataflow sources and sinks, and I'm having trouble understanding the example on the BigQuery sink. From the article: Generating a random UUID is a non-deterministic operation, so we…
MassyB
  • 1,124
  • 4
  • 15
  • 28
5
votes
2 answers

Is TPL Dataflow BufferBlock thread safe?

I have a fairly simple producer-consumer pattern where (simplified) I have two producers who produce output that is to be consumed by one consumer. For this I use System.Threading.Tasks.Dataflow.BufferBlock. A BufferBlock object is created. One…
5
votes
3 answers

BigQuery unable to insert job. Workflow failed

I need to run a batch job from GCS to BigQuery via Dataflow and Beam. All my files are Avro with the same schema. I've created a Dataflow Java application that is successful on a smaller set of data (~1 GB, about 5 files). But when I try to run it on…
andrew
  • 51
  • 1
  • 4
5
votes
3 answers

java.lang.ClassCastException: com.google.gson.internal.LinkedTreeMap cannot be cast to java.util.LinkedHashMap

I apologize for opening another question about this general issue, but none of the questions I've found on SO seem to relate closely to my issue. I've got an existing, working dataflow pipeline that accepts objects of KV>…
Max
  • 808
  • 11
  • 25