
In Google Dataflow, I have a job that basically looks like this:

Dataset: 100 rows, 1 column.
Recipe: 0 steps
Output: New Table.

But it takes between 6 and 8 minutes to run. What could be the issue?

1 Answer

For Dataprep/Dataflow, run times are usually measured in minutes, not seconds. These tools are built for large data sets, and the duration stays roughly constant even if your data is 10 times the size.

Dataprep generates a Dataflow job for you and provisions a few VMs to run it, and that provisioning takes time; that phase alone is usually in the minute range. Only later does it scale up to 50 or 1,000 workers if the data warrants it.
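
For context, even a zero-step recipe compiles to a full Beam/Dataflow pipeline along these lines (a rough sketch; the project, bucket, table names, and schema below are placeholders, not values from the question). Submitting even this no-op job means staging the code and waiting for workers to start, which is where most of those fixed minutes go.

    # A minimal Apache Beam sketch of the kind of pipeline Dataprep generates:
    # read a BigQuery table, apply no transforms, write it back out.
    # Project, bucket, table names, and schema are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                # placeholder
        region="us-central1",                # placeholder
        temp_location="gs://my-bucket/tmp",  # placeholder
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromBigQuery(
                table="my-project:my_dataset.input_table")
            | "Write" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.output_table",
                schema="column_1:STRING",  # placeholder schema for the single column
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
        )

Running the same pipeline with runner="DirectRunner" finishes in seconds on 100 rows, which is a quick way to confirm the time is spent on Dataflow worker start-up rather than on the data itself.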

Pentium10
  • Thank you - very nice answer. If there is a lot of "fixed cost" per Dataflow job, then it might make sense to do fewer, larger flows with more transformations. We are also testing other solutions with crontab/Airflow running some views (see the sketch after these comments). – user1449307 Aug 08 '18 at 10:40
  • This means that even if I have 10000x more data, it will still run the same time? Do you have a reference? – WJA Feb 19 '19 at 11:41
  • @JohnAndrews I don't have one, but google the "tram challenge dataflow"; it might be of interest to you. – Pentium10 Feb 19 '19 at 12:29
  • 10.6 billion rows within some stops for 0.85 USD. – WJA Feb 19 '19 at 12:48
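
Regarding the crontab/Airflow alternative mentioned in the first comment, here is a minimal hypothetical sketch of that approach, assuming the Airflow Google provider is installed; the DAG id, project, dataset, and table names are placeholders. Because the query runs entirely inside BigQuery, a 100-row copy like the one in the question typically completes in seconds, with no worker provisioning to wait for.

    # Hypothetical Airflow DAG: run the "transformation" as a scheduled
    # BigQuery query instead of a Dataflow job. All names are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    with DAG(
        dag_id="small_table_copy",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        copy_small_table = BigQueryInsertJobOperator(
            task_id="copy_small_table",
            configuration={
                "query": {
                    "query": "SELECT * FROM `my-project.my_dataset.input_table`",
                    "destinationTable": {
                        "projectId": "my-project",
                        "datasetId": "my_dataset",
                        "tableId": "output_table",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                },
            },
        )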