Why does a single structured query run multiple SQL queries per batch?

Asked Sep 11 '17 at 18:25

Active Oct 28 '18 at 21:12

Viewed 700 times

Why does the following structured query run multiple SQL queries per batch, as can be seen in the web UI's SQL tab?

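import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._
// Note: despite its name, rates holds the StreamingQuery returned by start
val rates = spark.
  readStream.
  // built-in rate source that generates rows at a fixed rate
  format("rate").
  option("numPartitions", 1).
  load.
  writeStream.
  // console sink prints the rows of every batch to standard output
  format("console").
  option("truncate", false).
  option("numRows", 10).
  // start a new micro-batch every 10 seconds
  trigger(Trigger.ProcessingTime(10.seconds)).
  queryName("rate-console").
  start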

edited Oct 28 '18 at 21:12

Community

asked Sep 11 '17 at 18:25

Jacek Laskowski


What does your DAG visualization look like? – Yuval Itzchakov Sep 12 '17 at 04:32
It does **not** matter since DAGs come **after** the queries. A SQL query can result in zero or more Spark jobs, and the DAGs come afterwards, don't they? Or are you thinking of a different DAG? – Jacek Laskowski Sep 12 '17 at 06:33

No need to use bold :) I just didn't know that a single query can be more than one job. How does that work exactly? I thought the graph defines the query, which then is executed by the scheduler. – Yuval Itzchakov Sep 12 '17 at 06:43

@YuvalItzchakov Sorry Yuval. Didn't mean to **bold** you :) I'm working on the answer as we speak. Suffice it to say that the answer depends on the source(s) and the sink, so it's not obvious without reviewing the internals of each. In this case ConsoleSink triggers two queries by design. – Jacek Laskowski Sep 12 '17 at 07:01

Aha, OK. Waiting for your answer. – Yuval Itzchakov Sep 12 '17 at 07:05


0 Answers
