Why does the following structured query run multiple SQL queries, as can be seen in the web UI's SQL tab?

import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._
val rates = spark.
  readStream.
  format("rate").
  option("numPartitions", 1).
  load.
  writeStream.
  format("console").
  option("truncate", false).
  option("numRows", 10).
  trigger(Trigger.ProcessingTime(10.seconds)).
  queryName("rate-console").
  start

Jacek Laskowski
  • What does your DAG visualization look like? – Yuval Itzchakov Sep 12 '17 at 04:32
  • It does **not** matter since DAGs come **after** the queries. A SQL query can run zero or more Spark jobs, and the DAGs come afterwards, don't they? Or are you thinking of another DAG? – Jacek Laskowski Sep 12 '17 at 06:33
  • No need to use bold :) I just didn't know that a single query can be more than one job. How does that work exactly? I thought the graph defines the query, which then is executed by the scheduler. – Yuval Itzchakov Sep 12 '17 at 06:43
  • @YuvalItzchakov Sorry Yuval. Didn't mean to **bold** you :) I'm working on the answer as we speak, and suffice to say that the answer depends on the source(s) and the sink (so it's not obvious without reviewing the internals of each). In this case ConsoleSink triggers two queries by design. – Jacek Laskowski Sep 12 '17 at 07:01
  • Aha, OK. Waiting for your answer. – Yuval Itzchakov Sep 12 '17 at 07:05

0 Answers