Questions tagged [catalyst-optimizer]

The Catalyst optimizer makes use of standard features of the Scala programming language, such as pattern matching. At its core, Catalyst contains a general library for representing trees and sets of rules to manipulate them. On top of that sit libraries specific to relational query processing, and several rule sets that handle different phases of query execution: analysis, logical optimization, physical planning, and code generation to compile parts of queries to Java bytecode.
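As a concrete illustration of this tree-plus-rules design, a minimal optimizer rule might look like the following (a sketch, assuming Spark's internal Catalyst APIs, which are not a stable public interface):

import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// A toy rule: pattern-match on the plan tree and drop any Filter node
// whose condition is the constant `true`.
object RemoveTrueFilter extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(Literal(true, _), child) => child
  }
}

Such a rule can be injected into an existing session with spark.experimental.extraOptimizations = Seq(RemoveTrueFilter).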

27 questions
1
vote
1 answer

Spark internals: benefits of Project

I've read this question in which the OP tried to convert this logical plan:
Aggregate [sum(inc(vals#4L)) AS sum(inc(vals))#7L]
+- LocalRelation [vals#4L]
to this:
Aggregate [sum(inc_val#6L) AS sum(inc(vals))#7L]
+- Project [inc(vals#4L) AS…
Alon
  • 10,381
  • 23
  • 88
  • 152
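For context, the rewrite discussed in that question can be produced directly from the DataFrame API by projecting the UDF result before aggregating; a sketch, assuming a SparkSession in scope as spark and a stand-in inc UDF:

import org.apache.spark.sql.functions.{sum, udf}
import spark.implicits._

val inc = udf((x: Long) => x + 1)  // stand-in for the question's inc()
val df = Seq(1L, 2L, 3L).toDF("vals")

// Materialize inc(vals) in a Project first, then aggregate the projected column:
val result = df.select(inc($"vals").as("inc_val")).agg(sum($"inc_val"))
println(result.queryExecution.optimizedPlan)  // inspect the resulting logical plan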
1
vote
2 answers

spark register expression for SQL DSL

How can I access a Catalyst expression (not a regular UDF) in the Spark SQL Scala DSL API? http://geospark.datasyslab.org only allows for text-based execution:
GeoSparkSQLRegistrator.registerAll(sparkSession)
var stringDf = sparkSession.sql( """ …
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
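One commonly used bridge between raw Catalyst expressions and the Scala DSL is wrapping the expression in a Column; a sketch using a built-in Catalyst expression (the same pattern would apply to a GeoSpark expression instance), assuming a DataFrame df with a string column name and spark.implicits._ in scope:

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Upper

// Wrap a raw Catalyst Expression so it can be used like any DSL column:
val upperName = new Column(Upper($"name".expr))
df.select(upperName)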
0
votes
1 answer

Does Spark SQL optimize lower() on both sides?

Say I have this pseudo-code in Spark SQL, where t1 is a temp view built off of partitioned Parquet files in HDFS and t2 is a small lookup file used to filter said temp view:
select t1.* from t1 where exists (select * from t2 …
Radagast
  • 5,102
  • 3
  • 12
  • 27
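One way to settle this kind of question empirically is to inspect the plans Catalyst actually produces; a sketch, assuming views t1 and t2 with a common key k:

// explain(true) prints the parsed, analyzed, optimized, and physical plans;
// check whether lower() survives on both sides of the predicate.
spark.sql("""
  SELECT t1.* FROM t1
  WHERE EXISTS (SELECT 1 FROM t2 WHERE lower(t1.k) = lower(t2.k))
""").explain(true)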
0
votes
1 answer

Export a spark logical/physical plan?

Can one export a Spark logical or physical plan of a DataFrame/Dataset, serialize it and save it somewhere (as text, XML, JSON, ...), then re-import it and create a DataFrame based on it? The idea here is that I'm interested in having a metastore for Spark…
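There is no public API for round-tripping a plan back into a DataFrame, but the plan trees themselves can at least be dumped for inspection; a minimal sketch:

val qe = df.queryExecution
val optimizedJson = qe.optimizedPlan.toJSON   // Catalyst TreeNode rendered as JSON
val physicalText  = qe.executedPlan.toString  // physical plan as text
// Re-importing such a dump and rebuilding a live DataFrame from it is not
// supported out of the box.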
0
votes
1 answer

Long linear queries in Spark against a graph stored in Hive tables

Suppose I have a graph G and the following query:
(?a)-[x]-(?b)-[y]-(?c)-[z]-(?d)-[w]-(?e)-[q]-(?f)-[r]-(?g)-[s]-(?h)
where {?a, ?b, ?c, ..., ?h} are variables and {x, y, z, w, q, r, s} are arc labels. At the storage level I…
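A linear path query like this typically compiles to a chain of self-joins over the edge table; a sketch of the first hops, assuming a Hive table edges(src, label, dst):

import org.apache.spark.sql.functions.col

val edges = spark.table("edges")
def step(l: String) = edges.filter(col("label") === l)

// First three arcs of the pattern: (?a)-[x]-(?b)-[y]-(?c)-[z]-(?d)
val path = step("x").as("ab")
  .join(step("y").as("bc"), col("ab.dst") === col("bc.src"))
  .join(step("z").as("cd"), col("bc.dst") === col("cd.src"))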
0
votes
1 answer

What happened to the ability to visualize query plans in a Databricks notebook?

There is an old talk (from 2014) on YouTube where the speaker visualized a query plan right inside a Databricks notebook. Here is the screenshot: I am using Databricks Runtime 5.5 LTS ML, and whenever I try to call viz on a query plan, I get this…
mauna
  • 1,098
  • 13
  • 25
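As a fallback when notebook visualization is unavailable, the plans can still be printed as text (the Spark UI's SQL tab also renders a graphical DAG); a minimal sketch:

df.explain(true)  // parsed, analyzed, optimized, and physical plans as text
val planDump = df.queryExecution.toString  // the same dump as a string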
0
votes
1 answer

Spark optimize "DataFrame.explain" / Catalyst

I've got complex software which performs really complex SQL queries (well, not queries exactly, but Spark plans). The plans are dynamic; they change based on user input, so I can't "cache" them. I've got a phase in which Spark takes 1.5-2 min…
BiS
  • 501
  • 4
  • 17
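A common workaround for long Catalyst planning times is to truncate the plan lineage partway through the job; a sketch using checkpointing, with an assumed checkpoint directory and a hypothetical intermediateDf:

spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")  // assumed path

// checkpoint() materializes the data and replaces the accumulated logical
// plan with a short scan, so later phases re-optimize a much smaller tree.
val truncated = intermediateDf.checkpoint()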
0
votes
1 answer

Is a select after casting a data frame to dataset optimized?

I have the following scenario:
case class A(name: String, age: Int)
val df = List(A("s", 2)).toDF
df.write.parquet("filePath")
val result = spark.read.parquet("filePath").as[A].select("age")
Is the above optimized to select only age? Upon seeing…
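Whether the Parquet scan is pruned down to age shows up in the physical plan's ReadSchema; a sketch (assuming spark.implicits._ is in scope):

// Look for ReadSchema: struct<age:int> in the output:
spark.read.parquet("filePath").as[A].select("age").explain()

// By contrast, a typed transformation deserializes whole A objects,
// which defeats column pruning:
spark.read.parquet("filePath").as[A].map(_.age).explain()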
0
votes
1 answer

Spark DataFrame how to preserve sorting and partitioning information after mapPartitions

I use DataFrame mapPartitions in a library which is loosely an implementation of the Uber Case Study. The output DataFrame has some new (large) columns, and the input DataFrame is partitioned and internally sorted before doing mapPartitions. Most…
shay__
  • 3,815
  • 17
  • 34
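To make the issue concrete: after a typed mapPartitions, Catalyst no longer knows the data is partitioned and sorted, so a downstream join or window may shuffle and sort again; a sketch with a hypothetical Dataset[Event] named events:

case class Event(key: String, ts: Long)

val prepared = events.repartition($"key").sortWithinPartitions($"key", $"ts")

// The result is physically still partitioned and sorted, but the optimizer's
// outputPartitioning/outputOrdering metadata is lost at this step:
val mapped = prepared.mapPartitions(_.map(e => e.copy(ts = e.ts + 1)))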
0
votes
0 answers

Prevent spark catalyst from optimizing and moving dynamic parallelism

I need to set spark.sql.shuffle.partitions dynamically during the execution of my Spark job. Initially it is set when starting the job, but then, after various aggregations, I need to decrease it over and over again. However, Catalyst tends to push…
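Because plans are evaluated lazily, the value of spark.sql.shuffle.partitions in effect at action time governs the whole plan; forcing materialization between stages is one way to pin different values (a sketch with a hypothetical bigDf):

spark.conf.set("spark.sql.shuffle.partitions", "2000")
// checkpoint() executes eagerly, so this aggregation really runs with
// 2000 shuffle partitions regardless of later conf changes:
val wide = bigDf.groupBy($"key").count().checkpoint()

spark.conf.set("spark.sql.shuffle.partitions", "200")
val narrow = wide.groupBy($"key").count()  // planned with 200 partitions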
0
votes
1 answer

Query Cassandra from Spark using CassandraSQLContext

I'm trying to query Cassandra from Spark using CassandraSQLContext, but I get a weird missing-dependency error. I have a Spark application like the following:
val spark: SparkSession = SparkSession.builder().appName(appName).getOrCreate()
val…
belgacea
  • 1,084
  • 1
  • 15
  • 33
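For reference, CassandraSQLContext was dropped in later versions of the Spark Cassandra Connector; the DataSource API is the usual replacement (a sketch with assumed keyspace and table names):

// Read a Cassandra table through the DataSource API instead:
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))  // assumed names
  .load()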
-1
votes
2 answers

How can one use spark Catalyst?

According to this, Spark Catalyst is an implementation-agnostic framework for manipulating trees of relational operators and expressions. I want to use Spark Catalyst to parse SQL DML and DDL statements and generate custom Scala code from them. However,…
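A starting point for the parsing half of this is Catalyst's SQL parser, which turns SQL text into an unresolved logical plan tree that can then be walked with pattern matching; a sketch using internal APIs:

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

val plan = CatalystSqlParser.parsePlan("SELECT a, b FROM t WHERE a > 1")
println(plan.treeString)  // unresolved logical plan, one node per line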