The Catalyst optimizer makes use of standard features of the Scala programming language, such as pattern matching. At its core, Catalyst contains a general library for representing trees and a set of rules to manipulate them, along with libraries specific to relational query processing. Different rule sets handle the phases of query execution: analysis, logical optimization, physical planning, and code generation that compiles parts of queries to Java bytecode.
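To make the tree-plus-rules model concrete, here is a minimal sketch of a custom rule written against Catalyst's internal Rule[LogicalPlan] API (illustrative only; these internals are not a stable public API, and the rule itself is invented for the example):

    import org.apache.spark.sql.catalyst.expressions.Literal
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
    import org.apache.spark.sql.catalyst.rules.Rule

    // Removes filters whose condition is the constant `true` (a no-op WHERE true).
    object RemoveAlwaysTrueFilter extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case Filter(Literal(true, _), child) => child
      }
    }

A rule like this can be handed to a session through spark.experimental.extraOptimizations, which appends custom rules to the optimizer.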
Questions tagged [catalyst-optimizer]
27 questions
1
vote
1 answer
Spark internals: benefits of Project
I've read this question in which the OP tried to convert this logical plan:
Aggregate [sum(inc(vals#4L)) AS sum(inc(vals))#7L]
+- LocalRelation [vals#4L]
To this:
Aggregate [sum(inc_val#6L) AS sum(inc(vals))#7L]
+- Project [inc(vals#4L) AS…
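Not part of the question, but plans like these can be reproduced and inspected with explain; a minimal sketch, with column and UDF names assumed to mirror the plan above:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{sum, udf}

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val inc = udf((x: Long) => x + 1)      // stands in for the inc() in the plan
    val df = Seq(1L, 2L, 3L).toDF("vals")

    // Prints the parsed, analyzed, optimized and physical plans, where
    // Aggregate/Project nodes like the ones quoted above appear.
    df.select(sum(inc($"vals"))).explain(true)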

Alon (10,381)
1
vote
2 answers
spark register expression for SQL DSL
How can I access a Catalyst expression (not a regular UDF) from the Spark SQL Scala DSL API?
http://geospark.datasyslab.org only allows for text-based execution:
GeoSparkSQLRegistrator.registerAll(sparkSession)
var stringDf = sparkSession.sql(
"""
…
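One commonly suggested workaround is expr(), which routes a string through the SQL parser, so anything registered under a SQL name becomes reachable from the DSL. A sketch, where the ST_Point call and its argument types are assumptions about GeoSpark's registered functions:

    import org.apache.spark.sql.functions.expr

    GeoSparkSQLRegistrator.registerAll(sparkSession)

    // expr() parses its argument with the SQL parser, making registered
    // expressions usable inside the DataFrame DSL without a full SQL query.
    val pointsDf = sparkSession.range(3)
      .withColumn("pt", expr("ST_Point(CAST(id AS DOUBLE), CAST(id AS DOUBLE))"))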

Georg Heiler (16,916)
0
votes
1 answer
Does Spark SQL optimize lower() on both sides?
Say I have this pseudo-code in Spark SQL, where t1 is a temp view built off of partitioned Parquet files in HDFS and t2 is a small lookup file used to filter said temp view:
select t1.*
from t1
where exists(select *
from t2
…
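One empirical way to check is EXPLAIN, which shows whether Catalyst keeps, simplifies, or pushes down the lower() calls; a sketch with assumed table and column names:

    // The optimized plan shows what happened to lower() after the EXISTS
    // subquery is rewritten into a (left semi) join.
    spark.sql("""
      SELECT t1.*
      FROM t1
      WHERE EXISTS (SELECT 1 FROM t2 WHERE lower(t1.key) = lower(t2.key))
    """).explain(true)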

Radagast (5,102)
0
votes
1 answer
Export a spark logical/physical plan?
Can one export a Spark logical or physical plan of a DataFrame/Dataset, serialize it, and save it somewhere (as text, XML, JSON, ...)? Then re-import it and create a DataFrame based on it?
The idea here is that I'm interested in having a metastore for Spark…
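The export half is reachable through Catalyst's internal tree APIs; a sketch (these are internal and may change between versions, and re-importing a plan into a new session is the part Spark does not support out of the box):

    val df = spark.range(10).filter("id > 5")

    val planText = df.queryExecution.optimizedPlan.toString // human-readable tree
    val planJson = df.queryExecution.optimizedPlan.toJSON   // JSON rendering (internal API)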

Hamza EL KAROUI (69)
0
votes
1 answer
Long linear queries in Spark against a graph stored in Hive tables
Suppose I have a graph G and the following query:
(?a)--x--(?b)--y--(?c)--z--(?d)--w--(?e)--q--(?f)--r--(?g)--s--(?h)
where {?a, ?b, ?c, ..., ?h} are variables, and {x, y, z, w, q, r, s} are arc labels.
At the storage level I…
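To see why such queries get expensive, assume a hypothetical Hive table edges(src, label, dst): each arc in the pattern becomes one self-join, so the 8-variable path above means a 7-way join for Catalyst to plan. A sketch:

    val edges = spark.table("edges")
    import spark.implicits._

    val ab = edges.filter($"label" === "x").select($"src".as("a"), $"dst".as("b"))
    val bc = edges.filter($"label" === "y").select($"src".as("b2"), $"dst".as("c"))

    // ...one further join per remaining arc label (z, w, q, r, s)
    val path = ab.join(bc, $"b" === $"b2")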

Anthony Arrascue (220)
0
votes
1 answer
What happened to the ability to visualize query plans in a Databricks notebook?
There is an old (year 2014) talk on YouTube where the speaker visualized a query plan right inside a Databricks notebook. Here is the screenshot:
I am using Databricks Runtime 5.5 LTS ML, and whenever I try to call viz on a query plan, I get this…

mauna (1,098)
0
votes
1 answer
Spark optimize "DataFrame.explain" / Catalyst
I've got a complex piece of software which performs really complex SQL queries (well, not queries exactly: Spark plans, you know). The plans are dynamic; they change based on user input, so I can't "cache" them.
I've got a phase in which Spark takes 1.5-2 min…
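Planning cost can be isolated from execution cost, since touching queryExecution.executedPlan forces analysis, optimization and physical planning without launching a job; a sketch, where buildComplexPlan is a hypothetical stand-in for the dynamically generated plan:

    val df = buildComplexPlan()            // hypothetical: the dynamic DataFrame

    val t0 = System.nanoTime()
    df.queryExecution.executedPlan         // triggers Catalyst only, runs nothing
    println(s"planning took ${(System.nanoTime() - t0) / 1e6} ms")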

BiS (501)
0
votes
1 answer
Is a select after casting a DataFrame to a Dataset optimized?
I have the following scenario:
case class A(name:String,age:Int)
val df = List(A("s",2)).toDF
df.write.parquet("filePath")
val result = spark.read.parquet("filePath").as[A].select("age")
Is the above optimized to select only age? Upon seeing…
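An empirical check, continuing the question's own snippet: the physical plan's FileScan node reports a ReadSchema, which shows whether column pruning reached the Parquet reader.

    spark.read.parquet("filePath").as[A].select("age").explain()
    // In the physical plan, check the scan's ReadSchema, e.g.:
    //   FileScan parquet [age#...] ... ReadSchema: struct<age:int>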

advocateofnone (2,527)
0
votes
1 answer
Spark DataFrame how to preserve sorting and partitioning information after mapPartitions
I use DataFrame mapPartitions in a library which is loosely an implementation of the Uber Case Study. The output DataFrame has some new (large) columns, and the input DataFrame is partitioned and internally sorted before doing mapPartitions. Most…
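A sketch of the setup described, with a hypothetical Event schema: the layout is prepared before mapPartitions, but afterwards Catalyst treats the output as unsorted and arbitrarily partitioned, so downstream sorts or shuffles can recur even though the physical layout was preserved.

    import spark.implicits._
    case class Event(key: String, time: Long, payload: String)

    val events = Seq(Event("a", 1L, "p")).toDS()   // stand-in input

    val prepared = events
      .repartition($"key")
      .sortWithinPartitions($"key", $"time")

    // The map keeps the physical layout, but the optimizer no longer knows it.
    val enriched = prepared.mapPartitions(it => it.map(e => e.copy(payload = e.payload + "!")))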

shay__ (3,815)
0
votes
0 answers
Prevent spark catalyst from optimizing and moving dynamic parallelism
I need to dynamically set spark.sql.shuffle.partitions during the execution of my spark job.
Initially, it is set when starting the job, but then after various aggregations, I need to decrease it over and over again.
However, Catalyst tends to push…
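One workaround sometimes used for this is forcing a materialization boundary, e.g. with checkpoint, so the plan is cut and the new setting governs only the later stages; a sketch with assumed column names:

    import spark.implicits._
    spark.sparkContext.setCheckpointDir("/tmp/ckpt")    // required by checkpoint()

    spark.conf.set("spark.sql.shuffle.partitions", "2000")
    val wide = df.groupBy($"k1", $"k2").count().checkpoint() // cuts the plan here

    spark.conf.set("spark.sql.shuffle.partitions", "200")
    val narrow = wide.groupBy($"k1").sum("count")            // planned with 200 partitions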

Georg Heiler (16,916)
0
votes
1 answer
Query Cassandra from Spark using CassandraSQLContext
I am trying to query Cassandra from Spark using CassandraSQLContext, but I get a weird missing-dependency error. I have a Spark application like the following:
val spark: SparkSession = SparkSession.builder().appName(appName).getOrCreate()
val…
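For reference, CassandraSQLContext was dropped from newer versions of spark-cassandra-connector; the usual replacement is the DataFrame reader. A sketch with assumed keyspace and table names:

    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "tbl")) // hypothetical names
      .load()

    df.createOrReplaceTempView("tbl")                      // enables plain SQL on it
    spark.sql("SELECT count(*) FROM tbl").show()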

belgacea (1,084)
-1
votes
2 answers
How can one use Spark Catalyst?
According to this
Spark Catalyst is "an implementation-agnostic framework for manipulating trees of relational operators and expressions".
I want to use Spark Catalyst to parse SQL DML and DDL statements in order to generate custom Scala code from them. However,…
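Catalyst's parser can be driven directly for this kind of experiment; a sketch (CatalystSqlParser is an internal API and may change between versions):

    import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

    // Parses SQL text into an unresolved logical plan (a Catalyst tree) that
    // can then be walked to generate code.
    val plan = CatalystSqlParser.parsePlan("SELECT a, b FROM t WHERE a > 1")
    println(plan.treeString)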

justin (99)