I am trying to build a SQL parser that can handle complicated SQL queries.
For example, here is one such query:
select * from (select a.a, a.b from db.tmp_table1 a JOIN db.tmp_table2 b
where a.c = b.c
and a.d = 'interesting'
and b.e = 234)
The output I would like to produce is:
(schema = db, table = tmp_table1, cols used = a, b, c, d, filters = (d = 'interesting'))
(schema = db, table = tmp_table2, cols used = c, e, filters = (e = 234))
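To make that target concrete, this is roughly the structure I have in mind for each table (the names here are just illustrative):

case class TableUsage(
  schema: String,
  table: String,
  colsUsed: Seq[String],
  filters: Seq[String])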
I ran some experiments with Spark's Catalyst parser:
val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
logicalPlan: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
'Project [*]
+- 'SubqueryAlias `__auto_generated_subquery_name`
+- 'Project ['a.a, 'a.b]
+- 'Filter ((('a.c = 'b.c) && ('a.d = interesting)) && ('b.e = 234))
+- 'Join Inner
:- 'SubqueryAlias `a`
: +- 'UnresolvedRelation `db`.`tmp_table1`
+- 'SubqueryAlias `b`
+- 'UnresolvedRelation `db`.`tmp_table2`
However, the part I am stuck on is how to traverse the logicalPlan to get my desired output. I checked some Stack Overflow answers related to this kind of parsing, like this, but the requirements in those questions seem to be a lot simpler.
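From reading the TreeNode API, I think individual pieces can be pulled out by pattern matching over the plan, roughly like this (just a sketch, not a working solution; I am assuming Spark 2.x, where UnresolvedRelation exposes tableIdentifier; on 3.x it would be multipartIdentifier instead):

import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.plans.logical.Filter

// All tables referenced anywhere in the (possibly nested) query.
val tables = logicalPlan.collect {
  case r: UnresolvedRelation => r.tableIdentifier   // e.g. `db`.`tmp_table1`
}

// All filter conditions, still as one combined expression per Filter node.
val filterConditions = logicalPlan.collect {
  case f: Filter => f.condition
}

But I cannot see how to get from there to the per-table view above: splitting the combined filter condition, attributing each column and each predicate to the table it belongs to (especially through the aliases a and b), and separating join conditions from the real filters.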