0

I want to get table names in spark sql query .

Sql looks like "select * from table_a as A join table_b as b where a.pk=b.fk ". And I want a list contains table_a and table_b.

Before this question,I found scala code How to get table names from SQL query? tried to translate it to java ,but failed.

Any help would be appreciated.

Ramesh Maharjan
  • 41,071
  • 6
  • 69
  • 97
moyiguke
  • 70
  • 1
  • 10
  • I dont know why it is marked duplicate. This question is different from the one stated as duplicate. – Constantine Aug 29 '18 at 07:35
  • I think it is different from "https://stackoverflow.com/questions/42880119/how-to-list-all-tables-in-database-using-spark-sql". I was puzzled about how to get table names from an existing SQL query,not from a database.Should you clean the duplicate mark? – moyiguke Aug 29 '18 at 07:38
  • @user8371915 Can you review this question again ? – Constantine Aug 29 '18 at 11:25

1 Answers1

0
public static Set<String> getRelations(Dataset<org.apache.spark.sql.Row> dataframe){
    LogicalPlan plan = dataframe.queryExecution().analyzed();

    return scala.collection.JavaConverters.seqAsJavaListConverter(plan.collectLeaves()).asJava()
    .stream()
    .map(logicalPlan -> {

        if(logicalPlan instanceof CatalogRelation){
            CatalogRelation catalogRelation = (CatalogRelation) logicalPlan;
            return catalogRelation.tableMeta().database() + "." + catalogRelation.tableMeta().identifier().table();
        }
        if(logicalPlan instanceof LogicalRelation){
            LogicalRelation logicalRelation = (LogicalRelation) logicalPlan;
            return logicalRelation.catalogTable().get().database() + "." + logicalRelation.catalogTable().get().identifier().table();
        }
        return "";
    }).collect(Collectors.toSet());
}

This is the getRelations() function to retrieve the relations from your Dataframe.

Dataset<org.apache.spark.sql.Row> df = spark.sql("select * from table_a as A join table_b as b where a.pk=b.fk");

This would give you a dataset which you can pass to getRelations() to get your relations in the query.

Constantine
  • 1,356
  • 14
  • 19
  • Thanks for replying.The code works well,and I did some modification to fit my project .First ,I get LogicalPlan using "sparkSession.sessionState().sqlParser().parsePlan(sql)" . Second , I add one case in getRelatoins method "if (logicalPlan instanceof UnresolvedRelation)". – moyiguke Aug 29 '18 at 07:29
  • Yes it has other cases as well apart from the UnresolvedRelation like OneRowRelation. This block of code is a part of my spark plan parser which I wrote in scala :) – Constantine Aug 29 '18 at 07:32