
I have a Python script that contains many SQL queries, for example:

spark.sql("Select a,b from schema.table1 UNION ALL Select a,b from schema.table2 ")

I need to extract all the table names referenced in the script.

How should I approach this? Can I pass the script as an input file and search for a matching pattern, or is there a better approach?

Wiktor Stribiżew
kiruba
  • If all the queries are simple "SELECT ... FROM TABLENAME ..." then you just need to tokenise the query (string) and output any/all tokens that follow "FROM". If your queries are more complex then you'll need to account for other keywords such as INTO – DarkKnight Jul 13 '23 at 09:40
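A minimal sketch of the idea in the comment above, assuming the queries are simple and that a table name always directly follows `FROM`, `JOIN` or `INTO`. The keyword list and the helper name `tables_after_keywords` are illustrative assumptions, not a complete SQL parser; subqueries, CTEs and quoted identifiers are not handled.

```python
import re

# Assumed, incomplete list of keywords that are directly followed by a table name.
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN|INTO)\s+([A-Za-z_][\w.]*)",
    re.IGNORECASE,
)

def tables_after_keywords(query):
    """Return the set of identifiers that directly follow FROM, JOIN or INTO."""
    return set(TABLE_PATTERN.findall(query))

query = "Select a,b from schema.table1 UNION ALL Select a,b from schema.table2 "
print(sorted(tables_after_keywords(query)))
# ['schema.table1', 'schema.table2']
```

To run this over a whole script, the file contents could be read with `open(path).read()` and passed to the same function, with the caveat that any `from ...` text outside a query (for example a Python `from x import y` statement) would also match.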

1 Answer

  1. Split the string into a list of words, like ["Select", "a", "b", ...].
  2. Define a set that contains all the SQL keywords, e.g. Select, All, schema and so on.
  3. Filter the list from step 1, keeping only the words that are not in the set from step 2.
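The three steps above can be sketched roughly as follows. The keyword set `SQL_KEYWORDS` and the function name `non_keyword_tokens` are assumptions made for illustration, and the set is far from complete. Applied literally, the filter also keeps column names, so the last line adds one further assumption: every table is schema-qualified and therefore contains a dot.

```python
import re

# Step 2: a small, assumed set of SQL keywords (not exhaustive).
SQL_KEYWORDS = {"select", "from", "union", "all", "where", "join", "on", "as"}

def non_keyword_tokens(query):
    # Step 1: split the query into words on whitespace and commas.
    words = [w for w in re.split(r"[\s,]+", query) if w]
    # Step 3: keep only the words that are not SQL keywords.
    return [w for w in words if w.lower() not in SQL_KEYWORDS]

query = "Select a,b from schema.table1 UNION ALL Select a,b from schema.table2 "
tokens = non_keyword_tokens(query)
print(tokens)
# ['a', 'b', 'schema.table1', 'a', 'b', 'schema.table2']

# Assumption: table names are schema-qualified, so they contain a dot.
print(sorted({t for t in tokens if "." in t}))
# ['schema.table1', 'schema.table2']
```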
Xiaomin Wu