
I am using Databricks and I have already loaded some tables as DataFrames.

However, I have a complex SQL query that I want to run against these tables, and I wonder if I can avoid translating it into PySpark.

Is that possible?

To give an example:

In SQL:

with table2 as (
    select column1, column2
    from database.table1
    where
        start_date <= DATE '2019-03-01' and
        end_date >= DATE '2019-03-31'
)
select * from table2

In PySpark I would already have table1 loaded, but the following does not work because Spark cannot find table1.

query = "(
    select column1, column1
           from table1
               where
         start_date <= DATE '2019-03-01' and
         end_date >= DATE '2019-03-31'            )"
table2 = spark.sql(query)

Thanks


1 Answer


Try using database_name.table_name instead of just the table name in the query:

query = "(
    select column1, column1
           from *database_name.table_name* 
               where
         start_date <= DATE '2019-03-01' and
         end_date >= DATE '2019-03-31' )" 

If you are using PySpark, run the query with:

spark.sql(query)
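
For instance, you can keep the result as a DataFrame and inspect it (this assumes the database and table exist in the metastore):

# Run the query and keep the result as a DataFrame
table2 = spark.sql(query)
table2.show(5)  # preview the first few rows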
  • does not work, the query is not even a valid Python string. – jdhao Oct 02 '22 at 08:34
  • This doesn't answer the question. OP said to assume that a DataFrame `table1` exists in the Python session. Your response just queries a database table. – ardaar Jun 15 '23 at 18:12
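
As the second comment notes, if table1 is an in-memory DataFrame rather than a metastore table, a minimal sketch of an alternative (assuming the DataFrame is bound to a hypothetical Python variable table1_df) is to register it as a temporary view so that spark.sql can resolve it by name:

# Minimal sketch: expose an already-loaded DataFrame to Spark SQL.
# Assumes table1_df is a DataFrame in the session with the columns
# column1, column2, start_date, and end_date.
table1_df.createOrReplaceTempView("table1")

query = """
    select column1, column2
    from table1
    where
        start_date <= DATE '2019-03-01' and
        end_date >= DATE '2019-03-31'
"""
table2 = spark.sql(query)

The temporary view lives only for the current SparkSession, so nothing is written to the metastore.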