1

Using SAS, I commonly use proc sql to create a dataset reading from a database. Later I can again use proc sql to query both from the database and from the first local dataset i created.

How does one do this in Python (using Pandas)?

Say in SAS, I run:

proc sql;
create table work.dataset1 as
select * from dbtable1;
run;

And then I can use this dataset to keep querying my database, as:

proc sql;
create table work.dataset2 as
select a.*, b.* from work.dataset1 a, dbtable2 b;
run;

I Python I have this code:

df1 = pd.read_sql_query("select * from dbtable1", conn)

And would like to be able to reference df1 in later queries, like:

df2 = pd.read_sql_query("select a.*, b.* from df1 a, dbtable2 b", conn)

But this does not seem to work.

Does anyone know how this can be done using Python/Pandas?

Many thanks in advance!

eshy
  • 57
  • 8

1 Answers1

1

pd.read_sql_query will not be able to do this. You could first run a SQL query on your database to create a dataframe object from dbtable2, and use pandas methods to run queries on your multiple dataframes.

There is also a Python library called pandasql that allows you to query pandas dataframes using SQL syntax. So in your case, it would be:

import pandas as pd
import pandasql as ps

df1 = pd.read_sql_query("select * from dbtable1", conn)
df2 = pd.read_sql_query("select * from dbtable2", conn)

pysqldf = lambda q: ps.sqldf(q, globals())
q = """select a.*, b.* from df1 a, df2 b;"""
final_df = pysqldf(q)
AlexK
  • 2,855
  • 9
  • 16
  • 27