0

I create a dataframe in a Jupyter Notebook instance (AWS Sagemaker) by connecting to an AWS Athena table using a SQL connection like the example below:

I have made a paramter per the link (https://pypi.org/project/pyathena/#query-with-parameter). The parameter is allowing me to insert a single month into the query but fails if I try multiple months - can someone please assist me with how you can insert multiple months into the dictionary :

from pandasql import sqldf
from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

curr_month = 202206
prev_month = 202203

conn = connect(work_group='prod_user',
               region_name= 'eu-west-1')

sql_query = ('''
select 
year_month,
id,
field_1,
field_2,
field_3
from
mart.table_xyz
WHERE
year_month = (%(param)s)
;
''')

cursor = connect(work_group='prod_user',
               region_name= 'eu-west-1',
                 cursor_class=PandasCursor).cursor()
user_df = cursor.execute(sql_query,{"param": curr_month}).as_pandas()

I have tried a list like below to return two months but that causes an error. Any help would be appreciated

cursor = connect(work_group='prod_user',
               region_name= 'eu-west-1',
                 cursor_class=PandasCursor).cursor()
user_df = cursor.execute(sql_query,{"param": [curr_month,prev_month]}).as_pandas()

Patty
  • 41
  • 1
  • 7

1 Answers1

0

Try changing the query to use the in operator (not sure about actual python code, but the sql query should look something like where year_month in (202206, 202203)):

WHERE year_month in (%(param)s)

And param can be something like {"param": ', '.join([curr_month, prev_month])}

Guru Stron
  • 102,774
  • 10
  • 95
  • 132
  • I will have a look into this. Initially it's not looking as the dates are bigint's for that field. It gives a type error so I guess I just convert it. TypeError: sequence item 0: expected str instance, int found – Patty Mar 08 '23 at 18:49
  • @Patty try `', '.join(map(str, [curr_month, prev_month]))` for param – Guru Stron Mar 08 '23 at 18:55
  • Thanks it actually came back to the original error, which is why I originally questioned if you can run multiple dates in the parameter - OperationalError: SYNTAX_ERROR: line 27:16: IN value and list items must be the same type: bigint. I will find information around this error – Patty Mar 08 '23 at 19:01
  • @Patty the thing is that _"the original error"_ is not specified in the question. try changing query a bit - `where cast(year_month as int) in ...` – Guru Stron Mar 08 '23 at 19:22