
I am hoping to read an AWS Athena database table within a Spark session.

I have previously set up notebook instances and used the pyathena library to connect to the Athena table and load the results into pandas DataFrames. However, I would now like to use PySpark for some processing, but I have been unable to connect to Athena from Spark.

https://pypi.org/project/pyathena/

from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

sql_query = '''
select *
from database.table1
where load_stamp in (
    select max(load_stamp)
    from database.table1
    where cast(year_month as int) in (%(curr_month)s)
)
'''

cursor = connect(work_group='p_workgroup',
                 region_name='eu-west-1',
                 cursor_class=PandasCursor).cursor()

# curr_month is defined earlier in the notebook
athena_extract = cursor.execute(sql_query, {"curr_month": curr_month}).as_pandas()

# print(athena_extract)

Is there a similar process for connecting a Spark session to an Athena table on a SageMaker notebook instance? If there is a library for this, could someone please point me to it?
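For reference, the closest pattern I have seen is Spark's generic JDBC reader pointed at the Athena JDBC driver. This is a minimal, untested sketch, not a working solution: the driver class name and URL format follow the Simba Athena JDBC driver documentation, the driver jar would need to be on Spark's classpath, and the region, bucket, and table names are placeholders.

```python
# Sketch: build the option dict that spark.read.format("jdbc") would take
# for Athena. Assumes the Simba Athena JDBC driver jar is on the classpath;
# all names below are placeholders, not values from a working setup.
def athena_jdbc_options(region, s3_staging_dir, query):
    """Return JDBC options for reading an Athena query result into Spark."""
    return {
        "url": (f"jdbc:awsathena://athena.{region}.amazonaws.com:443;"
                f"S3OutputLocation={s3_staging_dir}"),
        "driver": "com.simba.athena.jdbc.Driver",
        # Wrap the query as a derived table, as Spark's JDBC reader expects
        "dbtable": f"({query}) t",
    }

opts = athena_jdbc_options("eu-west-1",
                           "s3://my-bucket/athena-results/",
                           "select * from database.table1")
# df = spark.read.format("jdbc").options(**opts).load()  # needs a live SparkSession
```

I have not been able to confirm this works from a SageMaker notebook, which is why I am asking whether there is an established library or approach.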

Patty
