I'm hoping to query an AWS Athena database table from within a Spark session.
I have previously set up notebook instances and used the pyathena library to connect to the Athena table and load the results into a pandas DataFrame. However, I would like to use PySpark for some of my processing, but I have been unable to connect to Athena for this.
https://pypi.org/project/pyathena/
from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

curr_month = 202301  # example value; I set this to the month I want

sql_query = '''
select *
from database.table1
where load_stamp in (
    select max(load_stamp)
    from database.table1
    where cast(year_month as int) in (%(curr_month)s)
)
'''

cursor = connect(work_group='p_workgroup',
                 region_name='eu-west-1',
                 cursor_class=PandasCursor).cursor()
athena_extract = cursor.execute(sql_query, {"curr_month": curr_month}).as_pandas()
# print(athena_extract)
Is there a similar process where I can connect a Spark session to an Athena table on a SageMaker notebook instance? If there is a library for this, could someone please share it?