I have a Delta Lake table (Parquet format) in an AWS S3 bucket, and I need to read it into a dataframe using PySpark in notebook code. I tried searching online but have had no success yet. Can anyone share sample code showing how to read a Delta Lake table in PySpark (into a dataframe or any other object)?
1 Answer
If you have already created the Delta table, you can read it into a Spark dataframe like below:

s3_path = "s3://<bucket_name>/<delta_tables_path>/"
# Specify the "delta" format so Spark reads the table through its transaction
# log rather than as plain Parquet files
df = spark_session.read.format("delta").load(s3_path)
# Show the first n rows
df.show(n)
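Note that the delta source is not bundled with plain Apache Spark, so the session itself must be configured for Delta Lake. A minimal sketch of building such a session, assuming the delta-spark PyPI package is installed (the app name is just a placeholder):

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Register the Delta SQL extension and catalog so format("delta") is understood
builder = (
    SparkSession.builder.appName("read-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
# configure_spark_with_delta_pip adds the matching Delta Lake jars to the session
spark_session = configure_spark_with_delta_pip(builder).getOrCreate()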

Eren Sakarya
- Thanks @Eren. In this case, how do I authenticate to the S3 bucket? Where do I add the credentials? – PythonDeveloper May 05 '23 at 13:57
- I didn't understand what you meant. You first need to download the AWS CLI from this link: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html and set up your AWS profile as described here: https://docs.aws.amazon.com/toolkit-for-visual-studio/latest/user-guide/keys-profiles-credentials.html. After that, you should be able to read the table smoothly. – Eren Sakarya May 05 '23 at 14:12
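For reference, credentials can also be passed to Spark directly rather than picked up from the AWS CLI profile. A minimal sketch, assuming the hadoop-aws package matching your Spark build is on the classpath and using the s3a connector (the key values and bucket path are placeholders):

from pyspark.sql import SparkSession

# The access/secret key values below are placeholders; in practice, prefer the
# default credentials chain (environment variables or ~/.aws/credentials) over
# hard-coding keys in notebook code.
spark_session = (
    SparkSession.builder.appName("read-delta-from-s3")
    .config("spark.hadoop.fs.s3a.access.key", "<aws_access_key_id>")
    .config("spark.hadoop.fs.s3a.secret.key", "<aws_secret_access_key>")
    .getOrCreate()
)

# With open-source Spark, use the s3a:// scheme so the hadoop-aws connector handles the read
df = spark_session.read.format("delta").load("s3a://<bucket_name>/<delta_tables_path>/")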