I have a Delta Lake table (Parquet format) in an AWS S3 bucket, and I need to read it into a dataframe using PySpark in notebook code. I tried searching online but have had no success yet. Can anyone share sample code showing how to read a Delta Lake table in PySpark (into a dataframe or any other object)?
1 Answer
If you have already created the Delta table, you can read it into a Spark dataframe like below:

s3_path = "s3://<bucket_name>/<delta_tables_path>/"
# Specify the "delta" format so Spark reads the table through its transaction
# log rather than as plain Parquet files
df = spark_session.read.format("delta").load(s3_path)
# Show the first n rows
df.show(n)
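Note that the delta source is not bundled with plain Apache Spark, so the session itself must be configured for Delta Lake. A minimal sketch of building such a session, assuming the delta-spark PyPI package is installed (the app name is just a placeholder):

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Register the Delta SQL extension and catalog so format("delta") is understood
builder = (
    SparkSession.builder.appName("read-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
# configure_spark_with_delta_pip adds the matching Delta Lake jars to the session
spark_session = configure_spark_with_delta_pip(builder).getOrCreate()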

Eren Sakarya
- Thanks @Eren. In this case, how do I authenticate to the S3 bucket? Where do I add the credentials? – PythonDeveloper May 05 '23 at 13:57
- I didn't understand what you meant. You first need to download the AWS CLI from this link: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html and set up your AWS profile as described here: https://docs.aws.amazon.com/toolkit-for-visual-studio/latest/user-guide/keys-profiles-credentials.html. After that, you should be able to read the table smoothly. – Eren Sakarya May 05 '23 at 14:12
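For reference, credentials can also be passed to Spark directly rather than picked up from the AWS CLI profile. A minimal sketch, assuming the hadoop-aws package matching your Spark build is on the classpath and using the s3a connector (the key values and bucket path are placeholders):

from pyspark.sql import SparkSession

# The access/secret key values below are placeholders; in practice, prefer the
# default credentials chain (environment variables or ~/.aws/credentials) over
# hard-coding keys in notebook code.
spark_session = (
    SparkSession.builder.appName("read-delta-from-s3")
    .config("spark.hadoop.fs.s3a.access.key", "<aws_access_key_id>")
    .config("spark.hadoop.fs.s3a.secret.key", "<aws_secret_access_key>")
    .getOrCreate()
)

# With open-source Spark, use the s3a:// scheme so the hadoop-aws connector handles the read
df = spark_session.read.format("delta").load("s3a://<bucket_name>/<delta_tables_path>/")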