
Having an issue loading parquet files on Databricks. We used AWS DMS (Database Migration Service) to migrate Postgres databases into Databricks so they can be stored in the Delta lake. DMS moved the data from RDS Postgres into an S3 bucket that is already mounted. The files are visible, but I am unable to read them.
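A quick listing of the mounted path (the same one used in the read below) is roughly how the files show up:

display(dbutils.fs.ls('/mnt/delta/postgres_table'))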

Running:

df = (spark.read
    .option("header", "true")
    .option("recursiveFileLookup", "true")
    .format("delta")
    .load("/mnt/delta/postgres_table"))

display(df)

This shows: Query returned no results

Inside this directory there is a slew of `.snappy.parquet` files.

Thank you

I downloaded an individual parquet file (LOAD0000.parquet) and reviewed it, and it does display with pandas. Aside from that, several scripts were tested to see if I could get one df to show, to no avail (for example, along the lines sketched below).
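The pandas check on the downloaded file was essentially the following, and one sketch of a Spark variant is a plain parquet read, in case the DMS output is raw parquet rather than a Delta table (path and file name as above):

import pandas as pd

# The single downloaded DMS output file opens fine locally with pandas
pdf = pd.read_parquet("LOAD0000.parquet")
print(pdf.head())

# Variant: read the directory as plain parquet instead of Delta
df = (spark.read
    .option("recursiveFileLookup", "true")
    .parquet("/mnt/delta/postgres_table"))
display(df)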

  • do you have a `_delta_log` directory inside your directory with data? (see the check sketched after these comments) – Alex Ott Oct 28 '22 at 17:59
  • Have you tried loading it without any options? Like this: `df = spark.read.format('delta').load('/mnt/delta/postgres_table')` On a side note, you don't need `option("header","true")` for the Delta/Parquet formats; the header information is already encoded in these formats, so it will be read regardless of this option. – Bartosz Gajda Oct 29 '22 at 20:05
  • Yes @AlexOtt there is a _delta_log in the postgres_table directory. – Edward Plata Oct 31 '22 at 18:03
  • Hi @BartoszGajda, I tried without the option before; it still results in an empty dataframe. – Edward Plata Oct 31 '22 at 18:04
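A minimal sketch of the `_delta_log` check from the first comment (same path as above; a valid Delta table must have the log directory at the table root):

# List the table root and confirm _delta_log appears among the entries
print([f.name for f in dbutils.fs.ls("/mnt/delta/postgres_table")])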

0 Answers