Databricks: reading data with .snappy.parquet extension

Question

I have a table whose files have the .snappy.parquet extension, for example:

data= 'part-001-36b4-7ea3-4165-8742-2f32d8643d-c000.snappy.parquet'

I would like to read this and I tried the following:

table = spark.read.load(data, format='delta')

With the above syntax, I get the following error:

AnalysisException: A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path: part-001-36b4-7ea3-4165-8742-2f32d8643d-c000.snappy.parquet.

I also tried:

table = spark.read.parquet(data)

With that, I get this error:

AnalysisException: Incompatible format detected.

score 1 · Answer 1 · answered May 17 '23 at 14:38


df = spark.read.parquet('/path/where/file/is/')

Your Parquet output was probably written as many part files, so read the whole folder that contains them rather than a single part file.
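Both errors in the question are consistent with the folder being a Delta table (one with a `_delta_log` subfolder): in that case `format='delta'` expects the table's root folder rather than one of its part files, and Spark refuses to read the folder as plain Parquet. A minimal sketch, assuming the part file's path is visible to Python's local filesystem, that walks up from the part file to find the folder to read and picks the format. The helper name `resolve_table_dir` is hypothetical; only the standard library is used.

```python
from pathlib import Path


def resolve_table_dir(part_file: str) -> tuple[str, str]:
    """Return (folder_to_read, format) for a Spark part file.

    Hypothetical helper: walks up from the part file looking for a
    `_delta_log` folder. If one is found, its parent is treated as the
    Delta table root; otherwise the part file's own folder is treated
    as plain Parquet output.
    """
    start = Path(part_file).resolve().parent
    # Check the file's own folder and then each ancestor, which also
    # covers partition subfolders such as `year=2023/`.
    for folder in [start, *start.parents]:
        if (folder / "_delta_log").is_dir():
            return str(folder), "delta"
    return str(start), "parquet"


# Usage on a cluster (not executed here):
# path, fmt = resolve_table_dir(data)
# table = spark.read.format(fmt).load(path)
```

The design choice is to pass a folder, never a single part file, to the reader, which is what this answer recommends.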


Rodrigo Cristiano


score 0 · Answer 2 · answered Sep 13 '22 at 14:07


If you don't mind using pandas for this specific task, I've had success in the past reading snappy Parquet files like this:

import pandas as pd
df = pd.read_parquet(data)
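`pd.read_parquet` needs the pyarrow or fastparquet engine installed. If the output has several part files, one option is to list them and read each one. A minimal sketch of the listing step using only the standard library; the name `list_part_files` is hypothetical. It skips names starting with `_` or `.` (such as `_SUCCESS`, `_delta_log`, or `.crc` checksum files), similar to how Spark treats such names as hidden, and it does not descend into partition subfolders. If the folder is a Delta table, the raw part files can include data the table has since removed, so this only matches Spark's view for plain Parquet output.

```python
from pathlib import Path


def list_part_files(folder: str) -> list[str]:
    """Return the Parquet part files directly inside `folder`, sorted by name.

    Hypothetical helper: skips entries whose names start with "_" or "."
    and keeps only regular files ending in ".parquet", which also matches
    ".snappy.parquet". Subfolders are not searched.
    """
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file()
        and not p.name.startswith(("_", "."))
        and p.name.endswith(".parquet")
    )
```

With pandas available, the parts could then be combined with `pd.concat([pd.read_parquet(f) for f in list_part_files(folder)], ignore_index=True)`.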


Parker Watson

