
I have S3 files in the following path formats:

s3://bucket_name/src=email/year=2022/month=9/day=10/hour=1
s3://bucket_name/src=email/year=2022/month=9/day=10/hour=2
.
.
s3://bucket_name/src=sms/year=2022/month=9/day=10/hour=1
s3://bucket_name/src=sms/year=2022/month=9/day=10/hour=2
.
.

I want to read the data for one particular date, e.g. 2022-09-10, using PySpark. I am using the code below for this:

df = spark.read.parquet("s3://bucket_name/*/year=2022/month=9/day=10/")

This gives me the following error:

An error occurred while calling o471.parquet.
: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.

I have tried setting basePath as well, but that gives another error. Any help with reading data from multiple partitions using Spark?
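For reference, a minimal sketch of the basePath approach, assuming the table root is `s3://bucket_name/` and an existing SparkSession; the `day_glob` and `read_day` helper names are hypothetical. Globbing `src=*` explicitly (rather than a bare `*`) keeps every matched directory on the same partition layout, and pointing `basePath` at the common root lets Spark recover `src`/`year`/`month`/`day` as partition columns:

```python
def day_glob(bucket: str, year: int, month: int, day: int) -> str:
    """Build a partition glob covering every src= partition for one date."""
    return f"s3://{bucket}/src=*/year={year}/month={month}/day={day}/"


def read_day(spark, bucket: str, year: int, month: int, day: int):
    """Read one day's parquet data across all src partitions.

    Setting basePath to the table root tells Spark where partitioning
    starts, so the matched directories are treated as partitions of one
    table instead of raising "Conflicting directory structures".
    """
    return (
        spark.read
        .option("basePath", f"s3://{bucket}/")  # root above all partition dirs
        .parquet(day_glob(bucket, year, month, day))
    )
```

If the sources genuinely have different schemas, the alternative from the error message applies: load each `src=` root separately and union the DataFrames.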

seou1
