
We started receiving this generic error today:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: java.io.EOFException

Saw some articles suggesting this can be caused by big files, missing libraries, or memory constraints, for example:

https://datascience.stackexchange.com/questions/40130/pyspark-java-io-eofexception

PySpark throws java.io.EOFException when reading big files with boto3

DetroitMike

1 Answer


It ended up being an empty .seq file written by one of our ETL tools. Removing that invalid file resolved the issue for us.
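
In case it helps anyone else, here is a minimal sketch of how one might skip zero-byte objects before handing the paths to Spark. The bucket name, prefix, and reading via sc.sequenceFile over s3a are assumptions about your setup (you also need hadoop-aws/s3a configured on the cluster); adjust as needed.

```python
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("skip-empty-seq-files").getOrCreate()
sc = spark.sparkContext

s3 = boto3.client("s3")
bucket = "my-etl-bucket"   # placeholder: your bucket
prefix = "landing/seq/"    # placeholder: your prefix

# Keep only non-empty .seq objects; a zero-byte SequenceFile has no header,
# so Hadoop hits end-of-file immediately and raises java.io.EOFException.
paths = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".seq") and obj["Size"] > 0:
            paths.append(f"s3a://{bucket}/{obj['Key']}")

# Spark/Hadoop accept a comma-separated list of input paths.
rdd = sc.sequenceFile(",".join(paths))
print(rdd.count())
```

This just filters the listing up front; the longer-term fix for us was stopping the ETL tool from writing empty .seq files in the first place.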

DetroitMike