I'm new to AWS Glue and Spark, and I'm building my ETL with them. When I connect to S3 and try to read files of approximately 200 MB, Glue fails to read them. The error is:
An error was encountered:
An error occurred while calling o99.toDF.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 10.0 failed 1 times, most recent failure: Lost task 1.0 in stage 10.0 (TID 16) (91ec547edca7 executor driver): com.amazonaws.services.glue.util.NonFatalException: Record larger than the Split size: 67108864
Update 1: When I split my 200 MB JSON file into two parts with jq, AWS Glue reads both parts normally.
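For reference, the splitting step I did with jq can be sketched in Python as below. This is a minimal sketch, not my exact Lambda code; it assumes the file is a single top-level JSON array, and the function name, file paths, and `parts` parameter are all illustrative. The goal is just to get each output file comfortably under the 64 MB (67108864-byte) split size from the error message.

```python
import json

def split_json_array(src_path, dst_prefix, parts=2):
    """Split a top-level JSON array file into `parts` smaller files,
    so each part stays well under Glue's 64 MB record/split limit.
    (Hypothetical helper; names and paths are illustrative.)"""
    with open(src_path) as f:
        records = json.load(f)  # assumes the file is one JSON array

    chunk = -(-len(records) // parts)  # ceiling division
    out_paths = []
    for i in range(parts):
        out_path = f"{dst_prefix}_{i}.json"
        with open(out_path, "w") as out:
            # write each slice as its own valid JSON array
            json.dump(records[i * chunk:(i + 1) * chunk], out)
        out_paths.append(out_path)
    return out_paths
```

In my case, pointing the Glue job at the two resulting files instead of the original single file made the read succeed.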
My current workaround is a Lambda function that splits the file before the job runs, but I'd like to understand how AWS Glue's splitting actually works. Thanks and regards.