I have a huge PySpark dataframe with 800k rows. I tried to collect only one cell of a column, but it failed. I am running my code on the EMR service. It looks like a memory problem on the driver.
print(df.collect()[0][1])
I get this error:
An error occurred while calling o199.collectToPython. : org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 303 tasks (1026.3 MiB) is bigger than spark.driver.maxResultSize (1024.0 MiB)...
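I was thinking of selecting just the column I need and pulling a single row instead of collecting the whole dataframe, roughly like the sketch below ("my_col" is a placeholder for my real column name). Would that avoid hitting spark.driver.maxResultSize, or should I just raise that config when submitting the job?

# "my_col" stands in for my actual column name
row = df.select("my_col").limit(1).collect()   # should only bring one row back to the driver
print(row[0][0])

# or, as I understand it, equivalently:
print(df.select("my_col").first()[0])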