I've been struggling with an issue that did not exist a few days ago: Spark performance has become much worse (execution time went from minutes to hours with the same code, same source data, and same configs). Looking at the logs and the Spark web UI, I see lots of:
- "Futures timed out" errors
- Task locality that is mostly rack-local (it was mostly node-local a few days ago)
- "Tried to get loss reason for non-existent executor" messages
- Many failed tasks in the Executors tab
Interestingly, sessions that go through Livy seem to behave better than sessions that go directly through YARN.
What are the possible reasons for this behavior?