Perhaps nothing new, but we have a use case to profile a Spark application that runs in a distributed fashion.
We currently use async-profiler to monitor each executor (a separate process in Spark), which generates one JFR per process. It's tedious to look at the individual executor profiles, make sense of them, and compare them.
We use the JDK's jfr assemble to combine all the JFRs produced. Curious: is this how distributed profiling is usually done?
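In case it helps, this is roughly what our assembly step looks like; the paths and the host-list file are placeholders for our actual setup:

# Pull each executor's JFR files into one directory, then assemble.
# executor-hosts.txt and the paths below are placeholders.
mkdir -p /tmp/jfr-combined
for host in $(cat executor-hosts.txt); do
  scp "${host}:/tmp/profiles/*.jfr" /tmp/jfr-combined/
done
# jfr ships with the JDK (12+); assemble concatenates the chunk files
# in a directory into a single recording.
jfr assemble /tmp/jfr-combined combined.jfr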
/async-profiler/profiler.sh collect -e cpu -d 120 -i 20ms -o jfr -f ${file} ${pid}
This runs every 120 seconds, which effectively gives us continuous profiling; a rough sketch of the wrapper is below.
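Concretely, the loop looks something like this (the output path is a placeholder, and ${pid} is the executor PID we already have; collect blocks for the -d duration, so the loop paces itself):

# Hypothetical wrapper: one 120s collection per iteration,
# with a timestamped output file per executor PID.
while true; do
  ts=$(date +%Y%m%dT%H%M%S)
  /async-profiler/profiler.sh collect -e cpu -d 120 -i 20ms -o jfr \
      -f "/tmp/profiles/executor-${pid}-${ts}.jfr" "${pid}"
done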
The benchmark we are running compares the same job on a cluster of EC2 2xlarge instances vs. a cluster of 4xlarge instances, and we're noticing that our jobs run slower on 4xlarge. The 2xlarge cluster has twice as many machines as the 4xlarge cluster, so the total number of executor processes (and cores) is the same in both setups.
Each process uses 8 cores and a 54 GB heap. On 2xlarge, each machine runs a single process; on 4xlarge, we run two processes per machine without any isolation.
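For context, that executor sizing corresponds roughly to the following spark-submit flags; the class name, jar, and instance count are placeholders, and cluster-specific settings are omitted:

# Sketch of the executor sizing described above, using standard
# spark-submit options. Names below are hypothetical.
spark-submit \
  --class com.example.OurJob \
  --executor-cores 8 \
  --executor-memory 54g \
  --conf spark.executor.instances=${NUM_EXECUTORS} \
  our-job.jar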
Any leads on how to debug this would be appreciated. Let us know if we should add any more options to async-profiler; we clearly see more time spent on CPU, hence -e cpu.