At this scale of data, there are a couple of things you can influence about your build to make it more appropriately optimized.
Ensure your Code Workbook / Code Repository is using AQE
It's worth verifying your Build is running with AQE enabled, as noted over here. This ensures your stages don't split their work into 200 tasks (far too many at this scale; tasks sized in the KB range suffer from excessive network I/O).
The default task sizes are probably fine for your job, so don't modify the advisory partition sizes unless proven otherwise.
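For intuition, a quick back-of-the-envelope calculation (using a hypothetical input size) shows why the default of 200 shuffle partitions is poorly matched to small data:

```python
# Hypothetical: a 10 MB input split across Spark's default of 200
# shuffle partitions (spark.sql.shuffle.partitions).
input_mb = 10
default_partitions = 200

kb_per_task = input_mb * 1024 / default_partitions
print(kb_per_task)  # 51.2 -> each task handles ~51 KB, so scheduling
                    # and network overhead dominate the actual work
```

With AQE coalescing partitions, the same input would typically collapse into one (or very few) tasks, each sized well above the KB range.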
Consider using Local mode
Since your data scale is small enough, you might consider using what's called Spark "Local Mode". In this mode you don't use any Executors to do your work; instead, the entire contents of your job are held inside the Driver itself. This means you don't move data across the cluster to perform joins, windows, groupBys, etc., but instead keep it all in memory on the Driver's host. This only works as long as all your data can indeed fit into memory, but for small scales where this is true, your data is substantially faster to access and use.
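For intuition, this is the same idea as plain Spark's `local[N]` master, where `N` caps how many tasks run in parallel on the driver's cores. A sketch for illustration only; in Foundry you don't construct the session yourself, you configure this via profiles as described next:

```python
# Illustrative only: plain Spark's local mode. "local[4]" means the
# driver runs up to 4 tasks in parallel and no executors are launched.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[4]")         # all work stays on this one JVM
    .appName("small-data-job")
    .getOrCreate()
)
```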
In Code Repositories, you would apply KUBERNETES_NO_EXECUTORS to your transform; in Code Workbooks, you'll want to reach out to your Palantir support engineers to configure this behavior.
What you'll then see is your transform having zero Executors assigned to it, yet still running some tasks in parallel. They all run on your Driver, using each core the Driver has. NOTE: be very careful not to boost the number of cores too high, otherwise you will increase your risk of OOM, per the guidance here. As you increase the core count, the fractional share of memory per core decreases, which raises the risk of an individual task OOMing. You also don't want to subscribe too many cores to the Driver for better 'parallelism': if you need much beyond 4 parallel tasks, you should likely return to the standard Executor-based compute setup.
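To see why, take a hypothetical Driver with a fixed 16 GiB heap and watch the per-core share shrink as cores are added:

```python
# Hypothetical driver heap; real values depend on your profile sizes.
driver_mem_gib = 16
for cores in (2, 4, 8, 16):
    share = driver_mem_gib / cores  # memory available per parallel task
    print(f"{cores} cores -> {share} GiB per task")
# 2 cores -> 8.0 GiB per task
# 4 cores -> 4.0 GiB per task
# 8 cores -> 2.0 GiB per task
# 16 cores -> 1.0 GiB per task
```

The total memory never grows with the core count, so each additional parallel task only thins out what every task has to work with.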
Since you are now using only the resources on your Driver, you may need to boost the number of cores to support the maximum number of tasks running in parallel. In a typical setup this is 4, so you would apply DRIVER_CORES_LARGE in Code Repositories; similarly, reach out to Palantir Support for configuration in a Code Workbook.
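Putting the two profiles from this section together, a Code Repositories transform might look like the following sketch (the dataset paths and transform body are placeholders, not from the original):

```python
# Sketch: combining local mode with a larger driver-core profile.
from transforms.api import configure, transform_df, Input, Output

@configure(profile=["KUBERNETES_NO_EXECUTORS", "DRIVER_CORES_LARGE"])
@transform_df(
    Output("/Project/datasets/small_output"),
    Input("/Project/datasets/small_input"),
)
def compute(small_input):
    # Joins, windows, groupBys here all execute on the Driver's cores;
    # no data moves across the cluster.
    return small_input
```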
As additional commentary, it's worth highlighting that Spark itself goes through a query-planning process using the Catalyst engine, whereby optimizations are made so your job does the least amount of work possible when building the output. These optimizations take time to perform, which means you may observe the time spent planning your query exceeding the actual execution of your query. At scales above ~1GB of input, this is a feature; at the scale of this example, it means your performance is slightly worse than a simpler system's. Should your data scale increase, however, this optimization step is crucial to maintaining scalability and performance.