We have a Java Spark Structured Streaming application that performs an SCD2 (slowly changing dimension, type 2) operation on Delta Lake; a simplified sketch of the merge is shown below.
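For context, the per-batch upsert is roughly shaped like the following Delta `merge`. This is a simplified sketch, not our exact code: the table path, the column names (`key`, `is_current`, `end_ts`), and the single-pass close-out are illustrative assumptions (a full SCD2 usually also re-inserts the new version of matched keys via a staged union).

```java
import io.delta.tables.DeltaTable;

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Scd2MergeSketch {
    // Close out the current version of each matched key; path and
    // column names are placeholders, not our real schema.
    static void upsert(SparkSession spark, Dataset<Row> updates) {
        Map<String, String> closeOut = new HashMap<>();
        closeOut.put("is_current", "false");
        closeOut.put("end_ts", "u.event_ts");

        DeltaTable.forPath(spark, "/data/scd2_dim")    // placeholder path
                .as("t")
                .merge(updates.as("u"),
                       "t.key = u.key AND t.is_current = true")
                .whenMatched()
                .updateExpr(closeOut)                   // expire the old row
                .whenNotMatched()
                .insertAll()                            // brand-new keys
                .execute();
    }
}
```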
We were using Spark 3.0.0 and Delta Lake 0.7.0. After upgrading to Spark 3.2.0 and Delta Lake 1.1.0, we see the following exception under a load of 100K events:
```
Caused by: java.lang.ClassCastException: [C cannot be cast to [J
    at org.apache.spark.unsafe.memory.HeapMemoryAllocator.allocate(HeapMemoryAllocator.java:58)
    at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:314)
```
We are currently in development, so we run in local mode. For the 100K-event test the configuration is as follows (a sketch of how the job is wired up appears after the list):
- Spark master: local[4]
- Driver memory: 4 GB
- Shuffle partitions: 25
- Kafka topic partitions: 25
- Micro-batch size: 5000 events
- Trigger interval: 120 seconds
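For reference, the job is wired up roughly like this. It is a sketch under assumptions: the bootstrap servers, topic name, and checkpoint path are placeholders; capping the batch at 5000 events via `maxOffsetsPerTrigger` is an illustration of how we apply the limit; `Scd2MergeSketch` is the hypothetical class from the merge sketch above. Driver memory (4 GB) is passed as `--driver-memory 4g` on `spark-submit`, since it must be set before the driver JVM starts.

```java
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;

public class StreamWiringSketch {
    public static void main(String[] args) throws Exception {
        // Driver memory is NOT set here: in local mode the driver JVM
        // is already running, so 4g comes from --driver-memory.
        SparkSession spark = SparkSession.builder()
                .appName("scd2-stream")                        // placeholder
                .master("local[4]")
                .config("spark.sql.shuffle.partitions", "25")
                .getOrCreate();

        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
                .option("subscribe", "events")            // 25-partition topic
                .option("maxOffsetsPerTrigger", "5000")   // caps each micro-batch
                .load();

        // Run the SCD2 merge once per micro-batch.
        VoidFunction2<Dataset<Row>, Long> scd2 =
                (batch, batchId) -> Scd2MergeSketch.upsert(spark, batch);

        stream.writeStream()
                .trigger(Trigger.ProcessingTime("120 seconds"))
                .option("checkpointLocation", "/tmp/scd2-checkpoint") // placeholder
                .foreachBatch(scd2)
                .start()
                .awaitTermination();
    }
}
```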
Another observation: the exception occurs only in the first couple of batches (maybe 2-3; subsequent batches work fine), and if we don't restart the application, a subsequent 100K load runs without errors.
Additional information about the load: the payload is identical for all events; only the keys differ across the 100K events.
Any guidance on tuning, or a fix that could resolve this exception, would be appreciated.
Thanks.