We have implemented an Azure Event Hubs streaming job that reads .gz files stored in Azure Blob Storage (text files compressed with gzip), extracts them, and performs some business logic.
Below is the code used to extract the file:
try (BlobInputStream blobInputStream = blobClient.openInputStream();
     GZIPInputStream zipStream = new GZIPInputStream(blobInputStream)) {
    String inputData = org.apache.commons.io.IOUtils.toString(zipStream, "UTF-8");
} catch (Exception ex) {
    ex.printStackTrace();
}
We are seeing the following OutOfMemoryError:
2022-10-25 08:07:55 ERROR Schedulers:315 - Scheduler worker in group main failed with an uncaught exception
com.azure.messaging.eventhubs.implementation.PartitionProcessorException: Error in event processing callback
at com.azure.messaging.eventhubs.PartitionPumpManager.processEvents(PartitionPumpManager.java:322)
at com.azure.messaging.eventhubs.PartitionPumpManager.lambda$startPartitionPump$2(PartitionPumpManager.java:235)
at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:830)
Caused by: com.azure.messaging.eventhubs.implementation.PartitionProcessorException: Error in event processing callback
at com.azure.messaging.eventhubs.PartitionPumpManager.processEvent(PartitionPumpManager.java:284)
at com.azure.messaging.eventhubs.PartitionPumpManager.processEvents(PartitionPumpManager.java:318)
... 11 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3746)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:227)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:746)
at java.lang.StringBuilder.append(StringBuilder.java:231)
at org.apache.commons.io.output.StringBuilderWriter.write(StringBuilderWriter.java:143)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1433)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1411)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1208)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1022)
at org.apache.commons.io.IOUtils.toString(IOUtils.java:2839)
at org.apache.commons.io.IOUtils.toString(IOUtils.java:2865)
at com.company.azure.extractor.SACFileExtractor.extractZipFile(SACFileExtractor.java:121)
at com.company.azure.extractor.SACFileExtractor.processFromBlob(SACFileExtractor.java:77)
at com.company.azure.processor.FileExtractProcessor.processMessage(FileExtractProcessor.java:208)
at com.company.azure.processor.FileExtractProcessor.processMessage(FileExtractProcessor.java:90)
at com.company.azure.processor.FileExtractProcessor.process(FileExtractProcessor.java:47)
at com.company.azure.MainClass.lambda$new$0(MainClass.java:225)
at com.company.azure.MainClass$$Lambda$107/0x0000000800c1e440.accept(Unknown Source)
at com.azure.messaging.eventhubs.EventProcessorClientBuilder$1.processEvent(EventProcessorClientBuilder.java:595)
at com.azure.messaging.eventhubs.PartitionPumpManager.processEvent(PartitionPumpManager.java:274)
at com.azure.messaging.eventhubs.PartitionPumpManager.processEvents(PartitionPumpManager.java:318)
at com.azure.messaging.eventhubs.PartitionPumpManager.lambda$startPartitionPump$2(PartitionPumpManager.java:235)
at com.azure.messaging.eventhubs.PartitionPumpManager$$Lambda$840/0x0000000801150c40.accept(Unknown Source)
at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Is this issue occurring because there is not enough heap space, or because the file is too large to be stored in a single String variable, as in the line below?

String inputData = org.apache.commons.io.IOUtils.toString(zipStream, "UTF-8");

Please suggest how to handle this issue.
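
One alternative we are considering is to stream the decompressed content through a BufferedReader instead of materializing the entire file into a String. This is only a rough sketch: it assumes the downstream business logic can operate line by line, and processLine is a hypothetical placeholder for that logic.

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.specialized.BlobInputStream;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPInputStream;

// ...

try (BlobInputStream blobInputStream = blobClient.openInputStream();
     GZIPInputStream zipStream = new GZIPInputStream(blobInputStream);
     BufferedReader reader = new BufferedReader(
             new InputStreamReader(zipStream, StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // processLine is a hypothetical hook for the existing business logic,
        // applied per line so the whole decompressed file is never held in memory
        processLine(line);
    }
} catch (Exception ex) {
    ex.printStackTrace();
}

Would this kind of line-by-line streaming be the right way to avoid the heap exhaustion, or is increasing the heap size the expected fix here?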