
We have implemented an Azure Event Hubs streaming job that reads .gz files (gzip-compressed text files) stored in Azure Blob Storage, extracts them, and performs some business logic.

Below is the code written to extract the file.

try (BlobInputStream blobInputStream = blobClient.openInputStream();
     GZIPInputStream zipStream = new GZIPInputStream(blobInputStream)) {
    String inputData = org.apache.commons.io.IOUtils.toString(zipStream, "UTF-8");
} catch (Exception ex) {
    ex.printStackTrace();
}

We are seeing the following OutOfMemoryError:

2022-10-25 08:07:55  ERROR Schedulers:315 - Scheduler worker in group main failed with an uncaught exception
com.azure.messaging.eventhubs.implementation.PartitionProcessorException: Error in event processing callback
        at com.azure.messaging.eventhubs.PartitionPumpManager.processEvents(PartitionPumpManager.java:322) 
        at com.azure.messaging.eventhubs.PartitionPumpManager.lambda$startPartitionPump$2(PartitionPumpManager.java:235) 
        at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160) 
        at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440) 
        at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527) 
        at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) 
        at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) 
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) 
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
        at java.lang.Thread.run(Thread.java:830) 
Caused by: com.azure.messaging.eventhubs.implementation.PartitionProcessorException: Error in event processing callback
        at com.azure.messaging.eventhubs.PartitionPumpManager.processEvent(PartitionPumpManager.java:284) 
        at com.azure.messaging.eventhubs.PartitionPumpManager.processEvents(PartitionPumpManager.java:318) 
        ... 11 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3746)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:227)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:746) 
        at java.lang.StringBuilder.append(StringBuilder.java:231) 
        at org.apache.commons.io.output.StringBuilderWriter.write(StringBuilderWriter.java:143) 
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1433) 
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1411) 
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1208) 
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1022) 
        at org.apache.commons.io.IOUtils.toString(IOUtils.java:2839) 
        at org.apache.commons.io.IOUtils.toString(IOUtils.java:2865) 
        at com.company.azure.extractor.SACFileExtractor.extractZipFile(SACFileExtractor.java:121) 
        at com.company.azure.extractor.SACFileExtractor.processFromBlob(SACFileExtractor.java:77) 
        at com.company.azure.processor.FileExtractProcessor.processMessage(FileExtractProcessor.java:208) 
        at com.company.azure.processor.FileExtractProcessor.processMessage(FileExtractProcessor.java:90) 
        at com.company.azure.processor.FileExtractProcessor.process(FileExtractProcessor.java:47) 
        at com.company.azure.MainClass.lambda$new$0(MainClass.java:225) 
        at com.company.azure.MainClass$$Lambda$107/0x0000000800c1e440.accept(Unknown Source) 
        at com.azure.messaging.eventhubs.EventProcessorClientBuilder$1.processEvent(EventProcessorClientBuilder.java:595) 
        at com.azure.messaging.eventhubs.PartitionPumpManager.processEvent(PartitionPumpManager.java:274) 
        at com.azure.messaging.eventhubs.PartitionPumpManager.processEvents(PartitionPumpManager.java:318) 
        at com.azure.messaging.eventhubs.PartitionPumpManager.lambda$startPartitionPump$2(PartitionPumpManager.java:235) 
        at com.azure.messaging.eventhubs.PartitionPumpManager$$Lambda$840/0x0000000801150c40.accept(Unknown Source) 
        at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160) 
        at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440) 
        at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527) 
        at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) 
        at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) 
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) 
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 

Is this happening because there is not enough heap space, or because the file is too large to be stored in a String variable, as in the line below?

String inputData = org.apache.commons.io.IOUtils.toString(zipStream, "UTF-8");

Please suggest how to handle this issue.

chandu ram
    Yes, you're reading a file that is larger than available memory. You need to either increase the available memory or read the stream you have a bit at a time. – stdunbar Nov 15 '22 at 15:01
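
Reading the stream "a bit at a time", as the comment suggests, could look roughly like the sketch below. It assumes the blob contains line-oriented text, so each line can be processed on its own. The class name GzipLineStreamer, the method forEachLine, and the gzip helper are made up for illustration and are not part of the question's code. In the job itself, the InputStream passed in would presumably be the one returned by blobClient.openInputStream(); an in-memory stream is used here only so the example runs without Azure.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not taken from the original code.
class GzipLineStreamer {

    /**
     * Decompresses a gzip stream and passes each text line to the handler.
     * Only the current line and small internal buffers are held in memory,
     * instead of the whole decompressed file as with IOUtils.toString.
     * Returns the number of lines processed.
     */
    static long forEachLine(InputStream compressed, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line); // business logic for one line goes here
                count++;
            }
        }
        return count;
    }

    // Demo-only helper: gzip a string in memory to stand in for a blob.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("first\nsecond\nthird\n");
        long n = forEachLine(new ByteArrayInputStream(data),
                line -> System.out.println(line.length()));
        System.out.println("lines: " + n);
    }
}
```

If the business logic really needs the whole file at once, this approach does not apply, and the remaining option is a larger heap (for example via the JVM's -Xmx flag). Keep in mind that the decompressed text can be many times larger than the .gz blob.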
