0

I have a memory-intensive exporting function that needs to retrieve over 1M data from the MongoDB and write it to an Excel file. This export method is already triggered asynchronously by using AWS SNS/SQS. However, it always encounters an OOM issue without using the SingleThreadPool.

@Transactional
@SqsListener("${sqs-example}")
public void exportData(String json) throws IOException {
    LOGGER.info("Export: {}", json);
    ExportPayload exportPayload = objectMapper.readValue(json, ExportPayload.class);
    try {
        TenantContext.setTenant(exportPayload.getTenant());
        exportService.export(exportPayload.getExportId(), exportPayload.getTenant());
    } finally {
        TenantContext.clear();
    }
}

The exportFunc function is the one that writes the retrieved MongoDB data to an excel file. Without the singleThreadExectuer, it will have an OOM issue after around 130,000 rows. But if writing it as below, it could handle over 1M rows easily.

public void export(String exportId, Long tenantId) {
        ExecutorService es = Executors.newSingleThreadExecutor();
        if (oExport.isPresent()) {
            try {
                es.submit(() -> {
                    try {
                        TenantContext.setTenant(tenantId);
                        exportFunc();
                    } finally {
                        TenantContext.clear();
                    }
                });
            } finally {
                es.shutdown();
            }

        } else {
            LOGGER.error("No export found by id: {}", exportId);
        }
    }

In my opinion, the export will be triggered when the program receives a message, and it's already running in a separate thread, why does the singleThreadExecutor can solve the OOM issue in this case?

Coding_Rabbit
  • 1,287
  • 3
  • 22
  • 44
  • 2
    Are you sure it does? Or does it simply swallow the OOM and never report it? – RealSkeptic Mar 13 '23 at 16:03
  • 1
    Creating a new `ExecutorService` for every job, is not better than creating a new `Thread` manually. You are even creating an executor service when not necessary (when `oExport.isPresent()` returns `false`) and not calling `shutdown()` in that case. Besides that, it’s not clear to me, whether you are creating one Excel file with 130,000 rows or 130,000 Excel files out of a database with 130,000 rows. Neither makes much sense to me. – Holger Mar 13 '23 at 16:46
  • @Holger, I'm creating one Excel file with 1M rows, the program has an OOM issue after writing around 130,000 rows. That's why I specifically added an ExecutorService. – Coding_Rabbit Mar 14 '23 at 15:14
  • @RealSkeptic, I also thought it wouldn't work, so I tested it a couple of times and always received the same results. The exporting can proceed successfully with the ExecutorService and has an OOM issue if I removed the ExecutorService. – Coding_Rabbit Mar 14 '23 at 15:18
  • 1
    Maybe, the only effect is that the execution of `exportFunc()` is deferred until the `exportData` method returned and the garbage collector may reclaim memory. If I understand you correctly, the string referenced by `json` contains the entire data for the 130,000+ rows while `exportPayload` contains the same data in converted form. Then, things get blurry as you are calling something that looks like a static method (`TenantContext.setTenant(…)`, even twice) and a method taking no parameters (`exportFunc()`). But it seems like the json string is not needed at this point and freeing it may help. – Holger Mar 14 '23 at 16:12

0 Answers0