
The application reads many different files from AWS S3 and then sends them to recipients.

Issues:

  1. Constantly growing number of live threads (it grows until 1030-1040 threads and then stops at that limit; almost all of them are AWS S3 threads in the "parked" state).
  2. Constantly growing usage of the "old" generation space in the heap. After garbage collection almost none of it is freed.

To load files I use pollEnrich with a consumer endpoint of the AWS2-S3 component.

The application uses:

  • Spring Boot 2.4.6
  • Apache Camel 3.10.0
  • Java 11

Route

from("direct:loadFile")
        .routeId("LoadFileRoute")
        .pollEnrich()
          .method(amazonS3Service, "generateConsumerEndpointUrlForLoadingFile")
          .timeout(60000L)
          .cacheSize(-1)
          .threads().executorService(pollEnrichThreadPool)
        .end().id("loadFile");

Endpoint creation

public EndpointConsumerBuilder generateConsumerEndpointUrlForLoadingFile(
      @ExchangeProperty(BUCKET) String bucket,
      @ExchangeProperty(FILEPATH) String filepath) {

    return aws2S3(bucket)
        .fileName(filepath)
        .deleteAfterRead(false)
        .includeBody(true)
        .amazonS3Client(amazonS3Client)
        .scheduledExecutorService(s3EndpointThreadPool)
        .advanced()
          .autocloseBody(false);
  }

Additionally, I've created two different thread pools to check different cases:

@Bean("S3EndpointThreadPool")
public ScheduledExecutorService scheduledExecutorService(CamelContext context) throws Exception {
  return context.getExecutorServiceManager()
      .newScheduledThreadPool(this, "S3EndpointThreadPool1", 10);
}

@Bean("PollEnrichThreadPool")
public ExecutorService executorService(CamelContext context) throws Exception {
  return new ThreadPoolBuilder(context)
      .poolSize(10)
      .maxPoolSize(10)
      .maxQueueSize(Integer.MAX_VALUE)
      .build("PollEnrichThreadPool2");
}
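
A side note on the PollEnrichThreadPool configuration: with maxQueueSize set to Integer.MAX_VALUE, the 10-thread pool never rejects work, so submitted tasks (and everything they reference, such as exchange bodies) can accumulate in the queue faster than the pool drains them. A minimal plain-JDK sketch of that behavior (no Camel; class and variable names are mine):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueGrowthSketch {
    public static void main(String[] args) throws Exception {
        // Mirrors the PollEnrichThreadPool settings: core = max = 10 threads,
        // queue capacity = Integer.MAX_VALUE (effectively unbounded).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(Integer.MAX_VALUE));

        // Submit far more tasks than the pool can run at once.
        for (int i = 0; i < 1_000; i++) {
            pool.submit(() -> {
                try { Thread.sleep(2); } catch (InterruptedException ignored) { }
            });
        }

        // Only 10 tasks run concurrently; the rest sit in the queue, and
        // anything they reference stays reachable on the heap until they run.
        int queuedSnapshot = pool.getQueue().size();

        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS);
        System.out.println("queued at peak > 0: " + (queuedSnapshot > 0));
        System.out.println("completed: " + pool.getCompletedTaskCount());
    }
}
```

Whether this is the dominant retainer in any given heap is something only a heap dump can confirm; the sketch only shows why an unbounded queue makes such retention possible.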

Cases I've tried that did not affect memory usage in any way:

  1. Endpoint without explicitly specified includeBody and autocloseBody params (default values).
  2. Endpoint with includeBody = true and autocloseBody = false
  3. Endpoint with includeBody = false and autocloseBody = true
  4. Endpoint. Add ScheduledExecutorService - .scheduledExecutorService(s3EndpointThreadPool) (10 threads)
  5. PollEnrich EIP. Set thread pool PollEnrichThreadPool - .threads().executorService(pollEnrichThreadPool) (10 threads)
  6. PollEnrich EIP. Disable caching for URI producers/consumers (.cacheSize(-1)). pollEnrich uses dynamic URIs for loading files; basically all URIs are unique, and in the main scenario each file is usually read only once.

What am I missing here? Do you have any ideas how to solve these issues?

Workin_Man
  • I am also facing the same issue. Did you find any solution for this? – sachin dhus Feb 14 '22 at 13:14
  • @sachindhus It turned out that the issue was not related to the pollEnrich EIP. It was caused at the JPA/Hibernate level: each time, a lot of persistent entities were loaded into the Persistence Context and, for some reason, they were not garbage collected. I changed the query to load only the one field that matters for the logic, and that solved the issue. – Workin_Man Feb 16 '22 at 10:00
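
For readers hitting the same symptom: the fix described in the comment above roughly corresponds to replacing an entity query with a scalar projection, so no managed entities accumulate in the Persistence Context. A sketch in Spring Data JPA terms (the FileTask entity, its fields, and the repository method names are hypothetical, not from the original code):

```java
// Hypothetical repository sketching the fix from the comment above.
public interface FileTaskRepository extends JpaRepository<FileTask, Long> {

    // Before: List<FileTask> findByStatus(String status);
    // Every loaded FileTask stays managed in the Persistence Context
    // until it is closed, so the heap keeps growing between GCs.

    // After: a scalar projection. Only plain String values are returned;
    // nothing is attached to the Persistence Context.
    @Query("select t.filepath from FileTask t where t.status = :status")
    List<String> findFilepathsByStatus(@Param("status") String status);
}
```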

0 Answers