0

I am working on a Micronaut Application. In this application, there is a main scheduled job which works like a workflow engine and multiple steps process the data which is around 600K records.

Along with this scheduled job, we have an event listener which keeps getting events and on the receiver side, we save the event payload in a table.

We are also maintaining a health check endpoint in the application which is getting called every 1 min and returning a response that is prepared by a background job (runs every 2 min).

HealthCheckStatus status;

@Get
@ExecuteOn(TaskExecutors.IO)
public HealthCheckStatus index() {
    log.debug("HealthCheck requested");
    return status;
}

@Scheduled(fixedDelay = "${scheduler.healthcheck.fixedDelay}", initialDelay = "${scheduler.healthcheck.initialDelay}")
public void refreshStatus() {
   log.debug("HealthCheck refresh requested...");
   status = // Here we are checking internal dependencies and populating this status
}

application.yml:

micronaut:
  application:
    name: cbo-adaptor
  netty:
    event-loops:
      httpclientthreadpool:
        num-threads: 10
        prefer-native-transport: true
  executors:
    io:
      type: fixed
      nThreads: 50

Recently we are seeing some delay in a couple of steps of the workflow engine. If I check the logs I can see that there are many logs related to the event received and health check b/w two logs files which is expected to be executed one after another immediately.

{"@timestamp":"2023-03-18T03:31:29.948Z","logger_name":"com.org.implementation.JobService","level":"DEBUG","level_value":10000,"job":"1234","job_status":"WORKING","job_step":"ProcessingStep","origin":"JOB_TYPE1,"message":"start processing","thread_name":"scheduled-executor-thread-2"}
{"@timestamp":"2023-03-18T03:31:38.198Z","logger_name":"com.org.endpoints.healthcheck.HealthCheckController","level":"DEBUG","level_value":10000,"message":"HealthCheck requested","thread_name":"io-executor-thread-23","v":"0.0.35.4-20230310.2"}
{"@timestamp":"2023-03-18T03:32:13.122Z","logger_name":"com.org.endpoints.healthcheck.HealthCheckController","level":"DEBUG","level_value":10000,"message":"HealthCheck requested","thread_name":"io-executor-thread-13","v":"0.0.35.4-20230310.2"}
{"@timestamp":"2023-03-18T03:32:59.025Z","logger_name":"com.org.endpoints.healthcheck.HealthCheckController","level":"DEBUG","level_value":10000,"message":"HealthCheck refresh requested...","thread_name":"scheduled-executor-thread-4","v":"0.0.35.4-20230310.2"}
{"@timestamp":"2023-03-18T03:33:13.123Z","logger_name":"com.org.endpoints.healthcheck.HealthCheckController","level":"DEBUG","level_value":10000,"message":"HealthCheck requested","thread_name":"io-executor-thread-40","v":"0.0.35.4-20230310.2"}
{"@timestamp":"2023-03-18T03:33:38.701Z","logger_name":"com.org.endpoints.healthcheck.HealthCheckController","level":"DEBUG","level_value":10000,"message":"HealthCheck requested","thread_name":"io-executor-thread-6","v":"0.0.35.4-20230310.2"}
{"@timestamp":"2023-03-18T03:34:13.122Z","logger_name":"com.org.endpoints.healthcheck.HealthCheckController","level":"DEBUG","level_value":10000,"message":"HealthCheck requested","thread_name":"io-executor-thread-1","v":"0.0.35.4-20230310.2"}
{"@timestamp":"2023-03-18T03:35:05.487Z","logger_name":"com.org.implementation.MyEventService","level":"DEBUG","level_value":10000,"message":"event received: MyDto(id=12313, type=A, timestamp=2023-03-18T03:35:05.487514, payload=Dto(field1=value1, field2=value2)","thread_name":"pool-4-thread-1"}
{"@timestamp":"2023-03-18T03:37:00.167Z","logger_name":"com.org.implementation.JobService","level":"DEBUG","level_value":10000,"job":"1234","job_status":"WORKING","job_step":"ProcessingStep","origin":"JOB_TYPE1","message":"completed processing","thread_name":"scheduled-executor-thread-2"}

Here the method for ProcessingStep:

public void processData(UUID jobId) {
    log.debug("started Processing");
    repository.getItem(jobId, JobType.TYPE_1) //not records of TYPE_1 in db for this case
            .map(Item::getId)
            .map(this::processType1Data) //calling another api on itemId
            .blockingSubscribe();
    log.debug("completed processing");

}

In the above case for the particular job id, we don't have TYPE_1 item in the database so ideally, both logs should print immediately one after another within milliseconds/seconds but as per the logs completed processing got executed after a delay of 5-6 minute and in b/w logs are related to the execution of health check and event receive.

I believe it's due to context switching b/w thread but still not having much depth knowledge about it.

I want my job to completes quickly without being impacted by events.

Manish
  • 1,274
  • 3
  • 22
  • 59

0 Answers0