Could anyone help me understand the following situation? I have one worker with this configuration:

workerOptions := worker.Options{
    BackgroundActivityContext:               ctx,
    MaxConcurrentWorkflowTaskPollers:        10,
    MaxConcurrentActivityTaskPollers:        20,
    MaxConcurrentWorkflowTaskExecutionSize:  256,
    MaxConcurrentLocalActivityExecutionSize: 256,
    MaxConcurrentActivityExecutionSize:      256,
}

If I raise MaxConcurrentWorkflowTaskExecutionSize and MaxConcurrentActivityExecutionSize to 1024, the worker becomes noticeably slower. I expected that increasing these two options would let the worker handle more Activities and Workflow Tasks, but that is not what I observe. The worker has enough CPU and RAM and is not overloaded at all.

In the Temporal UI I can see that some workflows stall for a while with a history like this:

1 WorkflowExecutionStarted Aug 10th 10:40:17 am CLOSE TIMEOUT 30m
2 WorkflowTaskScheduled Aug 10th 10:40:17 am TASKQUEUE temporal-basic

I have also adjusted these matching service parameters:

matching.numTaskqueueReadPartitions:
- value: 100
  constraints: {}
matching.numTaskqueueWritePartitions:
- value: 100

Also, while experimenting with different worker configurations, I occasionally get errors like this from the history service:

temporal-history-5f8757cc4f-v8h94 temporal-history {"level":"error","ts":"2021-08-09T22:26:09.181Z","msg":"Fail to process task","service":"history","shard-id":255,"address":"10.218.13.7:7234","shard-item":"0xc09d263700","component":"transfer-queue-processor","cluster-name":"active","shard-id":255,"queue-task-id":2213997,"queue-task-visibility-timestamp":"2021-08-09T22:26:00.658Z","xdc-failover-version":0,"queue-task-type":"TransferActivityTask","wf-namespace-id":"4b775794-a076-499e-aa11-177db696d780","wf-id":"basic-workflow-30-0-5-3523","wf-run-id":"fc82334c-b57d-4d08-8c0d-480b6156b995","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:332","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:332\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:218\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:244\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:167"}
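To see at a glance which task type, shard, and workflow such an error belongs to, the JSON part of the line can be decoded with Go's standard library. This is only a minimal sketch: `historyLogEntry`, `parseHistoryLog`, and `sampleLine` are hypothetical names, the JSON keys are copied from the line above, and the sample is a trimmed copy of that line (stack trace and several fields dropped).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// historyLogEntry holds the subset of fields useful for triage.
// The struct is hypothetical; the JSON keys come from the log line above.
type historyLogEntry struct {
	Level      string `json:"level"`
	Msg        string `json:"msg"`
	ShardID    int    `json:"shard-id"`
	Component  string `json:"component"`
	TaskType   string `json:"queue-task-type"`
	WorkflowID string `json:"wf-id"`
	RunID      string `json:"wf-run-id"`
	Error      string `json:"error"`
}

// sampleLine is a trimmed copy of the history-service log line shown above.
const sampleLine = `temporal-history-5f8757cc4f-v8h94 temporal-history {"level":"error","msg":"Fail to process task","service":"history","shard-id":255,"component":"transfer-queue-processor","queue-task-type":"TransferActivityTask","wf-id":"basic-workflow-30-0-5-3523","wf-run-id":"fc82334c-b57d-4d08-8c0d-480b6156b995","error":"context deadline exceeded"}`

// parseHistoryLog skips the "<pod> <container> " prefix and decodes the
// JSON object that follows it.
func parseHistoryLog(line string) (historyLogEntry, error) {
	var entry historyLogEntry
	start := strings.IndexByte(line, '{')
	if start < 0 {
		return entry, fmt.Errorf("no JSON object found in log line")
	}
	err := json.Unmarshal([]byte(line[start:]), &entry)
	return entry, err
}

func main() {
	entry, err := parseHistoryLog(sampleLine)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s on shard %d: %s (workflow %s)\n",
		entry.TaskType, entry.ShardID, entry.Error, entry.WorkflowID)
}
```

Running many lines through a function like this and counting by `queue-task-type` and `shard-id` would show whether the timeouts cluster on a few shards or are spread evenly.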

My goal is to understand which options or configs I should adjust to get more performance out of Temporal.

I would appreciate any tips on where to look for the problem.

James

1 Answer

Here's a guide on how to think about Worker tuning. If it doesn't cover your case, please submit an issue!

https://docs.temporal.io/application-development/worker-performance/

Dmitry Spikhalskiy
Loren