
We manage our own ClearML server on an AWS EC2 instance. Instance type: t3.xlarge (4 vCPUs, 16 GiB memory). Data disk: gp3 (size: 200 GB, IOPS: 3,000, throughput: 125 MiB/s).

We have 3 ClearML projects: one with 643,000 experiments, another with 151,000, and a small one with 25,000, for a total of 819,000 experiments across all projects.

The ClearML web app is very slow. For example, it takes about 30 seconds just to load the main dashboard. Searching for a specific experiment by ID is also very slow.

What can we do to improve the performance?

We tried adding more memory, which improved performance, but only a little. It is still too slow.

hilel14

1 Answer


Disclaimer: I'm a member of the ClearML team (formerly Trains)

I think your issue is simply caused by the number of serving processes in the server's apiserver component (probably 1 process at the moment).

Assuming you are using the docker-compose deployment of ClearML Server, to increase the number of processes, add the CLEARML_USE_GUNICORN=1 environment variable to the apiserver service.

This runs the apiserver component with 8 processes by default. To specify a different number of processes, also add the CLEARML_GUNICORN_WORKERS environment variable (for example, CLEARML_GUNICORN_WORKERS=12 for 12 processes).
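
For reference, here is a minimal sketch of the relevant docker-compose.yml fragment (the apiserver service name follows the standard ClearML Server compose file; your file may differ):

```yaml
# Sketch of the relevant fragment of docker-compose.yml.
# The "apiserver" service name follows the standard ClearML Server compose file.
services:
  apiserver:
    environment:
      CLEARML_USE_GUNICORN: "1"        # serve the API with gunicorn (multi-process)
      CLEARML_GUNICORN_WORKERS: "12"   # optional: override the default of 8 workers
```

After editing the file, recreate the service (e.g. `docker-compose up -d apiserver`) so the new environment variables take effect.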

Please note that this mode (and of course, more processes) requires more CPU and RAM resources. I believe your current setup should be enough for 8 processes, but I would recommend monitoring the machine's CPU and RAM usage and upgrading as required.
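
A quick way to keep an eye on this is a sketch like the following (the container name clearml-apiserver is an assumption based on the standard compose file):

```sh
# Host-level CPU and memory usage
top

# Per-container CPU and memory usage; the container name
# "clearml-apiserver" is an assumption based on the standard compose file
docker stats clearml-apiserver
```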

Martin.B
  • Thanks for your answer. We added CLEARML_USE_GUNICORN: 1 to the docker-compose configuration. After that, `pgrep gunicorn` indicates there are 9 processes running, and `docker stats` shows the memory consumption of the api-server container increased from 60 MB to 430 MB, but the webapp is still slow. Could it be related to MongoDB indexing? – hilel14 Dec 16 '22 at 06:29
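
As a follow-up to the MongoDB question in the comment above: one way to check whether the relevant collections are indexed is a sketch like the following (container, database, and collection names are assumptions based on a standard ClearML Server deployment, so verify them against your own setup):

```sh
# Open a MongoDB shell inside the ClearML mongo container
# (container name "clearml-mongo" is an assumption; older server
# images ship the legacy "mongo" shell rather than "mongosh")
docker exec -it clearml-mongo mongo

# Inside the shell ("backend" database and "task" collection are
# assumptions based on a standard ClearML Server deployment):
#   use backend
#   db.task.getIndexes()        // list indexes on the task collection
#   db.task.countDocuments({})  // how many task documents exist
```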