We manage our own ClearML server on an EC2 instance in the AWS cloud. Instance type: t3.xlarge (4 vCPUs, 16 GiB memory). Data disk: gp3 (size: 200 GB, IOPS: 3,000, throughput: 125 MiB/s).
We have 3 ClearML projects: one with 643,000 experiments, another with 151,000, and the smallest with 25,000. Total experiments across all projects: 819,000.
The ClearML web app is very slow. For example, it takes about 30 seconds just to load the main dashboard. Searching for a specific experiment by ID is also very slow.
What can we do to improve the performance?
We tried adding more memory, which improved performance, but only slightly. It is still too slow.