
I have a Quart app that is running with Hypercorn in production. Eight Hypercorn worker processes are configured to be started. My objective is to collect application performance metrics such as latency and throughput using Prometheus. From the Quart app I am incrementing/updating counters and histograms based on events, using the aioprometheus library. An endpoint /myapp/metrics is exposed in the application to collect the metrics.
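For reference, the setup looks roughly like this (a minimal sketch; the metric names and routes are illustrative, and it assumes aioprometheus's Registry/render helpers):

```python
# Minimal sketch of the current setup (metric names and routes are illustrative;
# assumes aioprometheus's Registry and render helpers).
from aioprometheus import Counter, Registry, render
from quart import Quart, request

app = Quart(__name__)
registry = Registry()
events = Counter("myapp_events_total", "Number of events handled")
registry.register(events)

@app.route("/myapp/event")
async def handle_event():
    # Each Hypercorn worker increments its own in-process counter.
    events.inc({"event": "E1"})
    return "ok"

@app.route("/myapp/metrics")
async def metrics():
    # Only this worker's registry is rendered, hence the per-process numbers.
    content, http_headers = render(registry, request.headers.getlist("accept"))
    return content, http_headers
```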

Now the problem is that each time this endpoint is hit by the scraping agent, it collects data from only the one process that the request happens to get routed to. For example, if one process has seen 6 hits for event E1 and another process has seen 7 hits for the same event, I need a total of 13 hits as the response from my metrics endpoint, but with the current setup it gives either 6 or 7 depending on which process the request gets routed to.

Can someone please suggest how to get the metrics for my entire application in this multi-process Hypercorn model? One possible solution is to have all the processes update some common data source and have the metrics endpoint read from that data source. But before I do that, I want to explore whether there exists some Hypercorn-specific solution for it.

Edit: I see a similar question, but with Gunicorn and Flask.

shshnk
  • I don't know about Hypercorn specifics, but generally if you have multiple backend processes it is advised to scrape them independently. Your idea of aggregating metrics yourself will give you trouble with Prometheus processing counter resets. – markalex Jun 07 '23 at 06:32
  • @markalex I would scrape them independently if I could. But to the scraping agent and the rest of the outside world only one endpoint is visible, namely /myapp/metrics, and it has no knowledge of the multiple processes which internally support the application – shshnk Jun 07 '23 at 06:59

1 Answer


The problem is not related to the Hypercorn server itself, but rather to the way in which you are collecting and aggregating metrics.

When you have multiple worker processes, each process maintains its own view of the application's state. This means that if you are collecting metrics within each worker process, you will only be able to see the metrics for that particular process.

To get a complete picture of your application's performance, you need to aggregate the metrics across all worker processes. One way to do this is to use a shared data source, such as a Redis database or a shared file system, to store the metrics. Each worker process can then update the shared data source as it handles requests, and the metrics endpoint can read from the shared data source to provide a complete view of the application's performance.
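As a minimal sketch of the shared-data-source approach, the workers could increment a shared Redis counter and the metrics endpoint could read the aggregated value (key names, routes, and the hand-written exposition line below are illustrative; this assumes the redis-py asyncio client):

```python
# Hedged sketch: workers increment a shared Redis counter; the metrics
# endpoint reads the aggregated value (key names and routes are illustrative).
import redis.asyncio as redis
from quart import Quart

app = Quart(__name__)
shared = redis.Redis(host="localhost", port=6379)

@app.route("/myapp/event")
async def handle_event():
    # INCR is atomic, so all workers can safely bump the same key.
    await shared.incr("myapp:events:E1")
    return "ok"

@app.route("/myapp/metrics")
async def metrics():
    total = int(await shared.get("myapp:events:E1") or 0)
    body = f'# TYPE myapp_events_total counter\nmyapp_events_total{{event="E1"}} {total}\n'
    return body, {"Content-Type": "text/plain; version=0.0.4"}
```

Note markalex's caveat in the comments: self-aggregated counters can confuse Prometheus's counter-reset handling, for example when one worker restarts and its contribution disappears from the total.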

Another approach is to let a dedicated metrics system such as Prometheus do the aggregation for you. Each worker process can expose its own metrics, which the Prometheus server can then scrape and aggregate across all worker processes to provide a complete view of the application's performance.
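As a sketch of that approach, each worker could start its own small metrics server on a separate port so that Prometheus can scrape every worker directly. This assumes aioprometheus's optional Service helper (installed with the aiohttp extra); the port handling and discovery are illustrative:

```python
# Hedged sketch: each worker exposes its own metrics port for Prometheus to
# scrape directly (assumes aioprometheus's optional Service helper,
# `pip install aioprometheus[aiohttp]`; details are illustrative).
from aioprometheus import Counter, Registry
from aioprometheus.service import Service
from quart import Quart

app = Quart(__name__)
registry = Registry()
events = Counter("myapp_events_total", "Number of events handled")
registry.register(events)
metrics_service = Service(registry)

@app.before_serving
async def start_metrics_service():
    # port=0 lets the OS pick a free port per worker; the chosen URL is logged
    # so the scrape targets can be written out for Prometheus service discovery.
    await metrics_service.start(addr="0.0.0.0", port=0)
    app.logger.info("metrics exposed at %s", metrics_service.metrics_url)

@app.after_serving
async def stop_metrics_service():
    await metrics_service.stop()

@app.route("/myapp/event")
async def handle_event():
    events.inc({"event": "E1"})
    return "ok"
```

Prometheus can then sum across the per-worker time series, for example with sum(rate(myapp_events_total[5m])), which also keeps counter resets handled correctly.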

In summary, to get a complete view of your application's performance in a multi-process Hypercorn model, you need to aggregate the metrics across all worker processes using a shared data source or a dedicated metrics aggregation service such as Prometheus.