10

For an academic experiment I need to restrict the total amount of memory available to a PostgreSQL server for computing a given set of queries.

I know that I can do this through the postgresql.conf file, where I can adjust some parameters related to Resource Management.

The problem is that, given the several parameters available in the config file, it's not clear to me which parameter I should change.

When I first opened the config file I was expecting something like max_server_memory. Instead I found a lot of parameters: shared_buffers, temp_buffers, work_mem, and so on...

Given that, I consulted the PostgreSQL docs on Resource Consumption and came up with shared_buffers as the best candidate for what I'm looking for: the parameter that restricts the total amount of memory a PostgreSQL server can use to perform its computation. But I'm not completely sure about this.

Can you give me some insight into which parameters I should adjust to restrict the PostgreSQL server's memory, please?

Sarah Messer
  • 3,592
  • 1
  • 26
  • 43
Diogo
  • 439
  • 1
  • 4
  • 15
  • 2
    There's no such parameter. If you need postgres to fail to allocate above a certain memory footprint, the OS is able to do that with unix `ulimit` or a non-unix equivalent, knowing that each connection to postgres has a separate process. – Daniel Vérité Mar 04 '15 at 01:53
  • Did you figure out a better method than limiting shared_buffers? – vahid Oct 31 '19 at 00:05

2 Answers

14

Unfortunately, PostgreSQL does not have a single, easy-to-adjust parameter for this. One might think there would be one tunable along the lines of max_memory_usage = 1500MB and that everything would be automatically scaled to offer the best performance for the given resources.

However, as that's not available, the next best thing seems to be:

Actual max RAM = shared_buffers + (temp_buffers + work_mem) * max_connections

As a rough guide, set shared_buffers to about 40% of the memory you are willing to give to PostgreSQL, max_connections to the maximum number of parallel connections you want to allow, and temp_buffers and work_mem so that you don't go over your RAM budget. If you don't use temporary tables, keeping temp_buffers low (the default is 8 MB) lets you set work_mem a bit higher. work_mem is mostly used for sorting rows, so you'll need a lot of it if your queries handle high row counts as intermediate or final results. PostgreSQL will work with very low settings if needed, but many queries will then have to create temporary files on the server instead of keeping things in RAM, which obviously results in sub-par performance.
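
As a concrete illustration (the 4 GB budget and 25 connections below are made-up numbers, not a recommendation), the budget could be applied roughly like this with ALTER SYSTEM:

# Hypothetical budget: ~4 GB for PostgreSQL with at most 25 connections.
# 1600 MB + (8 MB + 64 MB) * 25 = 3400 MB, which stays under the 4 GB budget.
psql -U postgres -c "ALTER SYSTEM SET shared_buffers = '1600MB';"
psql -U postgres -c "ALTER SYSTEM SET temp_buffers = '8MB';"
psql -U postgres -c "ALTER SYSTEM SET work_mem = '64MB';"
psql -U postgres -c "ALTER SYSTEM SET max_connections = 25;"
# shared_buffers and max_connections only take effect after a restart:
sudo systemctl restart postgresql   # the unit name may differ on your system

The same values can of course be set directly in postgresql.conf instead.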

Note that if you set shared_buffers higher than a couple of gigabytes, you should enable the "huge pages" feature of your OS. Otherwise quite a big chunk of RAM will be lost to OS virtual memory page tables, and you'll also get slightly lower overall performance with the default 4 KB pages. Unfortunately, configuring huge pages is a somewhat complex task on any OS. (This is not a bug in PostgreSQL but a shortcoming of virtual memory handling on 32- and 64-bit x86 processors, and a huge pages setup is the best workaround for the issue. Again, everything will work without doing the huge pages dance, but performance will not be as good as it could be.)

That said, if you're running a recent enough Linux kernel, you can usually just add something like vm.nr_overcommit_hugepages=25000 to the end of /etc/sysctl.conf. This example allows up to 50 GB of your RAM to be used for huge pages, but since it's overcommit mode the memory is not immediately reserved. PostgreSQL will then automatically try to allocate its shared_buffers from this RAM area. Be warned that huge pages cannot be swapped out, and if you run out of RAM with huge pages, the kernel OOM killer may be triggered even before swap is full.

It's a good idea to set nr_overcommit_hugepages to a value that covers less than your full system RAM. Note that the value is a count of hugepages and the default size of a single hugepage is 2 MB, so a good value is roughly 0.45 * (installed RAM in MB) / 2. You can use something like

echo "$(getconf _PHYS_PAGES) * 0.45 / 512" | bc > /proc/sys/vm/nr_overcommit_hugepages

in /etc/rc.local or a similar script to set this automatically, instead of hardcoding a value in /etc/sysctl.conf.
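
If you go that route, a quick sanity check (assuming Linux and a restarted server) could look like this:

grep Huge /proc/meminfo
# HugePages_Total / HugePages_Surp grow once PostgreSQL has allocated its shared memory
psql -U postgres -c "SHOW huge_pages;"
# should report 'try' (the default) or 'on'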

In addition, some internal limits are scaled according to max_connections (e.g. predicate locks), so in some cases you need to set max_connections much higher than your actual connection count, especially if you run long transactions with SERIALIZABLE or REPEATABLE READ isolation levels.

For larger installations, you also want to use pgbouncer or pgpool between the clients and PostgreSQL to avoid having so many parallel connections that performance suffers. Ideally, limit the actual number of server connections to about CPU count x 1.5 and use transaction pooling instead of connection pooling to get maximum performance. Depending on your use case this may not be possible, for example if clients need transactions longer than a couple of queries, or if they rely on connection-specific settings or features that are incompatible with transaction pooling. Note that if your clients are not active 100% of the transaction time, you may need to increase the connection limit to push CPU usage up to 100% for maximum performance.
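
For illustration only, a minimal pgbouncer setup with transaction pooling might look roughly like the sketch below; the database name, paths and pool sizes are placeholders you would adapt:

cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling, as discussed above
pool_mode = transaction
; many client connections funneled into roughly "CPU count x 1.5" server connections
max_client_conn = 200
default_pool_size = 12
EOF
sudo systemctl restart pgbouncer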

Note that if you run PostgreSQL 12.x or older, PostgreSQL may go over the limit you have set for work_mem if a query planner misestimation causes hash aggregation to be incorrectly chosen. See https://dba.stackexchange.com/a/285423/29183 for details. Version 13.x doesn't suffer from this bug; there, a misestimation only causes reduced performance for the query in question instead of going over the memory limit, as expected.
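
If you suspect this is happening, one way to check (big_table and col are placeholder names) is to run the query under EXPLAIN:

psql -U postgres -c "EXPLAIN (ANALYZE, BUFFERS) SELECT col, count(*) FROM big_table GROUP BY col;"
# On 13+ the HashAggregate node reports Batches and Disk Usage once work_mem is exceeded;
# on 12 and older the hash table simply keeps growing in RAM instead.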

Mikko Rantalainen
  • 14,132
  • 10
  • 74
  • 112
  • I have a problem: with the described limits, PostgreSQL continues to eat much more memory. For example, I set shared_buffers = 2048MB, work_mem = 128MB, temp_buffers = 8MB. Formula: Actual max RAM = shared_buffers + (temp_buffers + work_mem) * max_connections. Calculating my total limit: 2000 MB + (128 MB + 8 MB) * 25 max_connections = 5400 MB. But after 24+ hours of working, PostgreSQL always eats 10000+ MB of RAM! For example, right now, with 22 connections and only 2 active, it eats 10.5 GB (checked via `service postgresql@12-main status`). How can this happen? – Murz Jan 17 '21 at 07:40
  • Which version of PostgreSQL are you running? I've seen one case where PostgreSQL 12.x had a memory leak with `work_mem=128MB` but didn't leak any memory with `work_mem=32MB`. I never investigated that issue in detail, but the change in memory usage didn't match the change in the `work_mem` setting; the difference was much larger (PostgreSQL memory usage dropped by about 24 GB with that change alone). For that workload the performance showed no visible degradation with only 32 MB `work_mem`, so it was a good fix for that case. – Mikko Rantalainen Jan 18 '21 at 11:29
  • According to section "E.2.3.1.4. General Performance" of https://www.postgresql.org/docs/13/release-13.html, PostgreSQL has the following behavior until version 13: "Previously, hash aggregation was avoided if it was expected to use more than work_mem memory. [...] once hash aggregation had been chosen, the hash table would be kept in memory no matter how large it got — which could be very large if the planner had misestimated". It may be that using a high `work_mem` value results in hash aggregation getting chosen and you end up going over any set memory limits as a result. – Mikko Rantalainen Jan 18 '21 at 11:47
  • And when I wrote above that PostgreSQL leaked memory, I meant that memory usage continued to rise until the OOM Killer killed one of the PostgreSQL processes and the PostgreSQL master did a full restart. It might have been that memory usage just rose so much that it looked like a leak, but in reality, given infinite RAM, it would have released the memory in the future. – Mikko Rantalainen Jan 18 '21 at 11:51
  • Note that even if postgres logically releases memory it has allocated, it may not be returned to the operating system, depending on the `malloc()`/`free()` implementation of your execution environment. That may result in multiple PostgreSQL processes going over the limit due to the use of hash aggregation as described above, and the memory is never released to the OS even though PostgreSQL isn't actually using it either. This happens because technically `malloc()` may use `brk()` behind the scenes, and releasing memory back to the OS is possible only in some special cases. – Mikko Rantalainen Jan 18 '21 at 12:00
  • Historically this is considered "a feature, not a bug" in UNIX systems because it can be worked around simply by adding enough swap. The system will then swap out the freed-but-not-really-usable parts of RAM, and that's fine because that memory is never actually used again. If the system sometimes swaps out memory that was actually needed, it will be swapped back in when used, but if this is rare, it will not cause major performance problems in practice. – Mikko Rantalainen Jan 18 '21 at 12:03
1

You can find the answer on the PostgreSQL wiki, but in short: the most that can be done is to configure shared memory and the number of simultaneous connections. See this link: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Arthur Melo
  • 454
  • 5
  • 13
  • 1
    I had also read this wiki, but "_The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL to use for caching data._" was still not clear enough for me, due to the _caching_ part. – Diogo Mar 04 '15 at 18:29