Context:

I want to store some temporary results in temporary tables. These tables may be reused by several queries that occur close together in time, but at some point the evolutionary algorithm I'm using will no longer need some of the old tables and will keep generating new ones. There will be several queries, possibly concurrent, using those tables, but only one user doing all of them. I don't know if that clarifies everything about sessions and so on; I'm still uncertain about how that works.

Objective:

What I would like to do is create temporary tables (if they don't exist already), keep them in memory as far as possible, and, if at some point there is not enough memory, drop the ones that would otherwise be written out to the HDD (I guess those will be the least recently used).

Examples:

The client will be running queries for EMAs with different parameters and an aggregation of them with different coefficients. Each individual may vary in the coefficients it uses, so the parameters for the EMAs may repeat while they are still in the gene pool, and may no longer be needed after a while. There will be similar queries with more parameters, and the genetic algorithm will find the right values for those parameters.
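
Roughly, each EMA would live in its own temporary table, computed by something like the sketch below. Everything in it is illustrative: the `prices(ts, price)` table, the consecutive integer timestamps and the smoothing factor 0.1 all stand in for whatever the genome actually encodes.

```
-- Sketch only: prices(ts integer, price numeric) and alpha = 0.1 are
-- invented for illustration; one such table would exist per EMA parameter set.
CREATE TEMPORARY TABLE ema_a01 AS
WITH RECURSIVE ema AS (
    -- seed the recursion with the earliest price
    SELECT ts, price, price AS ema
    FROM prices
    WHERE ts = (SELECT min(ts) FROM prices)
  UNION ALL
    -- ema(t) = alpha * price(t) + (1 - alpha) * ema(t - 1)
    SELECT p.ts, p.price, 0.1 * p.price + 0.9 * e.ema
    FROM ema e
    JOIN prices p ON p.ts = e.ts + 1   -- assumes consecutive timestamps
)
SELECT ts, ema FROM ema;

-- The aggregation step would then combine several of these tables with
-- per-individual coefficients, e.g. 0.7 * ema_a01.ema + 0.3 * ema_a05.ema.
```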

Questions:

  • Is that what `on commit drop` means? I've seen descriptions of sessions and transactions, but I don't really understand those concepts yet (a sketch of what I plan to run is just after this list). Sorry if the question is stupid.
  • If it is not, do you know about any simple way to get Postgres to do this?
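
For reference, this is roughly the statement I have in mind (the table name is just an example, and it would be filled by the EMA query above). I don't know whether `on commit preserve rows`, which seems to be the default, or `on commit drop` is what I want here:

```
-- Hypothetical table; created only if it isn't there already.
CREATE TEMPORARY TABLE IF NOT EXISTS ema_a01 (
    ts  integer,
    ema numeric
) ON COMMIT PRESERVE ROWS;   -- default: rows survive the end of the transaction

-- The alternatives I'm unsure about:
--   ON COMMIT DROP        removes the table at the end of the transaction
--   ON COMMIT DELETE ROWS keeps the table but empties it at each commit
```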

Workaround:

In the worst case I should be able to make a guesstimate of how many tables I can keep in memory and implement the LRU eviction myself, but it's never going to be as good as what Postgres could do.

Thank you very much.

Trylks
  • I guess the only thing kept in memory is the OS disk cache, and that there is no direct programmer control over what PostgreSQL does with memory other than the configuration settings. Show the queries the user will be doing so that the database design and optimization can be established. Otherwise this question is a candidate to be closed as _not constructive_. If the client is a rich one, you can consider keeping state on the client side. – Clodoaldo Neto Jan 04 '13 at 21:59
  • Some bonus info that might be of value http://raghavt.blogspot.com/2012/04/caching-in-postgresql.html. – Kuberchaun Jan 05 '13 at 21:04
  • A database is a bad place to run an evolutionary algorithm. You really should cache your temporary data in client program memory, not in a database. You can use, for example, memcached for parallel access. In Postgres a temporary table is only accessible from one client, so no concurrency is possible, and it is deleted when the client program disconnects. – Tometzky Jan 05 '13 at 21:54
  • @Clodoaldo It is possible to do that from the client, I was simply hoping Postgres would do the job for me and in some way that is very optimized and efficient, so that I don't have to program that again. – Trylks Jan 07 '13 at 21:22
  • @Tometzky Temporary tables for one single client could be a solution. – Trylks Jan 07 '13 at 21:27

1 Answer


This is a complicated topic and probably one to discuss in some depth. I think it is worth explaining both why PostgreSQL doesn't support this and what you can do instead, on recent versions, to approach what you are trying to do.

PostgreSQL has a pretty good approach to caching diverse data sets across multiple users. In general you don't want to allow a programmer to specify that a temporary table must be kept in memory if it becomes very large. Temporary tables, however, are managed quite differently from normal tables, in that they are:

  1. Buffered by the individual back-end, not the shared buffers

  2. Locally visible only, and

  3. Unlogged.

What this means is that typically you aren't generating a lot of disk I/O for temporary tables. The tables do not normally generate WAL traffic, and they are managed by the local back-end, so they don't affect shared-buffer usage. Data is only occasionally written to disk, and only when necessary to free memory for other (usually more frequent) tasks. You certainly aren't forcing disk writes, and you only need disk reads when something else has used up the memory.
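
How much memory each back-end devotes to its temporary tables is governed by the `temp_buffers` setting, which can be raised per session as long as it is changed before the session touches its first temporary table. A minimal sketch (the table and the 256MB value are arbitrary):

```
-- temp_buffers must be set before any temporary table is used in the session;
-- 256MB is just an example value (the default is 8MB).
SET temp_buffers = '256MB';

CREATE TEMPORARY TABLE scratch AS
SELECT generate_series(1, 100000) AS n;
```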

The end result is that you don't really need to worry about this. PostgreSQL already tries, to a certain extent, to do what you are asking it to do, and temporary tables have much lower disk I/O requirements than standard tables. It does not force the tables to stay in memory, though, and if they become large enough, the pages may spill into the OS disk cache and eventually on to disk. This is an important feature because it ensures that performance degrades gracefully when many people create many large temporary tables.

Chris Travers
  • There is something I don't fully understand. What behaviour should I specify in that case for temporary tables? (`on commit preserve rows`?) Also, could I get temporary tables deleted from the disk if I end up having too many? I'm not sure about the consequences of data expiring to different places depending on which `on commit` clause is specified. I plan to use `create if not exists` for all these temporary tables, so if they are dropped that should not be a problem, except for the time to recalculate them. – Trylks Apr 22 '13 at 12:36
  • It depends on what you are using it for. If you are preserving rows your chance of eventually flushing to disk is higher, but if you need them across transactions you will need to preserve the rows. – Chris Travers Apr 22 '13 at 13:43
  • OK, one last thing: can I set a limit on the number of temporary tables that are preserved, or on the disk space reserved for temporary tables? If not, is there any way to manually delete them? If too many tables are stored on disk for too long, this could potentially eat up all the space on the disk. – Trylks Apr 22 '13 at 15:58
  • Temporary tables are deleted when sessions end, so you could force an end to the session. If disk space is a major concern you could put the temporary tables in a tablespace on a small disk partition (see the sketch after these comments). – Chris Travers Apr 23 '13 at 02:12
  • I guess I'll then use regular tables. That seems to be the only way in which I can have a cache of tables and delete the LRU ones when surpassing a certain number of tables, for instance 1000. I hope Postgres is clever enough to keep in memory the ones that were used lately and that fit into memory. I'll check the configuration parameters to try to optimize that. Thank you :) – Trylks Apr 24 '13 at 12:45
  • The big difference with temporary tables is that other backends are not aware of their existence or contents. If you need to be able to manage across backends, temporary tables are the wrong solution. In general, the PostgreSQL cache will try to keep stuff in memory that was recently used. If you have trouble with this, try adjusting shared_buffers up. The reason is that the PG cache is more full-featured and more clever than the OS disk cache but this comes at a computational cost. – Chris Travers Apr 25 '13 at 01:31
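
A minimal sketch of the tablespace idea from the comments above; the path, tablespace name and table are invented for illustration, and the directory must already exist and be owned by the postgres OS user:

```
-- Run as a superuser: a tablespace on a small, dedicated partition
-- (hypothetical path) puts a hard cap on how much disk temporary tables can use.
CREATE TABLESPACE temp_space LOCATION '/mnt/small_partition/pg_temp';

-- Make temporary objects created in this session land in that tablespace.
SET temp_tablespaces = 'temp_space';

CREATE TEMPORARY TABLE scratch (ts integer, ema numeric);
```

For the `shared_buffers` suggestion, note that it is set in `postgresql.conf` and only takes effect after a server restart.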