2

I am using PostgreSQL for a scientific application (unsupervised clustering). The Python program is multi-threaded so that each thread manages its own postmaster process (one per core). Hence, there is a lot of concurrency.

Each thread-process pair loops infinitely through two SQL queries: the first reads, the second writes. The read operation covers 500 times as many rows as the write operation.

Here is the output of dstat:

----total-cpu-usage---- ------memory-usage----- -dsk/total- --paging-- --io/total-
usr sys idl wai hiq siq| used  buff  cach  free| read  writ| in   out | read writ
  4   0  32  64   0   0|3599M   63M   57G 1893M|1524k   16M|  0    0 |  98  2046 
  1   0  35  64   0   0|3599M   63M   57G 1892M|1204k   17M|  0    0 |  68  2062 
  2   0  32  66   0   0|3599M   63M   57G 1890M|1132k   17M|  0    0 |  62  2033 
  2   1  32  65   0   0|3599M   63M   57G 1904M|1236k   18M|  0    0 |  80  1994 
  2   0  31  67   0   0|3599M   63M   57G 1903M|1312k   16M|  0    0 |  70  1900 
  2   0  37  60   0   0|3599M   63M   57G 1899M|1116k   15M|  0    0 |  71  1594 
  2   1  37  60   0   0|3599M   63M   57G 1898M| 448k   17M|  0    0 |  39  2001 
  2   0  25  72   0   0|3599M   63M   57G 1896M|1192k   17M|  0    0 |  78  1946 
  1   0  40  58   0   0|3599M   63M   57G 1895M| 432k   15M|  0    0 |  38  1937 

I am pretty sure I could write more often than that, since I have seen dstat report write rates of up to 110-140M. How can I optimize this process?

user84590

3 Answers

4

I am the author of dstat and a systems engineer. I noticed a 60% average iowait time; given your output, I would say that your disks are quite busy. You could try the new --disk-util plugin option in recent dstat versions.

This will show you the utilization rate of your disk(s), and I would expect it to be close to 100% for the disk(s) you are using. Given your particular I/O pattern, that would mean your disks are simply saturated handling read and write requests.
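For example (assuming a dstat version recent enough to ship the plugin), the following prints a utilization percentage for each disk, refreshing every 5 seconds:

    dstat --disk-util 5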

Why is this below benchmark numbers? Because when you benchmark disk throughput, you usually stress the disk in a pattern that is optimal for your disk and caches (e.g. linear reads or writes with large block sizes from a single thread), while in your current workload the pattern may be far less favourable (random reads or writes with small or varied block sizes, with multiple threads competing for the same resources).

This difference in pattern can make a huge difference in throughput. To get better throughput during a real workload, you will have to benchmark with a workload that matches your real one much more closely, to see what maximum you can achieve under those conditions. Alternatively, you can influence the real workload itself by changing your design (e.g. aligning block sizes in your application with the filesystem/disk subsystem) or by improving caching and/or read-ahead.
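If the bottleneck is the database rather than the raw disks, one way to run such a matching benchmark is pgbench, which is not mentioned above but ships with PostgreSQL. This is only a sketch, and read_query.sql is a placeholder for the application's actual read query:

    pgbench -n -c 8 -T 60 -f read_query.sql yourdb

Here -n skips vacuuming the pgbench tables, -c 8 opens 8 concurrent client connections (one per core, as in the real application), and -T 60 runs the test for 60 seconds.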

There's no simple way to fix this without analyzing what your workload looks like.

  • Very good explanation. Just want to point out that throwing faster hard disks at I/O-bound databases tends to be a quick and dirty way to improve performance. SSDs are especially nice for this. – pehrs Mar 30 '10 at 21:45
0

I'd say it very much depends on the code of the program; it is possibly waiting for a worker thread to resynchronize before starting the next operation.

Does the read operation involve the same data that is being written? If so, you may get concurrency/race conditions if you stop it from waiting for the other threads to catch up.

Probably best to move this to Stack Overflow.

Aaron Tate
  • No, I am positive that it has nothing to do with Python. The Python app is only there to call SQL functions or to manage the 8 postmasters. – user84590 Mar 25 '10 at 23:48
0

Actually, it may well be at the Python code level. Python uses a Global Interpreter Lock (GIL) to handle threading, and you could be bumping against that lock. There is a post on Stack Overflow about the GIL and multi-core systems.

I would look into using a process per postmaster, with a "master master" process, if needed, to manage them, or possibly Twisted to get around the GIL.
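A minimal sketch of the process-per-connection idea, using the standard library's multiprocessing module; the connection string, the queries, and the psycopg2 driver are placeholders for illustration, not the poster's actual code:

    import multiprocessing

    DSN = "dbname=yourdb"  # placeholder connection string

    def worker():
        # Each OS process runs its own interpreter (no shared GIL)
        # and holds its own connection to its own PostgreSQL backend.
        import psycopg2  # assumed driver; imported inside each process
        conn = psycopg2.connect(DSN)
        cur = conn.cursor()
        while True:  # the real app also loops indefinitely
            cur.execute("SELECT 1")  # placeholder for the real read query
            cur.fetchall()
            cur.execute("SELECT 1")  # placeholder for the real write query
            conn.commit()

    if __name__ == "__main__":
        procs = [multiprocessing.Process(target=worker)
                 for _ in range(8)]  # one process per core
        for p in procs:
            p.start()
        for p in procs:
            p.join()

Because each worker is a full OS process with its own interpreter, the GIL in one worker cannot stall the others.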

  • No, I can confirm that this is not the problem. I know about the GIL. Other workloads (longer SELECTs with fewer writes) can use up all the RAM. What Python does is nothing, time-wise, compared to what each postmaster does. Python uses less than 1% of a CPU core. Thanks for the thought though :D – user84590 Apr 01 '10 at 19:48