I'm using Ubuntu 16.04 with PostgreSQL 9.5 and Django 1.11.

My website has been suffering from extremely long AJAX calls (upwards of 30 seconds in some cases). The same AJAX calls take about 500ms in development.

The problem is correlated with disk read I/O: running a single query in production drives disk read I/O up to 25 MB/s, while an identical query in development causes less than 0.01 MB/s of disk read I/O. The code and the query are identical in production and development.

So something about PostgreSQL in production is causing abnormally high disk read I/O. What could it be?
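
For reference, the total on-disk size of the table involved (including its indexes) can be checked with a query like the one below – just a generic sizing check, not something specific to this problem:

    SELECT pg_size_pretty(pg_total_relation_size('map_listing'));  -- table + indexes + TOAST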

Here is an example query that takes ~25 seconds in production, and only 500ms in development:

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) AS "__count" FROM "map_listing" 
WHERE ("map_listing"."lo" <  -79.32516245458987 AND "map_listing"."la" > 43.640279060122346
AND "map_listing"."lo" >  -79.60531382177737 AND "map_listing"."transaction_type" = 'Sale'
AND "map_listing"."la" < 43.774544561921296 
AND NOT ("map_listing"."status" = 'Sld' AND "map_listing"."sold_date" < '2018-01-21'::date
AND "map_listing"."sold_date" IS NOT NULL)
AND NOT (("map_listing"."status" = 'Ter' OR "map_listing"."status" = 'Exp'))
AND NOT (("map_listing"."property_type" = 'Parking Space' OR "map_listing"."property_type" = 'Locker')));

Result of doing EXPLAIN (ANALYZE, BUFFERS) on the above statement (production):

 Aggregate  (cost=89924.55..89924.56 rows=1 width=0) (actual time=27318.859..27318.860 rows=1 loops=1)
   Buffers: shared read=73424
   ->  Bitmap Heap Scan on map_listing  (cost=4873.96..89836.85 rows=35079 width=0) (actual time=6061.214..27315.183 rows=3228 loops=1)
         Recheck Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
         Rows Removed by Index Recheck: 86733
         Filter: ((lo < '-79.32516245458987'::numeric) AND (lo > '-79.60531382177737'::numeric) AND ((status)::text <> 'Ter'::text) AND ((status)::text <> 'Exp'::text) AND ((property_type)::text <> 'Parking Space'::text) AND ((property_type)::text <> 'Locker'::text) AND ((transaction_type)::text = 'Sale'::text) AND (((status)::text <> 'Sld'::text) OR (sold_date >= '2018-01-21'::date) OR (sold_date IS NULL)))
         Rows Removed by Filter: 190108
         Heap Blocks: exact=46091 lossy=26592
         Buffers: shared read=73424
         ->  Bitmap Index Scan on map_listing_la_88ca396c  (cost=0.00..4865.19 rows=192477 width=0) (actual time=156.964..156.964 rows=194434 loops=1)
               Index Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
               Buffers: shared read=741
 Planning time: 0.546 ms
 Execution time: 27318.926 ms
(14 rows)

Result of EXPLAIN (ANALYZE, BUFFERS) (development):

 Aggregate  (cost=95326.23..95326.24 rows=1 width=8) (actual time=495.373..495.373 rows=1 loops=1)
   Buffers: shared read=77281
   ->  Bitmap Heap Scan on map_listing  (cost=5211.98..95225.57 rows=40265 width=0) (actual time=80.929..495.140 rows=4565 loops=1)
         Recheck Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
         Rows Removed by Index Recheck: 85958
         Filter: ((lo < '-79.32516245458987'::numeric) AND (lo > '-79.60531382177737'::numeric) AND ((status)::text <> 'Ter'::text) AND ((status)::text <> 'Exp'::text) AND ((property_type)::text <> 'Parking Space'::text) AND ((property_type)::text <> 'Locker'::text) AND ((transaction_type)::text = 'Sale'::text) AND (((status)::text <> 'Sld'::text) OR (sold_date >= '2018-01-21'::date) OR (sold_date IS NULL)))
         Rows Removed by Filter: 198033
         Heap Blocks: exact=49858 lossy=26639
         Buffers: shared read=77281
         ->  Bitmap Index Scan on map_listing_la_88ca396c  (cost=0.00..5201.91 rows=205749 width=0) (actual time=73.070..73.070 rows=205569 loops=1)
               Index Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
               Buffers: shared read=784
 Planning time: 0.962 ms
 Execution time: 495.822 ms
(14 rows)
Jack
  • If you want to understand the difference, you have to post BOTH plans – production and development. – Vao Tsun Feb 22 '18 at 08:37
  • Do you have lots of updates or deletes on the map_listing table? It looks like the table needs a vacuum and its indexes need a reindex because of bloat. You could also consider decreasing the fillfactor for your table and indexes if the table takes heavy updates. – samed.yildirim Feb 22 '18 at 08:38
  • @VaoTsun Posted the `EXPLAIN (ANALYZE, BUFFERS)` of the same query in development – Jack Feb 22 '18 at 17:27
  • @samed.yildirim Yes, I update the map_listing table on a daily basis; 500-1000 records are updated every day. I did a full vacuum with no noticeable change. However, after `REINDEX TABLE map_listing`, the query time dropped by ~10 seconds, from ~25 secs to ~15 secs. A good improvement, but still unacceptable for production. Are there any other tricks I can try? – Jack Feb 22 '18 at 18:04
  • @samed.yildirim `VACUUM` and `REINDEX` did not affect the disk read I/O; it still skyrockets to 25 MB/s with a single query – Jack Feb 22 '18 at 18:10

2 Answers

This query did not produce any actual disk I/O – all the blocks came from cache. But since the query reads 73424 blocks (73424 × 8 kB ≈ 574 MB), it will produce a substantial I/O load whenever the table is not cached.

But there are two things that can be improved.

  • You have lossy block matches in the bitmap heap scan. That means that work_mem is not big enough to hold a bitmap with one bit per table row, so 26592 table blocks are represented lossily with just one bit per block. All the rows in those blocks have to be rechecked, and 86733 rows are discarded, most of them false positives from the lossy block matches.

    If you increase work_mem, a bitmap with one bit per table row will fit into memory and this number will shrink, reducing the work during the heap scan (see the sketch after this list).

  • 190108 rows are discarded because they do not match the additional filter condition in the bitmap heap scan. This is probably where most of the time is spent. If you can reduce that amount, you will win.

    The ideal indexes for this query would be:

      CREATE INDEX ON map_listing(transaction_type, la);
      CREATE INDEX ON map_listing(transaction_type, lo);
    

    If transaction_type is not very selective (i.e., most rows have the value Sale), you can omit that column.
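
To experiment with both suggestions, a minimal sketch for a psql session could look like this (the 64MB value is only an example, and the last query is just a quick way to gauge how selective transaction_type really is):

    SET work_mem = '64MB';                       -- session-only; try different values
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT COUNT(*) FROM map_listing WHERE ...;  -- the query from the question

    -- how selective is transaction_type?
    SELECT transaction_type, count(*)
    FROM map_listing
    GROUP BY transaction_type
    ORDER BY count(*) DESC;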

EDIT:

Examination of the vmstat and iostat output shows that both the CPU and the I/O subsystem are suffering from massive overload: all the CPU resources are spent on I/O wait and VM steal time. You need a better I/O system and a host with more free CPU resources. Increasing RAM might alleviate the I/O problem, but only for the disk reads.
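
If you do add RAM, it would also make sense to raise shared_buffers and then check whether reads start coming from cache rather than disk. A minimal sketch (the 1GB value is only an example; changing shared_buffers requires a server restart):

    ALTER SYSTEM SET shared_buffers = '1GB';   -- takes effect after a restart

    -- afterwards, compare cached vs. non-cached block reads for the table
    SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
    FROM pg_statio_user_tables
    WHERE relname = 'map_listing';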

Laurenz Albe
  • I changed `work_mem` in my `postgresql.conf`. I tried 1310kB, 4MB, 8MB and 16MB. At 16MB there are no lossy block matches, but the query time never went below 20 seconds for any of these attempts. I just posted the result of `EXPLAIN (ANALYZE, BUFFERS)` in development. The numbers of lossy block matches and discarded rows are pretty much the same, yet development takes 500ms to execute, while production took 27 seconds. – Jack Feb 22 '18 at 17:57
  • And on the topic of indexes, they already exist for `la`, `lo`, `transaction_type`, and any other columns in the table that may be queried. – Jack Feb 22 '18 at 17:58
  • Yes, that is mysterious. Can you use `iotop` to see which processes cause the I/O? Could the system be swapping because of memory pressure (`SWAPIN` column)? How much RAM do you have and what is `shared_buffers` set to? How busy is the CPU? Can you install debuginfo packages for the kernel and PostgreSQL and see with `perf top` where the time is spent? Sorry for the many questions, just trying to get a clue. Oh yes, did you disable transparent hugepages (`echo never > /sys/kernel/mm/transparent_hugepage/enabled`)? – Laurenz Albe Feb 22 '18 at 19:14
  • I'm glad you are asking so many questions. I've been stuck with this problem for more than a week and been losing sleep over it. Ask as many as you can... [This](https://i.gyazo.com/6406e1278629e5bdb014f7552e4642c6.png) is what `iotop` shows when I run the query. It fluctuates between 10-30 MB/s from **a single query**! I tried looking up `SWAPIN` but can't find what I'm supposed to look for – Jack Feb 22 '18 at 20:19
  • I have 1GB of RAM in production, 16GB in development. On my [server dashboard](https://i.gyazo.com/0d8fd6c90c879082efea6e2a557f2379.png), RAM is not affected at all during a query. `shared_buffers` is set to `256MB`. FYI I followed the instructions [here](http://pgtune.leopard.in.ua/) to set my postgres settings. CPU gets busy with a query. As you can see from my dashboard, when disk read I/O spikes, CPU spikes. – Jack Feb 22 '18 at 20:22
  • Here is what `perf top` [shows](https://i.gyazo.com/6c2effc85f6db5f6899dded70733fd46.png) during a query. I have never heard of `echo never > /sys/kernel/mm/transparent_hugepage/enabled` so I don't believe that's the issue. – Jack Feb 22 '18 at 20:23
  • Also, here's [my site](https://www.bungol.ca/map/?). The queries run whenever the user pans the map or zooms in/out. The 25-second queries happen at around zoom level 12-13. Interestingly enough, when you zoom out all the way, the queries take only 5 seconds; medium zoom (11-13) is when they are slowest. – Jack Feb 22 '18 at 20:28
  • Ad transparent hugepages: You *should* disable them, although I think it is not related to your current problem. – Laurenz Albe Feb 23 '18 at 08:29
  • In the one `iotop` snapshot you are showing, you see that the background writer is mighty busy, and that the `SELECT` command spends almost its whole time waiting for I/O requests to be completed. This does *not* match the execution plan from above, because that didn't produce any I/O, but it still took 27 seconds to complete. Yet perhaps your system is CPU starved too. Can you provide output for `vmstat 1` and `iostat -mNx 1`? I have the feeling that your system is I/O bottlenecked. In that case, more RAM and greater `shared_buffers` would help a lot, and a better I/O subsystem would be good. – Laurenz Albe Feb 23 '18 at 08:50
  • [Result of `vmstat 1`](https://i.gyazo.com/e0e5c8ba05c8cc726946af1ada0affb1.png). [Result of `iostat -mNx 1`](https://i.gyazo.com/a124190e9c5f8e6802231954d1169f5e.png). Both captured during query. – Jack Feb 23 '18 at 16:07
  • See my addition to the answer. – Laurenz Albe Feb 23 '18 at 18:35
  • RAM was the issue. I assumed that RAM wasn't the issue because [my dashboard](https://i.gyazo.com/0d8fd6c90c879082efea6e2a557f2379.png) was only showing spikes for CPU & disk I/O, so I kept on upgrading the server's CPU, to no effect. But after adding just another 1GB of RAM, everything sped up. Thanks for walking me through this Laurenz!! – Jack Feb 23 '18 at 23:38
  • Also it is really odd that the disk I/O would spike to 25 MB/s with 1GB of RAM. The server is currently at 4GB of RAM and the same query takes only 60 KB/s of disk I/O. So it would seem that a lack of RAM made the disk I/O go haywire. – Jack Feb 23 '18 at 23:40

(I don't yet have the privilege to comment)

I am currently having a problem similar to Jack's. My query slowed down after creating indexes, and my tweaking of work_mem and shared_buffers is not making any improvement.

When you say RAM was the issue, what did you do to fix it? My server has 32GB RAM, and I have even tried setting work_mem = 16GB.

iotop reads:

DISK READ   DISK WRITE   SWAPIN    IO>      COMMAND
86.28 M/s   0.00 B/s     0.00 %    87.78 %  postgres

(Edit: link to my question on gis.stackexchange)

Adrian Tofting
  • You do not want to set work_mem to 16GB. work_mem is allocated per operation, so every hash or sort in every connection to your database could consume up to 16GB of RAM. Increase shared_buffers to 8GB, set work_mem to 64MB and effective_cache_size to 16GB. Test again and see if there is a difference. – Abrham Smith May 27 '19 at 17:02
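
A sketch of how the settings suggested in the comment above could be applied (values taken from the comment; shared_buffers only takes effect after a server restart, the other two after a reload):

    ALTER SYSTEM SET shared_buffers = '8GB';         -- requires a restart
    ALTER SYSTEM SET work_mem = '64MB';
    ALTER SYSTEM SET effective_cache_size = '16GB';
    SELECT pg_reload_conf();                         -- applies the reloadable settings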