0

I have a materialized view based on a very complex (and not very efficient) query. This materialized view is used for BI/visualization. It often takes ~4 minutes to complete the refresh, which is good enough for my needs. Running ANALYZE shows total cost of 2,116,446 with 137,682 rows and 1,976 width.

However, sometimes refresh materialize view XXX just never completes. Looking at the top processes (top in ubuntu), the process will use 100% CPU and 8.1% (of the server's 28GB memory) for a while... then all of a sudden it just disappears from the top list. It usually happens after ~4-5 minutes (although statement_timeout config is disabled). The postgres client just keeps waiting forever, and the view never refreshes.

Running the query behind the view directly (i.e. SELECT ...) will fail as well (same issue).

I'm using version 9.5. I've tried to increase effective_cache_size, shared_buffers, and work_mem in postgres config, but still the same result.

Sometimes, after several attempts, the refresh command will complete successfully. But it's unpredictable and currently just wouldn't work even after multiple attempts / db restarts.

Any suggestions to what might be the problem?

aei
  • 250
  • 1
  • 2
  • 11
  • If it is disappearing from `top`, then congratulations! You managed to make the server crash. You should try to reduce your problem to a simple test case and then [report it to the developers](https://www.postgresql.org/docs/10/static/bug-reporting.html). – Kevin Mar 01 '18 at 04:26
  • @Kevin I disagree. If it crashes, the client will notice. My money is on a lock. When a backend is locked, it stops consuming CPU. – Laurenz Albe Mar 01 '18 at 07:53
  • Could be just the OOM killer. First: check the postgres logfile. – joop Mar 01 '18 at 11:03
  • Thanks @LaurenzAlbe and @joop! I looked at postgres logfile in `/var/log/postgresql/postgresql-9.5-main.log` but it doesn't have anything useful (is that the log file you meant?) I'd be surprised if it's the OOM killer because it only uses 8.1% of the available 28GB memory. Any thoughts about how to go about the lock? Other directions? – aei Mar 01 '18 at 14:20
  • Yes. Check the `pg_locks` view when the problem occurs and see if there are any locks with `granted = false`. – Laurenz Albe Mar 01 '18 at 14:29
  • No locks with `granted = false`. Before I ran the refresh, there were 2 locks (granted). While it ran, 32-38 locks. And then when the problem occurred just immediately dropped to 2 again. Seems like something just stops the query / causes it to suddenly halt... I tried refresh `CONCURRENTLY` but same result. Just stops after 4 minutes and a few seconds. – aei Mar 01 '18 at 15:13
  • @LaurenzAlbe: I was assuming that "disappears from `top`" means *disappears* from `top`, not just "drops to a much lower point in the list." – Kevin Mar 01 '18 at 19:29
  • @Kevin We're both guessing :^) `top` only shows the top processes. @aei: What's your operating system and PostgreSQL version? Since you say the problem occurs with the `SELECT` alone, let's focus on that. Can you add the query and the table definitions to the question? Ideally with `EXPLAIN` output so we know what the DB does. When the query "just hangs" and doesn't consume CPU, can you follow https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend to get a stack trace of the backend? – Laurenz Albe Mar 02 '18 at 07:58

0 Answers0