4

I have 20+ materialized views that I use as some sort of pre-calculated cache for a number of operations. Each night at 3 AM I do automated sync with an external source, followed by a series of materialized views refreshing.

This is where it gets weird. Each morning, at the beginning of the workday, I execute

SELECT * FROM dbo.materialized_view

and then execute the actual materialized view's query (specified during the CREATE MATERIALIZED VIEW).

I get different results.

The database source does not change after the sync process completes at around 3:10 AM. Originally, REFRESH SQL queries were getting executed from within the same Java sync process after the transaction commits, but since that didn't work, I created a cron job at 4 AM to do the same. That still did not help.

As soon as I manually refresh views in the morning (around 6-7 AM), the results update

Is there something incredibly funky I am missing about how materialized views refresh?

Update:

Here is one of the materialized views I refresh:

create materialized view dbo.ind_newborn_daily as
SELECT dt.date_time::DATE   AS date,
       s.source_id          AS staff_id,
       b.status,
       COUNT(DISTINCT b.id) AS count
FROM dbo.newborn nb
         JOIN dbo.date_time dt ON nb.created_datetime_id = dt.id
         JOIN dbo.staff s ON nb.staff_id = s.id
         JOIN dbo.baby b ON nb.baby_id = b.id
WHERE nb.current = TRUE
GROUP BY CUBE ((dt.date_time::DATE), s.source_id, b.status);

And I update it via:

REFRESH MATERIALIZED VIEW dbo.ind_newborn_daily;

Update 2

Upon reviewing the Postgres statement logs, I've found that refresh statements were executed. though, I failed to mention those are processed by a parallel worker queue (4 threads).

As a result, each refresh statement is wrapped in a separate transaction AND those transitions are shown somewhat intertwined in logs:

2022-09-24 03:06:36.679 CEST,"XZY","XZYdb",17629,"127.0.0.1:44326",632e581c.44dd,3,"BEGIN",2022-09-24 03:06:36 CEST,5/78,0,LOG,00000,"execute <unnamed>: BEGIN",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.679 CEST,"XZY","XZYdb",17629,"127.0.0.1:44326",632e581c.44dd,4,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:36 CEST,5/78,0,LOG,00000,"execute <unnamed>: REFRESH MATERIALIZED VIEW dbo.view1",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.695 CEST,,,17579,,632e581b.44ab,1,,2022-09-24 03:06:35 CEST,13/3,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17579.0"", size 3219456",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","parallel worker",17570,0
2022-09-24 03:06:36.696 CEST,,,17578,,632e581b.44aa,1,,2022-09-24 03:06:35 CEST,12/3,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17578.0"", size 4087808",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","parallel worker",17570,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,1,"SET",2022-09-24 03:06:36 CEST,6/29,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,2,"SET",2022-09-24 03:06:36 CEST,6/30,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,3,"BEGIN",2022-09-24 03:06:36 CEST,6/31,0,LOG,00000,"execute <unnamed>: BEGIN",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,4,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:36 CEST,6/31,0,LOG,00000,"execute <unnamed>: REFRESH MATERIALIZED VIEW dbo.view3",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.737 CEST,"XZY","XZYdb",17570,"127.0.0.1:44232",632e581b.44a2,5,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:35 CEST,7/7,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17570.2"", size 4038656",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.838 CEST,"XZY","XZYdb",17615,"127.0.0.1:44304",632e581c.44cf,5,"COMMIT",2022-09-24 03:06:36 CEST,4/189,6177,LOG,00000,"execute S_1: COMMIT",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.968 CEST,"XZY","XZYdb",17570,"127.0.0.1:44232",632e581b.44a2,6,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:35 CEST,7/7,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17570.4"", size 2703360",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.992 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,5,"COMMIT",2022-09-24 03:06:36 CEST,6/31,6183,LOG,00000,"execute S_1: COMMIT",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0

Later on, at 4 AM, there goes the cron job, which is synchronous (one view at a time), though I did not wrap these in a transaction:

2022-09-24 04:00:08.866 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,422,"idle",2022-09-24 04:00:01 CEST,3/2638,0,LOG,00000,"statement: REFRESH MATERIALIZED VIEW dbo.view1;",,,,,,,,,"psql","client backend",,0
2022-09-24 04:00:09.842 CEST,,,20662,,632e64a8.50b6,1,,2022-09-24 04:00:08 CEST,5/93,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20662.0"", size 3620864",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","parallel worker",20646,0
2022-09-24 04:00:09.842 CEST,,,20661,,632e64a8.50b5,1,,2022-09-24 04:00:08 CEST,4/208,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20661.0"", size 3612672",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","parallel worker",20646,0
2022-09-24 04:00:09.875 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,423,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.42"", size 4554752",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
2022-09-24 04:00:10.354 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,424,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.41"", size 10829824",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
2022-09-24 04:00:10.687 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,425,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.8.sharedfileset/i2of16.p0.0"", size 655360",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
...
...
2022-09-24 04:00:10.701 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,518,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.43"", size 10829824",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
2022-09-24 04:00:10.702 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,519,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.40"", size 3588096",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0

As you can see, logs are pretty much littered with the usage of temporary files, but I expected those as views are big.

Now comes the manual refresh at 8:11 AM from DataGrip:

2022-09-24 08:11:04.997 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,1,"SET",2022-09-24 08:11:04 CEST,3/3703,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.034 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,2,"SET",2022-09-24 08:11:04 CEST,3/3704,0,LOG,00000,"execute <unnamed>: SET application_name = ''",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.063 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,3,"SELECT",2022-09-24 08:11:04 CEST,3/3705,0,LOG,00000,"execute <unnamed>: select version()",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.096 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,4,"SET",2022-09-24 08:11:04 CEST,3/3706,0,LOG,00000,"execute <unnamed>: SET application_name = 'DataGrip 2022.2.4'",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.172 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,5,"SHOW",2022-09-24 08:11:04 CEST,3/3708,0,LOG,00000,"execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL",,,,,,,,,"DataGrip 2022.2.4","client backend",,0
2022-09-24 08:11:05.232 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,6,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,0,LOG,00000,"execute <unnamed>: refresh materialized view dbo.view1",,,,,,,,,"DataGrip2022.2.4","client backend",,0
2022-09-24 08:11:06.236 CEST,,,1089,,632e9f79.441,1,,2022-09-24 08:11:05 CEST,4/274,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1089.0"", size 3604480",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","parallel worker",1088,0
2022-09-24 08:11:06.237 CEST,,,1090,,632e9f79.442,1,,2022-09-24 08:11:05 CEST,5/147,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1090.0"", size 3645440",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","parallel worker",1088,0
2022-09-24 08:11:06.263 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,7,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.2"", size 4554752",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0
2022-09-24 08:11:06.742 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,8,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.1"", size 10829824",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0
...
...
2022-09-24 08:11:07.072 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,102,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.3"", size 10829824",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0
2022-09-24 08:11:07.072 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,103,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.0"", size 3579904",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0

The only difference I spotted so far was that both Java-based and manual refresh had:

execute <unnamed>: REFRESH MATERIALIZED VIEW

whereas the one from cron-job only had:

statement: REFRESH MATERIALIZED VIEW

Update #3

2022-09-24 03:00:00.814 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,1,"SET",2022-09-24 03:00:00 CEST,3/2370,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:00:00.815 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,2,"SET",2022-09-24 03:00:00 CEST,3/2371,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:00:00.815 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,3,"SHOW",2022-09-24 03:00:00 CEST,3/2372,0,LOG,00000,"execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:00:00.815 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,4,"SET",2022-09-24 03:00:00 CEST,3/2373,0,LOG,00000,"execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
...
After a while sync job completes
...
2022-09-24 03:06:35.311 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,77697,"COMMIT",2022-09-24 03:00:00 CEST,3/2374,6153,LOG,00000,"execute S_35: COMMIT",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:35.322 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,77698,"SET",2022-09-24 03:00:00 CEST,3/2375,0,LOG,00000,"execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17568,"127.0.0.1:44228",632e581b.44a0,1,"SET",2022-09-24 03:06:35 CEST,4/171,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17568,"127.0.0.1:44228",632e581b.44a0,2,"SET",2022-09-24 03:06:35 CEST,4/172,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17571,"127.0.0.1:44234",632e581b.44a3,1,"SET",2022-09-24 03:06:35 CEST,6/17,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17571,"127.0.0.1:44234",632e581b.44a3,2,"SET",2022-09-24 03:06:35 CEST,6/18,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17569,"127.0.0.1:44230",632e581b.44a1,1,"SET",2022-09-24 03:06:35 CEST,5/48,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.356 CEST,"X","Xdb",17569,"127.0.0.1:44230",632e581b.44a1,2,"SET",2022-09-24 03:06:35 CEST,5/49,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.356 CEST,"X","Xdb",17569,"127.0.0.1:44230",632e581b.44a1,3,"BEGIN",2022-09-24 03:06:35 CEST,5/50,0,LOG,00000,"execute <unnamed>: BEGIN",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
... The refresh statements go on and on

It turns out that the sync job runs in a transaction isolation level READ UNCOMMITTED, but view refresh tasks run at READ COMMITTED level

On the other hand, DataGrip does not even set the isolation level, but only queries it...

Jovan Perovic
  • 19,846
  • 5
  • 44
  • 85
  • Please share your complete statements. And do you log your statements? If so, did you check them? And if you REFRESH them WITH NO DATA, no data will be there. – Frank Heikens Sep 22 '22 at 18:54
  • Thanks, I have updated the question with a sample materialized view query. It looks rather typical, doesn't it? I did not use `NO DATA` at any point. I still have to turn on query logging, though I fear that it may put additional stress on the system during the sync process. – Jovan Perovic Sep 22 '22 at 19:02
  • That how to create one, but how do you refresh them? Without a REFRESH, you will still have the old data. https://www.postgresql.org/docs/current/sql-refreshmaterializedview.html – Frank Heikens Sep 22 '22 at 19:04
  • Ah, yes, of course, I execute: `REFRESH MATERIALIZED VIEW dbo.ind_newborn_daily;` – Jovan Perovic Sep 22 '22 at 19:10
  • And when the refresh is done, the mat view isn't refreshed? What do the logfiles tell you? Did the refresh run and was is successful or not? – Frank Heikens Sep 22 '22 at 19:25
  • 2
    Either the materialized view is not refreshed, or the refresh started before the data modifications were committed. – Laurenz Albe Sep 22 '22 at 19:36
  • Maybe there are uncommited changes. Make sure the java process succesfully commited and closed connection. Strongly advise you to create a column on mw to keep last refresh time of records. – bekir çelik Sep 22 '22 at 20:23
  • Thanks, folks! I have updated the config of Postgres to log everything. Looking forward to the next run and seeing what actually executes. I am going to update the question once I have more info :) – Jovan Perovic Sep 23 '22 at 14:57
  • I have captured the logs and updated the question with them. I am still unable to figure out why it does not work :-/ – Jovan Perovic Sep 24 '22 at 19:39
  • Well, I changed the code for a refresh so it works in a single thread. I am also starting a transaction with isolation level `READ UNCOMMITTED` (though I really should not need to), but the result stays the same :( – Jovan Perovic Sep 25 '22 at 13:40
  • Somethings off here. After a job went through (and produced an inconsistent view), I disabled a sync and left only refreshes enabled. No matter how many times I ran the job (well, the refresh) I could not get the view to refresh :( – Jovan Perovic Sep 25 '22 at 14:03
  • How you can be sure if there aren't any new data here? Your query has date filter and your source system may update it, than you cant be sure. Also there may be time zone difference on source systems, your client, your session etc.. Check timezone and set timezone before test and refresh matview. – bekir çelik Sep 27 '22 at 02:14
  • @bekirçelik I am querying tables on my side (target, not source) and I am getting different results when executing VIEW's SQL vs VIEW itself :-/ – Jovan Perovic Sep 27 '22 at 20:01
  • Strange . Lets simulate it. I think its impossible but worth to test it. Create 2 table. 1.st table query :CREATE TABLE tab1 AS SELECT * from view1 . 2.nd table query CREATE TABLE tab2 as ( wiev sql comes here). And you say these two dataset gives different result right? – bekir çelik Sep 27 '22 at 20:11
  • @bekirçelik I have been trying for 2 days now and any combination of DataGrip/psql ALWAYS refreshes the dummy materialized view correctly. I guess, we expected as much... bummer... – Jovan Perovic Oct 01 '22 at 20:41
  • In addition, a few days ago, I set the TRACE logging of any transactions and I can confirm that all transactions are starting and completing (basically the application logs match the Postgres logs from above) – Jovan Perovic Oct 01 '22 at 21:10

2 Answers2

1

Answer you are looking for is that materialized views are not updated immediately. They are updated in the background. You can try to run the following query to see if the materialized view is still updating:

SELECT pg_is_in_recovery(); -- returns true if the server is a standby server

SELECT pg_last_xact_replay_timestamp(); -- returns the timestamp of the last transaction that was replayed on the standby server

SELECT pg_last_xact_replay_timestamp() > pg_last_xact_replay_timestamp(); -- returns true if the last transaction that was replayed on the standby server is newer than the last transaction that was replayed on the primary server
Tibic4
  • 3,709
  • 1
  • 13
  • Thank you for the hint TIbic4, but this is the primary server and I do not have a secondary. I have a production source (MySQL) and my own Postgres... – Jovan Perovic Oct 01 '22 at 20:44
1

Edit:

I realize that my "solution" is kinda hacky and not really correct. Frustration with this problem caused me to jump to conclusion, but it is in fact invalid.

As stated here: https://wiki.postgresql.org/wiki/Don't_Do_This#Don't_use_timestamp_(without_time_zone)

I later realized that my Linux box is set (and Postgres) in CET TZ, however, all my timestamps are in UTC and my clients use UTC.

Original answer:

After several months of the original question, I can finally put an end to this. Up until now, every single morning, I had to refresh views manually in order to work. Finally, I dedicated some time to investigating and solving this.

Root cause:

  • The database server is running in the CET timezone
  • The application server is running in the CET timezone
  • All clients (JetBrains Data Grip / Tableau) are running in UTC timezone
  • Application server executed code against materialized views
  • Tableau executed arbitrary SQL
  • I stored all datetimes WITH TIMEZONE
  • All events happing after 11 PM, were pushed to the next day...

I heard this one before: Always store the datetimes in UTC (or without timezone). But mistakes happen.

I guess this is a meme-worthy mistake:

enter image description here

Thank you all for the elaborate discussion and for bearing with at the time non-sense "facts" :)

Jovan Perovic
  • 19,846
  • 5
  • 44
  • 85