This is the server i am running
select version();
version
---------------------------------------------------------------------------
PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)
I started by writing the select (ext.t_event and ext.t_event_data are two foreign table that oracle_fdw (version 1.1) takes from a remote oracle db)
select
te.id_data,
te.id_device,
te.date_write,
te.date_event,
ted.i_inout,
ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device =2749651
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17'
and te.id_data=ted.id_data;
Takes about 10 seconds to fetch the entire recordset (3600 records).
But then i turned the select into a insert select
insert into stg_data
select
te.id_data,
te.id_device,
te.date_write,
te.date_event,
ted.i_inout,
ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device =2749651
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17'
and te.id_data=ted.id_data;
and i was forced to kill the query, it had been running for more than 30 minutes!
After a few hours of struggle and desperate attempts i decided to try this one
insert into stg_data
select
te.id_data,
te.id_device,
te.date_write,
te.date_event,
ted.i_inout,
ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device =2749651
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17'
and te.id_data=ted.id_data
limit 5000;
and...surprise surprise in 20 seconds i had the entire recordset stored in stg_data.
To better understand the differences i decided to analyze the plans.
SELECT NO LIMIT
Foreign Scan (cost=10000.00..20000.00 rows=1000 width=548)
Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/ r1."ID_DATA",
r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT", r2."I_INOUT",
r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER JOIN
"DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA") AND
(r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS
TIMESTAMP))) AND (r1."DATE_EVENT" <
(CAST ('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP)))
AND (r1."ID_DEVICE" = 2749651))
SELECT WITH LIMIT
Limit (cost=10000.00..20000.00 rows=1000 width=548)
-> Foreign Scan (cost=10000.00..20000.00 rows=1000 width=548)
Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/
r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT",
r2."I_INOUT", r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER
JOIN "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA")
AND (r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS
TIMESTAMP))) AND (r1."DATE_EVENT" < (CAST ('2019-01-17
00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."ID_DEVICE" = 2749651))
So it basically send the same query to Oracle and applies the FILTER locally as soon as the fetch is completed.
Do the INSER-SELECT plan look the same? NOPE!
INSERT_SELECT with LIMIT
Insert on stg_data_hist (cost=10000.00..20010.00 rows=1000 width=548)
-> Limit (cost=10000.00..20000.00 rows=1000 width=548)
-> Foreign Scan (cost=10000.00..20000.00 rows=1000 width=548)
Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/
r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE",
r1."DATE_EVENT", r2."I_INOUT", r2."VALUE" FROM
("DISPATCH"."T_EVENT" r1 INNER JOIN
"DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" =
r2."ID_DATA") AND (r1."DATE_EVENT" >= (CAST ('2019-01-16
00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."DATE_EVENT" <
(CAST('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP))) AND
(r1."ID_DEVICE" = 2749651))
INSERT-SELECT no LIMIT clause
Insert on stg_data_hist (cost=30012.50..40190.00 rows=5000 width=548)
-> Hash Join (cost=30012.50..40190.00 rows=5000 width=548)
Hash Cond: (te.id_data = ted.id_data)
-> Foreign Scan on t_event te (cost=10000.00..20000.00 rows=1000 width=28)
Oracle query: SELECT /*93379c271b3f1bc08a1dbb94fb89f739*/
r3."ID_DATA", r3."ID_DEVICE", r3."DATE_WRITE", r3."DATE_EVENT"
FROM "DISPATCH"."T_EVENT" r3 WHERE (r3."DATE_EVENT" >=
(CAST ('2019-01-16 00:00:00.000000 AD' AS TIMESTAMP))) AND
(r3."DATE_EVENT" < (CAST ('2019-01-17 00:00:00.000000 AD' AS
TIMESTAMP))) AND (r3."ID_DEVICE" = 2749651)
-> Hash (cost=20000.00..20000.00 rows=1000 width=528)
-> Foreign Scan on t_event_data ted
(cost=10000.00..20000.00 rows=1000 width=528)
Oracle query: SELECT /*21c8741f2fa8a8d13d037c3191e8ac96*/
r4."ID_DATA", r4."I_INOUT", r4."VALUE" FROM
"DISPATCH"."T_EVENT_DATA" r4
And that explains why it takes waaaaaay longer than the other. It's retrieving the date-filtered records from one foreign table, the complete set from the second foreign table and doing the join locally. That is going to take ages!! It's several milion of records vs few thousands.
And finally my two question
1) I want to have the first plan but get rid of the LIMIT clause (send shivers down my spine :-) ). How would you do that? I have no mean to apply filters on the ext.t_event_data except from the join clause.
2) Why the two INSERT-SELECT plan look so different even though the two SELECT plan look so similar?
Thanks for reading and have a nice day
SAMPLE_PERCENT '0.000001'
, analyzed both tables, but same plan. Did the same process again with a bigger sample percent and yet could not convince postgres to send over the join to Oracle!!! – user1375452 Jan 18 '19 at 14:01