Teiid not performing optimal join

Question

For our Teiid Springboot project we use a row filter in a where clause to determine what results a user gets. Example:

SELECT * FROM very_large_table WHERE id IN ('01', '03')

We want the context in the IN clause to be dynamic like so:

SELECT * FROM very_large_table WHERE id IN (SELECT other_id from very_small_table)

The problem now is that Teiid gets all the data from very_large_table and only then tries to filter with the where clause, this makes the query 10-20 times slower. The data in this very_small_tableis only about 1-10 records and it is based on the user context we get from Java.

The very_large_table is located on a Oracle database and the very_small_table is on the Teiid Pod/Container. Somehow I can't force Teiid to ship the data to Oracle and perform filtering there.

Things that I have tried: I have specified the the foreign data wrappers as follows

CREATE FOREING DATA WRAPPER "oracle_override" TYPE "oracle" OPTIONS (EnableDependentsJoins 'true');
CREATE SERVER server_name FOREIGN DATA WRAPPER "oracle_override";

I also tried, exists statement or instead of a where clause use a join clause to see if pushdown happened. Also hints for joins don't seem to matter.

Sadly the performance impact at the moment is that high that we can't reach our performance targets.

score 0 · Answer 1 · answered Nov 27 '20 at 14:33

0

Often, exists performs better than in:

SELECT vlt.*
FROM very_large_table vlt
WHERE EXISTS (SELECT 1 FROM very_small_table vst WHERE vst.other_id = vlt.id);

However, this might end up scanning the large table.

If id is unique in vlt and there are no duplicates in vst, then a JOIN might optimize better:

select vlt.*
from very_small_table vst join
     very_large_table vlt
     on vst.other_id = vlt.id;

answered Nov 27 '20 at 14:33

Gordon Linoff

1,242,037
58
646
786

Thanks for your reply. However, I already tried exists without much effect. Instead of 5.4 seconds I indeed get 5 seconds but that is negligible. The performance of the query with hardcoding is 0.4 seconds. Joins tend to be a bit slower scoring between 6-7 seconds. This is only acceptance data, so I expect an even larger dataset on production. – Kusters Nov 27 '20 at 14:47

score 0 · Accepted Answer · answered Dec 02 '20 at 22:02

0

Are there any cardinalities on very_small_table and very_large_table? If not the planner will assume a default plan.

You can also use a dependent join hint:

SELECT * FROM very_large_table WHERE id IN /*+ dj */ (SELECT other_id from very_small_table)

answered Dec 02 '20 at 22:02

Steven Hawkins

538
1
4
7

On the larger tables there are, but they might not yet be totally correct since they assume dev data. De the very small table is a texttable with the user's roles in it. I gave it a cardinality of 5 (which is the expected max). I solved the problem partly; reason it was not working correctly was that I was testing the queries in Squirrel which has a different behaviour than the DDL. Somehow the dependant IN join is faster after 2500 rows but everything below that a unnest hint for the IN faster. It greatly depends on the scenario how many rows we need. Can we get the best of both worlds? – Kusters Dec 09 '20 at 10:12
Is there a way to force the same behaviour as the hardcoded IN clause? Because the amount of data will be similar and this will make it a whole lot faster. With cost statistics of Prod it now always ignores hints :(. – Kusters Dec 09 '20 at 12:38
Without seeing the particulars of the query plans it's hard to say what the best of both worlds actually is. The plans would also clarify what the difference is with a hardcoded in - for example using the dj hint should produce a plan that is nearly equivalent to hardcoded in it just has to pre-evaluate the value set and I think there is an extra buffering. I'm also not sure what you mean by ignoring hints. What plan are you getting, and what plan do you expect? – Steven Hawkins Dec 10 '20 at 16:59
We are going to test today if we see some performance improvements with this setup. If this does not pan out I will post an anonymized plan in the opening post. I will need to change the cardinalities to show what I meant; but I hope the performance is already oke. – Kusters Dec 14 '20 at 12:49
https://issues.redhat.com/browse/TEIID-6058 addresses the primary planning issue that you are seeing based upon the query plans you provided. – Steven Hawkins Dec 16 '20 at 03:39
Thanks, I see that it is using Teiid 16, the springboot starter we use (1.6.0) is using Teiid 15. For the null pointer I got when using with exists has the issue described here: https://issues.redhat.com/browse/TEIID-6060 Which I created. – Kusters Dec 16 '20 at 14:37

Teiid not performing optimal join

2 Answers2