there is a service API that allows retrieving some Book
s, and we would like to build a Apache Calcite table on top of this service; this API only accepts a set of IDs, and if no IDs are provided, no result is returned.
interface BookService {
/** Returns the books matching the given IDs.
*
* <p>If not IDs is provided, no result is returned.
*/
List<Book> findBooksByIds(Set<String> ids);
}
record Book(String id, String title) {};
A requirement of this table is that it has to support join, and we would like to support joining by ID efficiently; but currently this is not possible, because in case of joins, no filters is pushed down to this API; for example, given the query:
WITH config (id, val) AS (
VALUES ('a', 3), ('b', 5)
)
SELECT * FROM books b
INNER JOIN config ON b.id = config.id
Calcite produces the following plan:
LogicalProject(id=[$6], val=[$7], id=[$0], author=[$1])
LogicalJoin(subset=[rel#58:RelSubset#6.NONE.[]], condition=[=($8, $5)], joinType=[inner])
LogicalProject(subset=[rel#34:RelSubset#3.NONE.[]], id=[$0], author=[$1], id0=[CAST($2):VARCHAR])
LogicalTableScan(subset=[rel#32:RelSubset#2.NONE.[]], table=[[s, books]])
LogicalProject(subset=[rel#31:RelSubset#1.NONE.[]], EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[CAST($0):VARCHAR NOT NULL])
LogicalValues(subset=[rel#29:RelSubset#0.NONE.[]], tuples=[[{ 'a', 3 }, { 'b', 5 }]])
As you can see, this always results in an empty result, since the join condition is not transformed to filters that can be pushed down to the TableScan
.
The goal is to define a rule that forces the query planner to always push down join predicates into a table scan for this books
table.
Postgres, under some circumstances, generates nested-loop join plan: it first selects the rows of the table A matching a given condition, then iterates over the retrieved rows and performs a scan of table B looking for rows that match the join condition; and this seems something we would like to enforce here. Checking the set of available rules in Apache Calcite, I see the EnumerableBatchNestedLoopJoin
rule, and it seems this does exactly what we want.
Some questions:
- Is this rule the right place for looking for a solution, or do you have other suggestions?
- How can we re-use this rule for the
books
table, where its access is not allowed by a full table scan (checking the test for this rule in thecalcite
GitHub repository, it seems that all the used tables allow a full table scan)?
Thanks!