
I extended the JDBC adapter and used a model.json configuration with a custom schema factory that exposes 1 original schema and 2 derived schemas, and added my rules there. That part worked: the rules got executed on the original schema during planning, but their end result didn't get chosen as the best option by the Volcano planner because it's too expensive. The rules transform the RelNode so that it executes on the 2 derived schemas. More details below and in the code.
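
For context, here is a rough sketch of what such a schema factory can look like. This is not the code from the repository: the class name, schema names, and the nested operand layout are made up for illustration. The factory returns the original JDBC schema and registers the two derived JDBC schemas next to it.

import java.util.Map;
import org.apache.calcite.adapter.jdbc.JdbcSchema;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaFactory;
import org.apache.calcite.schema.SchemaPlus;

public class MultiCloudSchemaFactory implements SchemaFactory {
    @Override
    public Schema create(SchemaPlus parentSchema, String name, Map<String, Object> operand) {
        // Register the two derived schemas as siblings of the original schema.
        parentSchema.add("CLOUD_ONE", JdbcSchema.create(parentSchema, "CLOUD_ONE", sub(operand, "cloudOne")));
        parentSchema.add("CLOUD_TWO", JdbcSchema.create(parentSchema, "CLOUD_TWO", sub(operand, "cloudTwo")));
        // The schema returned here is the original one that the parser resolves tables against.
        return JdbcSchema.create(parentSchema, name, operand);
    }

    // Assumed model.json layout: each derived schema's JDBC settings are nested under its own operand key.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> sub(Map<String, Object> operand, String key) {
        return (Map<String, Object>) operand.get(key);
    }
}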

1) Can I tell the Volcano planner to ignore 1 out of the 3 schemas that I passed through my custom JDBC SchemaFactory?

I want the parser to work on that 1 original schema, but I want the planner to never suggest an optimal (cheapest) plan in that schema (only in the other 2 derived schemas). The 1 original schema always maps 1-to-1 to the 2 derived schemas, so the RelNode that my rule returns is always semantically equivalent, just more expensive (for security reasons).

2) If that can't work, how can I call the HepPlanner instead of the default Volcano planner from the SchemaFactory that is set in model.json, since that's my starting point?
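
To be clear, running my rule in a HepPlanner on its own is straightforward; a minimal sketch (MultiCloudRule.INSTANCE and the class name are placeholders) looks roughly like this. The part I'm missing is how to make Calcite invoke something like this instead of, or before, the Volcano planner when the entry point is a SchemaFactory declared in model.json.

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;

public class MultiCloudHepRunner {
    /** Rewrites a plan built against the original schema into a plan that only
     * references the two derived schemas, by applying the rewrite rule in a HepPlanner. */
    public static RelNode rewrite(RelNode originalRel) {
        HepProgram hepProgram = new HepProgramBuilder()
            .addRuleInstance(MultiCloudRule.INSTANCE) // placeholder for my rewrite rule
            .build();
        HepPlanner hepPlanner = new HepPlanner(hepProgram);
        hepPlanner.setRoot(originalRel);
        return hepPlanner.findBestExp();
    }
}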

You can find my entire code on GitHub; I made it publicly available so that everyone can have a better starting point with Calcite than I did.

Here is the link: https://github.com/igrgurina/multicloud_rewriter

The Calcite library is amazing, but it's really hard to get into because it lacks examples and tutorials for common tasks.

Ideally, I would have a HepPlanner execute my rules, which transform expressions into semantically equivalent ones that use the 2 derived schemas instead of the 1 original schema (I have a rule that does that), and then have the Volcano planner optimize the result using only the 2 derived schemas, without ever knowing that the 1 original schema exists, for security reasons.
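
Expressed as a Calcite Program chain, the pipeline I'm after would look roughly like the sketch below (again with a placeholder rule and class name); the open question is how to get Calcite to use such a program when starting from a SchemaFactory.

import com.google.common.collect.ImmutableList;
import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
import org.apache.calcite.tools.Program;
import org.apache.calcite.tools.Programs;

public class MultiCloudPipeline {
    public static Program create() {
        // Phase 1: a HepPlanner-based program applies the schema-rewriting rule(s).
        Program rewritePhase = Programs.hep(
            ImmutableList.of(MultiCloudRule.INSTANCE), // placeholder rule
            false,                                     // noDag
            DefaultRelMetadataProvider.INSTANCE);
        // Phase 2: Calcite's standard (Volcano-based) program optimizes the rewritten plan.
        return Programs.sequence(rewritePhase, Programs.standard());
    }
}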

I haven't found any reasonable examples that demonstrate how to do that, so any help would be appreciated (please don't post links to the Druid example or the Apache Calcite docs website; I've been through them a thousand times).


1 Answer


I've managed to make this work by using Hook.PROGRAM and prepending my custom program, so that it executes my rules before all the others.

Since Hook is marked as intended for testing and debugging only in the Calcite library, I would say this is not how it's supposed to be done, but I have nothing better at the moment.

Here is a short summary with a code sample:

// Imports needed at the top of the enclosing file:
import java.util.function.Consumer;
import org.apache.calcite.runtime.Hook;
import org.apache.calcite.tools.Program;
import org.apache.calcite.tools.Programs;
import org.apache.calcite.util.Holder;

public static class MultiCloudHookManager {
    // Program that runs the multi-cloud rewrite rules in a HepPlanner.
    private static final Program PROGRAM = new MultiCloudProgram();

    private static Hook.Closeable globalProgramClosable;

    public static void addHook() {
        // Register the hook only once for the whole JVM.
        if (globalProgramClosable == null) {
            globalProgramClosable = Hook.PROGRAM.add(program());
        }
    }

    private static Consumer<Holder<Program>> program() {
        return prepend(PROGRAM);
    }

    // This doesn't have to be a separate method.
    private static Consumer<Holder<Program>> prepend(Program program) {
        return (holder) -> {
            if (holder == null) {
                throw new IllegalStateException("No program holder");
            }
            Program chain = holder.get();
            if (chain == null) {
                chain = Programs.standard();
            }
            // Run our program first, then the program Calcite would have used anyway.
            holder.set(Programs.sequence(program, chain));
        };
    }
}

The MultiCloudHookManager is then used in the SchemaFactory, where you simply call the MultiCloudHookManager.addHook() method. In this case, MultiCloudHookManager.PROGRAM is set to a MultiCloudProgram, which simply executes a set of rules in a HepPlanner.
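
For illustration, a program like MultiCloudProgram can be little more than a delegation to a HepPlanner-backed program built from the rewrite rule(s). The sketch below is not the exact class from the repository (MultiCloudRule.INSTANCE is a placeholder, and the run(...) signature is the one used by recent Calcite versions):

import java.util.List;
import com.google.common.collect.ImmutableList;
import org.apache.calcite.plan.RelOptLattice;
import org.apache.calcite.plan.RelOptMaterialization;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
import org.apache.calcite.tools.Program;
import org.apache.calcite.tools.Programs;

public class MultiCloudProgram implements Program {
    // A HepPlanner-backed program that applies the schema-rewriting rule(s).
    private final Program delegate = Programs.hep(
        ImmutableList.of(MultiCloudRule.INSTANCE), false, DefaultRelMetadataProvider.INSTANCE);

    @Override
    public RelNode run(RelOptPlanner planner, RelNode rel, RelTraitSet requiredOutputTraits,
            List<RelOptMaterialization> materializations, List<RelOptLattice> lattices) {
        return delegate.run(planner, rel, requiredOutputTraits, materializations, lattices);
    }
}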

For full details, refer to the source code in the GitHub repository.

This hacky solution is inspired by another library.
