
I want to check whether Apache Calcite can be used for the "Data Federation" use case (querying across multiple databases).

The idea is that I have a master query over 5 tables: some from one database (say Hive) and 3 from another database (say MySQL).

  • Can I execute the master query on multiple databases from one JDBC client interface?
  • If this is possible, where does the query execution (particularly the inter-database join) happen?
  • Also, can I get a physical plan from Calcite that I can execute explicitly in another execution engine?

I read in the Calcite documentation that it can push down Join and GroupBy, but I could not understand what that means. Can anyone help me understand this?

Ram Ghadiyaram
Sathish
  • Hi, Could you find any answer to the question? I have similar objective i.e. Data Federation. – phoenix Nov 16 '15 at 08:20
  • Nope, I haven't found any answer. I think the application has to implement all the functionality itself through the relational algebra provided by Calcite. My understanding is that Calcite does not provide data federation (query decomposition) out of the box. – Sathish Dec 07 '15 at 16:56

1 Answer


I will try to answer. You can also send questions to the mailing list, dev@calcite.apache.org; you are more likely to get an answer there.

Can I execute the master query on multiple databases from one JDBC client interface? If this is possible, where does the query execution (particularly the inter-database join) happen?

Yes, you can. The inter-database join happens in memory, in the process where Calcite runs.
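To make "the join happens in memory" more concrete, here is a minimal sketch of a hash join over two row sets that have already been fetched from different sources. It uses only the Java standard library and is not Calcite's own join implementation; the class name `FederatedJoinSketch`, the method `hashJoin`, and the sample rows are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FederatedJoinSketch {

    // Joins two row sets on the value in column 0 of each row.
    // In a federated setup, `left` would hold rows fetched from one source
    // (say Hive) and `right` rows fetched from another (say MySQL).
    // Both inputs here are hypothetical in-memory stand-ins.
    public static List<String> hashJoin(List<String[]> left, List<String[]> right) {
        // Build phase: index the right-hand rows by join key.
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] r : right) {
            index.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // Probe phase: look up each left-hand row and emit one output row per match.
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            List<String[]> matches = index.get(l[0]);
            if (matches == null) {
                continue; // inner join: unmatched rows are dropped
            }
            for (String[] r : matches) {
                out.add(l[0] + "," + l[1] + "," + r[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows of (customer_id, order_name), imagined as coming from Hive.
        List<String[]> hiveOrders = Arrays.asList(
            new String[] {"1", "order-A"},
            new String[] {"2", "order-B"},
            new String[] {"3", "order-C"});
        // Rows of (customer_id, customer_name), imagined as coming from MySQL.
        List<String[]> mysqlCustomers = Arrays.asList(
            new String[] {"1", "alice"},
            new String[] {"3", "carol"});
        for (String row : hashJoin(hiveOrders, mysqlCustomers)) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is that both inputs must be pulled into the joining process before the join can run, so the amount of data each source returns matters for memory use and network traffic.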

Can I get a physical plan from Calcite where I can execute explicitly in another execution engine?

Yes, you can. A lot of Calcite consumers do it this way, but you will have to wrap around the Calcite rule system, I mean execute

I read in the Calcite documentation that it can push down Join and GroupBy, but I could not understand what that means. Can anyone help me understand this?

These are SQL optimisations that the engine performs. Imagine a GROUP BY that could have been applied to a tiny table but is actually specified after a join with a huge table. Pushing it down means evaluating the GROUP BY on that one input first, before the join.
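The idea can be illustrated with a small sketch, assuming a SUM aggregate and a join on a key that is unique on the lookup side. It uses only the Java standard library and does not call Calcite; the class `AggregatePushdownSketch`, its two methods, and the sample data are hypothetical. Both methods return the same totals, but the second one aggregates before joining, so the join only sees one row per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AggregatePushdownSketch {

    // Plan A: join every order row with its customer, then group by name.
    // orders rows are {customerId, amount}; customers maps customerId -> name.
    public static Map<String, Integer> joinThenGroup(int[][] orders, Map<Integer, String> customers) {
        List<Object[]> joined = new ArrayList<>();
        for (int[] o : orders) {
            String name = customers.get(o[0]);
            if (name != null) {
                joined.add(new Object[] {name, o[1]}); // one joined row per order
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Object[] row : joined) {
            totals.merge((String) row[0], (Integer) row[1], Integer::sum);
        }
        return totals;
    }

    // Plan B: push the aggregate below the join. Sum per customerId first,
    // then join the (much smaller) partial result with the customers.
    public static Map<String, Integer> groupThenJoin(int[][] orders, Map<Integer, String> customers) {
        Map<Integer, Integer> partial = new TreeMap<>();
        for (int[] o : orders) {
            partial.merge(o[0], o[1], Integer::sum);
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : partial.entrySet()) {
            String name = customers.get(e.getKey());
            if (name != null) {
                totals.merge(name, e.getValue(), Integer::sum); // one joined row per key
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        int[][] orders = {{1, 10}, {1, 20}, {2, 5}, {1, 30}};
        Map<Integer, String> customers = new HashMap<>();
        customers.put(1, "alice");
        customers.put(2, "bob");
        System.out.println(joinThenGroup(orders, customers));
        System.out.println(groupThenJoin(orders, customers));
    }
}
```

In a federated setting, the pre-aggregation step could in principle be evaluated by the source database itself, so that fewer rows have to be transferred before the in-memory join. Whether a given rewrite is valid depends on the aggregate function and the join keys, which is what the optimizer's rules have to check.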

zinking
    Do you have any examples that show this? I am very much interested in this scenario (federated querying of two relational sources). – Edmon Sep 12 '17 at 20:26