1

I am trying to analyse how sql queries are generated by Pentaho mondrian. Let us assume there are no aggregate tables as of now. I have noticed two types of behaviour when I try to fetch data from data warehouse (star schema) using Pentaho.

Case 1: I apply various filters and try to get fact count corresponding to it which is the default measure in my case.

Case 2: I apply the same filters as mentioned in case 1 and try to get some other measure by explicitly putting it into the measures selection box.

Observation: In both the cases, sql queries generated in the back-end include joins of fact table with multiple dimension tables as per the filters applied and columns and rows selected in Pentaho. However, the join order is different in both the cases. In case 1, the fact table is placed at the left-most position of join whereas it is placed somewhere between the dimension tables in case 2.

I have connected Pentaho with AWS Athena at the back-end to execute queries on data stored on s3 with the help of jdbc connection. Since Athena has Presto at the back-end and Presto does not do automatic JOIN re-ordering, queries in case 2 are getting failed. (http://docs.qubole.com/en/latest/user-guide/presto/best-practices.html)

I noticed that hash joins are being performed by Presto here. For hash joins to be effective, the largest table should be placed on the left side of join so that the smaller table is cached in memory while performing join. This is not happening in second case and it is trying to hash the fact table which consists of a large amount of data as compared to any of the dimension tables. This causes the query to fail whenever I add measure explicitly (other than default measure) and the data range is large (across an year for example).

Can someone please give an insight into the logic behind query formation of Mondrian in both the cases. Also, is there a way we can make the fact table to always remain on the left-most position of joins in the sql queries generated by Mondrian. Or is there any property of Presto which could be set through Athena to change the join type from hash join to some other type of join in which could solve this problem.

Pentaho version - 6.1.0
Saiku version - 3.10

karu07
  • 126
  • 8
  • Please read https://stackoverflow.com/questions/35461812/apache-phoenix-pentaho-mondrian-wrong-join-order – AlainD Sep 25 '17 at 07:42
  • Where did you read Mondrian was doing hash joins? And which version of Mondrian are you refering to? I was under the impression it was converting an MDX query into an sql, and give that to the DBMS to optimize it. – AlainD Sep 25 '17 at 07:45
  • @AlainD my bad. I skipped that I have kept Presto at the back-end and hash joins are performed in Presto. Mondrian is indeed involved in the conversion part only. I have updated the question. Thanks for pointing it out. – karu07 Sep 25 '17 at 09:10
  • This seems relevant here - https://github.com/prestodb/presto/issues/3364 – karu07 Sep 25 '17 at 10:05
  • could you solve the problem? – Viktor Klyestov Jul 05 '22 at 09:51

0 Answers0