Current Architecture:
In our current architecture, we store our reporting data in HDFS in Parquet format and use Spark Thrift Server to expose this data through a SQL interface (JDBC/ODBC). SAP BO is configured to pull data for the reports through this interface.
BO Version : SAP BO 4.2 SP07
Spark Version : 2.1.0
Current Behaviour:
One of our reports generates a moderately complex query with 28 joins (as shown in the query panel).
But when the report is run in Webi, we see a different query being fired against Spark (the core logic is the same), with long, complex string aliases added to every field, table, and join (a snippet of two columns is below).
`SYNTHJOIN_9445517e`.`SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9bi_financial_timeFINANCIAL_DATE` `_SYNTHJOIN_9445517e___SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9bi_financial_timeFINANCIAL_DATE_`,
`SYNTHJOIN_9445517e`.`SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9SYNTHJOIN_f7386c62SYNTHJOIN_12a564e7SYNTHJOIN_e7907874SYNTHJOIN_151b5d6fSYNTHJOIN_cc7708b9SYNTHJOIN_9a61ffc9SYNTHJOIN_425d41fbbi_marketMARKET_LEVEL_1` `_SYNTHJOIN_9445517e___SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9SYNTHJOIN_f7386c62SYNTHJOIN_12a564e7SYNTHJOIN_e7907874SYNTHJOIN_151b5d6fSYNTHJOIN_cc7708b9SYNTHJOIN_9a61ffc9SYNTHJOIN_425d41fbbi_marketMARKET_LEVEL_1_`
Questions:
- Can you please help us understand why these long aliases are being added to the generated query?
- Spark 2.1.0 supports only 64K bytes for a query string (https://issues.apache.org/jira/browse/SPARK-19796), and the query generated from this report exceeds that limit, so the Spark jobs fail. We would like to keep these aliases short and simple to stay under the 64K barrier. Is there a configuration in SAP BO that we are missing?
- We also notice that SAP BO generates a lengthy sub-query rather than a simple join query. Is there a configuration in SAP BO to send the query shown in the query panel to the database as-is, rather than exploding it?
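For anyone unfamiliar with the 64K limit referenced above: per SPARK-19796, Spark 2.1.0 stores the SQL text in a task property that is serialized with Java's `DataOutputStream.writeUTF`, which rejects any string whose UTF-8 encoding exceeds 65535 bytes. A minimal sketch of that JVM-level behaviour (a standalone demo, not Spark code):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.UTFDataFormatException;

public class WriteUtfLimitDemo {
    public static void main(String[] args) throws Exception {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());

        // An ASCII string (1 byte per char) at exactly 65535 bytes serializes fine...
        out.writeUTF("x".repeat(65535));

        // ...but one byte more is rejected, which is how an over-long generated
        // query can fail before the job even runs.
        try {
            out.writeUTF("x".repeat(65536));
            System.out.println("unexpected: no exception");
        } catch (UTFDataFormatException e) {
            System.out.println("UTFDataFormatException at 65536 bytes");
        }
    }
}
```

This is why trimming the synthetic aliases (or shortening the generated SQL in any way) keeps the job alive: the limit applies to the encoded byte length of the whole statement, so every repeated `SYNTHJOIN_...` prefix counts against it.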