
Current Architecture:

In our current architecture, we use the Hadoop File System to store our reporting data in Parquet format, and we use Spark Thrift Server to expose this data through a SQL interface (JDBC/ODBC). SAP BO is configured to pull data for the reports through this interface.

BO Version : SAP BO 4.2 SP07

Spark Version : 2.1.0

Current Behaviour:

We have a medium-sized query generated by one of our reports, containing 28 joins (as shown in the query panel).

But when the report is run in Webi, we see a different query being sent to Spark (the core logic is the same) in which long, complex string aliases are added to every field, table, and join. A snippet of two columns is below:

`SYNTHJOIN_9445517e`.`SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9bi_financial_timeFINANCIAL_DATE` `_SYNTHJOIN_9445517e___SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9bi_financial_timeFINANCIAL_DATE_`,

`SYNTHJOIN_9445517e`.`SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9SYNTHJOIN_f7386c62SYNTHJOIN_12a564e7SYNTHJOIN_e7907874SYNTHJOIN_151b5d6fSYNTHJOIN_cc7708b9SYNTHJOIN_9a61ffc9SYNTHJOIN_425d41fbbi_marketMARKET_LEVEL_1` `_SYNTHJOIN_9445517e___SYNTHJOIN_e7712c1dSYNTHJOIN_ae302cd9SYNTHJOIN_f7386c62SYNTHJOIN_12a564e7SYNTHJOIN_e7907874SYNTHJOIN_151b5d6fSYNTHJOIN_cc7708b9SYNTHJOIN_9a61ffc9SYNTHJOIN_425d41fbbi_marketMARKET_LEVEL_1_`

Questions:

  1. Why are these long aliases being added to the query?
  2. Spark 2.1.0 limits a query to 64K bytes (https://issues.apache.org/jira/browse/SPARK-19796), and the query generated from this one exceeds that limit, so the Spark jobs fail. We would like to keep these aliases short and simple to stay under the 64K barrier. Is there a configuration we are missing in SAP BO?
  3. We also notice that SAP BO generates a lengthy subquery rather than a simple join query. Is there a configuration in SAP BO to send the query shown in the query panel to the database as-is, instead of expanding it?
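To illustrate the scale problem in question 2: each additional join prepends another `SYNTHJOIN_<hash>` token to every surviving column alias, so the SELECT list grows roughly quadratically with join depth. A minimal sketch of that growth, where the column count of 80 and the alias construction are assumptions for illustration, not the driver's actual logic:

```python
# Illustration only: mimics how per-join synthetic alias prefixes
# inflate the generated SELECT list (not the actual Simba driver logic).
import secrets

def synth_prefix() -> str:
    # e.g. "SYNTHJOIN_9445517e", matching the pattern seen in the query
    return f"SYNTHJOIN_{secrets.token_hex(4)}"

num_joins = 28      # join count from the report
num_columns = 80    # assumed number of projected columns

prefixes = [synth_prefix() for _ in range(num_joins)]

# A column that survives all 28 joins carries every accumulated prefix,
# once in the qualified name and again in the output alias.
full = "".join(prefixes) + "bi_some_tableSOME_COLUMN"
select_list = ",\n".join(
    f"`{prefixes[-1]}`.`{full}` `_{full}_`" for _ in range(num_columns)
)

print(len(select_list))  # the SELECT list alone already exceeds 64K bytes
```

With these assumed numbers, the column list by itself passes 64 KB before any FROM/JOIN clauses are counted, which is why shortening the aliases (or raising the limit) is the only way out.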

1 Answer


Since we were able to solve this, I would like to answer my own question.

  1. Why are these long aliases being added to the query?

This was caused by the behaviour of the Simba Spark driver shipped with SAP BO. SAP provided a fixed driver, after which all queries were generated normally, without the extra aliases.

  2. Spark 2.1.0 limits a query to 64K bytes (https://issues.apache.org/jira/browse/SPARK-19796), and the query generated from this one exceeds that limit, so the Spark jobs fail. We would like to keep these aliases short and simple to stay under the 64K barrier. Is there a configuration we are missing in SAP BO?

Upgrading to Apache Spark 2.2 also resolved this, as it supports queries larger than 64K bytes.

  3. We also notice that SAP BO generates a lengthy subquery rather than a simple join query. Is there a configuration in SAP BO to send the query shown in the query panel to the database as-is, instead of expanding it?

Even without the fix from SAP (the fixed driver jar), we were able to work around this by updating the Spark Simba SBO file to pick up the PRM file of the generic JDBC driver, available at {BO Installation Path}/sap_bobj/enterprise_xi40/dataAccess/connectionServer/jdbc/extensions/qt/jdbc.prm. However, SAP said they would not support this workaround, so we took the official fix from SAP mentioned above.
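For reference, the workaround amounts to editing the Spark Simba SBO file so that ConnectionServer resolves the generic jdbc.prm instead of the Simba-specific parameter file. A rough sketch of the kind of change involved; the element layout below follows the general ConnectionServer SBO structure, but the exact DataBase name and extension values on a given installation may differ:

```xml
<!-- Sketch only: illustrative fragment of a ConnectionServer SBO file.
     The actual DataBase name and extension list depend on your BO install. -->
<DataBases>
  <DataBase Active="Yes" Name="Simba Spark">
    <!-- Pointing the extension search path at the generic JDBC entries makes
         ConnectionServer load qt/jdbc.prm instead of the Simba-specific PRM.
         As noted above, SAP does not support this workaround. -->
    <Parameter Name="Extensions">qt,jdbc</Parameter>
  </DataBase>
</DataBases>
```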