I am using the "Spark" connector in Power BI Desktop to connect to a Spark Thrift Server, using "HTTP" connection mode with "SSL" enabled and the "DirectQuery" option.

The connection succeeds; however, every exchange between Power BI and the Spark Thrift Server (e.g. authentication, loading metadata, loading a specific table) takes approximately 10 minutes. The data source is HiveServer2.

My Spark Thrift Server configuration in "hive-default.xml" is as follows:

<property>
    <name>hive.server2.authentication</name>
    <value>PAM</value>
</property>
<property>
    <name>hive.server2.authentication.pam.services</name>
    <value>login,sudo,sshd</value>
</property>
<property>
    <name>hive.server2.use.SSL</name>
    <value>true</value>
</property>
<property>
    <name>hive.server2.keystore.path</name>
    <value>************</value>
</property>
<property>
    <name>hive.server2.keystore.password</name>
    <value>************</value>
</property>
<property>
    <name>hive.server2.transport.mode</name>
    <value>http</value>
</property>
<property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
</property>
<property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
</property>

At first I suspected that SSL encryption was slowing the communication, so I disabled it, but the issue persisted. I therefore concluded that SSL is not the cause.

Any ideas on how to speed up the communication?

Note: I also connected to the Spark Thrift Server with the "beeline" command-line tool, using "HTTP" mode with "SSL" enabled, and it was very fast, so I have ruled out network latency as well.
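
To put numbers on the transport side of this comparison, a rough timing probe along these lines could be run from the machine that hosts Power BI. It is only a sketch under assumptions: the function names `timed` and `probe_thrift_http` are mine, the host name in the usage note is hypothetical, the port and path are taken from the config above, and certificate verification is switched off purely for a quick test. It does not speak Thrift or authenticate, so it measures only the TCP/TLS connect and one plain HTTP round trip, not PAM login or metadata queries.

```python
import http.client
import ssl
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def probe_thrift_http(host, port=10001, path="/cliservice",
                      use_ssl=True, timeout=30):
    """Time the connect phase and one HTTP round trip to the endpoint.

    Transport only: no Thrift payload is sent and no login is attempted,
    so the server may well answer with an error status.
    """
    if use_ssl:
        # Assumption: skip certificate checks for a quick diagnostic only.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(host, port, timeout=timeout,
                                           context=ctx)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)

    def round_trip():
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the timing covers the full reply
        return resp.status

    try:
        # For HTTPS, connect() includes the TLS handshake.
        _, connect_s = timed(conn.connect)
        status, request_s = timed(round_trip)
    finally:
        conn.close()
    return {"connect_s": connect_s, "request_s": request_s, "status": status}
```

Calling `probe_thrift_http("thrift-host.example.com")` (hypothetical host) returns the two timings in seconds plus the HTTP status. If both are small while Power BI still takes minutes, the delay presumably sits above the transport layer, for example in authentication or in the queries the connector issues.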

DigitalFox