
I've successfully set up Apache Drill (latest 1.9) with Oracle's JDBC driver (latest ojdbc7.jar from Oracle) as a storage plugin:

{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": "jdbc:oracle:thin:@server:1521/myservicename",
  "username": "TEST_USER",
  "password": "password",
  "enabled": true
}

I can run queries, but even the simplest things literally take minutes to execute. Querying a single-row table takes 78 seconds. And just setting the default schema takes over a minute:

0: jdbc:drill:zk=local> use oracle.TEST_USER
. . . . . . . . . . . > ;
+-------+-------------------------------------------------+
|  ok   |                     summary                     |
+-------+-------------------------------------------------+
| true  | Default schema changed to [oracle.TEST_USER]  |
+-------+-------------------------------------------------+
1 row selected (77,5 seconds)

But I've verified with a simple hello-world-style Java application that a connection with the same connection string works absolutely fine, with full-table fetch times of about 0.1 seconds.

jdbc:oracle:thin:@server:1521/myservicename
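
For reference, a minimal test along those lines looks roughly like this (a sketch; TEST_TABLE is a placeholder for any small table in the TEST_USER schema):

// Standalone JDBC test against the same Oracle service, timing a full table fetch.
// TEST_TABLE is a hypothetical placeholder table name.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OracleJdbcTest {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@server:1521/myservicename", "TEST_USER", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM TEST_TABLE")) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            System.out.println(rows + " rows fetched in "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
}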

I've tested from Ubuntu and from a Mac, and also tried setting the random source for Java, which other answers point out as a potential performance problem:

export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Djava.security.egd=file:/dev/./urandom"

What's going on here? Is this a known issue or is there some workaround?

I've used drill-embedded.

2 Answers


I was able to find the root cause of this problem.

The problem is not a slow query but a non-optimal prefetching strategy in Drill's metadata queries. In my case the database is huge and has thousands of schemas (Oracle users), each with hundreds of tables. And Oracle's default fetch size is 10, resulting in hundreds of DB round trips.

Every Apache Drill query against a JDBC storage plugin examines this metadata.

Inside Apache Drill: JdbcStoragePlugin.java#L351:

java.sql.DatabaseMetaData.getSchemas()

Inside Apache Calcite (Drill dependency): JdbcMeta.java#L323:

java.sql.DatabaseMetaData.getTables(...)

Neither place overrides the default fetch size, and in Wireshark I can literally watch the packets trickle in step by step, each with only 10 rows. (The latency to the DB server is quite high, as it is located elsewhere.)

I have already recompiled Apache Drill with a manual call to setFetchSize, which improved response times a lot. I haven't patched Calcite yet, but will probably do that as well.
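
To illustrate the kind of change (a sketch against the plain JDBC metadata API, not the exact Drill or Calcite code; the value 500 is an arbitrary example):

// Raising the fetch size on the JDBC metadata ResultSets cuts the number of
// round trips from hundreds to a handful on a database with many schemas/tables.
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class MetadataFetchSizeSketch {

    static void listSchemasAndTables(Connection conn) throws Exception {
        DatabaseMetaData md = conn.getMetaData();
        List<String> schemaNames = new ArrayList<>();

        // getSchemas() is the call made from Drill's JdbcStoragePlugin; without
        // setFetchSize the Oracle driver fetches only 10 rows per round trip.
        try (ResultSet schemas = md.getSchemas()) {
            schemas.setFetchSize(500);
            while (schemas.next()) {
                schemaNames.add(schemas.getString("TABLE_SCHEM"));
            }
        }

        // getTables(...) is the call made from Calcite's JdbcMeta for the tables.
        for (String schema : schemaNames) {
            try (ResultSet tables = md.getTables(null, schema, "%", null)) {
                tables.setFetchSize(500);
                while (tables.next()) {
                    System.out.println(schema + "." + tables.getString("TABLE_NAME"));
                }
            }
        }
    }
}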

In general I think the usual guidelines for writing performant JDBC code should also be followed when fetching metadata, since real-world scenarios can easily end up with much larger metadata (i.e. more than 10 tables or schemas). Another idea would be caching, but I haven't seen any metadata caching going on in Drill's JDBC storage plugin.
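
As a rough illustration of the caching idea (purely hypothetical, nothing like this exists in Drill's JDBC storage plugin as far as I can see):

// Hypothetical sketch: cache the schema names so they are not re-fetched from
// DatabaseMetaData on every query; the 10-minute TTL is an arbitrary choice.
import java.sql.Connection;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class SchemaCache {
    private final Connection conn;
    private final long ttlNanos = TimeUnit.MINUTES.toNanos(10);
    private List<String> cached;
    private long loadedAt;

    public SchemaCache(Connection conn) {
        this.conn = conn;
    }

    public synchronized List<String> getSchemaNames() throws Exception {
        if (cached == null || System.nanoTime() - loadedAt > ttlNanos) {
            List<String> names = new ArrayList<>();
            try (ResultSet rs = conn.getMetaData().getSchemas()) {
                rs.setFetchSize(500); // combine caching with a sane fetch size
                while (rs.next()) {
                    names.add(rs.getString("TABLE_SCHEM"));
                }
            }
            cached = names;
            loadedAt = System.nanoTime();
        }
        return cached;
    }
}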


For those having the same issue and looking for an answer:

I updated the file /oracle/jdbc/defaultConnectionProperties.properties inside the Oracle JDBC jar

and added oracle.jdbc.defaultRowPrefetch=200

I used 7-Zip to open the jar and edit the file contents.

Note: you need to find the best rowPrefetch value for your case.

I read this to give me an idea.
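
If you want to check whether the new default is actually picked up, here is a small sketch (assuming ojdbc7.jar on the classpath and the connection details from the question above):

// Prints the effective default row prefetch of an Oracle connection.
// Expect 200 if the edited defaultConnectionProperties.properties is in effect,
// otherwise the driver default of 10.
import java.sql.Connection;
import java.sql.DriverManager;
import oracle.jdbc.OracleConnection;

public class CheckRowPrefetch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@server:1521/myservicename", "TEST_USER", "password")) {
            OracleConnection oconn = conn.unwrap(OracleConnection.class);
            System.out.println("defaultRowPrefetch = " + oconn.getDefaultRowPrefetch());
        }
    }
}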
