0

Problem

  1. Iterating over a large ResultSet from a Presto query in using JDBC Driver is extremely time consuming. Each batch call seems to take almost exactly 60s, which is suspicious (caused by some timeout maybe?).
  2. For some reason, the initial executeQuery method takes almost exactly 45s, which is suspicious.

Query

I'm executing a simple query against Presto using Java JDBC Presto Driver that looks essentially like:

SELECT stringA, stringB 
FROM {table} 
LIMIT 500000

stringA and stringB are small - on the order of ~10 characters each.
My query completes under 10sec when run in DbVisualizer using a teradata driver.
BUT when I run the same query from a Spring Java app using the 0.230 presto-jdbc driver it seems to get results back in batches (of about 75,000) and each batch takes over a minute to return.

I've done some reading about Presto's targetResultSize query param but I haven't been able to set it using the JDBC driver / connection. I've read that presto by default will only return data 1MB at a time? Not sure if that's the cause for my above stated problem #1 - would be great to figure out how to configure that.

Java Code

public List<Object> getResultSetUsingDriverManager(ChronoLocalDate chronoLocalDate) throws SQLException {
    long start = System.currentTimeMillis();

    Properties properties = new Properties();
    properties.setProperty("user", USERNAME);
    properties.setProperty("password", PASSWORD);
    properties.setProperty("SSL", "true");

    final Connection connection = DriverManager.getConnection(URL, properties);

    log.warn("Presto connection acquired in " + (System.currentTimeMillis() - start) + "ms");

    Statement statement = connection.createStatement();

    ResultSet resultSet = statement.executeQuery(query);

    log.warn("Presto query executed in " + (System.currentTimeMillis() - start) + "ms");

    List<Object> collection = new ArrayList<>();

    int counter = 0;
    long batchStart = System.currentTimeMillis();
    while (resultSet.next()) {
        counter++;
        if (counter % 1000 == 0) {
            log.warn("current count {} and took {}ms", counter, (System.currentTimeMillis() - batchStart));
            batchStart = System.currentTimeMillis();
        }
    }
    log.warn("Results extracted in " + (System.currentTimeMillis() - start));

    return collection;
}

Output

2020-01-08 17:34:31.704  WARN 29368 --- ...       : Presto connection acquired in 0ms
2020-01-08 17:35:16.705  WARN 29368 --- ...       : Presto query executed in 45003ms
2020-01-08 17:37:18.242  WARN 29368 --- ...       : current count 1000 and took 121537ms
2020-01-08 17:37:18.244  WARN 29368 --- ...       : current count 2000 and took 2ms
2020-01-08 17:37:18.245  WARN 29368 --- ...       : current count 3000 and took 1ms
...
2020-01-08 17:37:18.294  WARN 29368 --- ...       : current count 75000 and took 1ms
2020-01-08 17:38:18.857  WARN 29368 --- ...       : current count 76000 and took 60563ms
2020-01-08 17:38:18.858  WARN 29368 --- ...       : current count 77000 and took 1ms
...
2020-01-08 17:38:18.941  WARN 29368 --- ...       : current count 151000 and took 0ms
2020-01-08 17:39:19.241  WARN 29368 --- ...       : current count 152000 and took 60300ms
2020-01-08 17:39:19.242  WARN 29368 --- ...       : current count 153000 and took 1ms
...
2020-01-08 17:39:19.311  WARN 29368 --- ...       : current count 250000 and took 0ms
2020-01-08 17:39:19.311  WARN 29368 --- ...       : Results extracted in 287609

Version Information

  • Java 11
  • com.facebook.presto presto-jdbc 0.230
  • Spring Boot 2.1.6.RELEASE
  • Presto version : 302-e.3 (Starburst version)
Fabian
  • 3,310
  • 3
  • 26
  • 35
  • Your comparison isn't fair, as your code includes object creation from the query results, when showing the raw results this isn't happening. What is `MySpecialObject.RESULT_SET_EXTRACTOR.extractData` doing exactly? Finally you should not use String concat/replace to modify your query. instead use a `PreparedStatement` to set the needed query parameters. – M. Deinum Jan 09 '20 at 06:28
  • We had certain fixes for Java 11 that may affect your case. Can you repeat your experiment using Presto 327 (both server and JDBC)? https://prestosql.io/download.html – Piotr Findeisen Jan 09 '20 at 08:03
  • @M.Deinum the extractor is doing something very lightweight. Removing it entirely has no effect on the behavior. I agree I should not be using String concat on the query, but that also is not part of the issue. I've simplified the code in my example to focus on core problem I'm facing. – Fabian Jan 09 '20 at 18:44
  • @PiotrFindeisen thanks for the heads up. Unfortunately I personally don't have control over the Presto server I'm connecting to but I CAN update the JDBC driver. I'll see what I can do and report back on my findings. – Fabian Jan 09 '20 at 18:46

1 Answers1

2

In recent months we fixed issues related to client connections on Java 11.

Please upgrade your JDBC driver to 327.

Or downgrade to Java 8 on the client side.

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
  • I ended up setting up a quick test application that only contains a connection to presto. I ran the exact same code that runs the above query with Java 8 and then with Java 11. When running with Java 8 the query completed immediately but with Java 11 I saw the same slow performance noted above. Simply updating the JDBC driver did NOT improve performance - downgrading to Java 8 is the only solution I've found. – Fabian Jan 10 '20 at 21:49
  • Thanks for sharing your results. That's very concerning. Do you have the liberty to repeat your test against Presto server 327 and JDBC 327 over Java 8 and 11? (Interesting even if on totally separate env than used for the previous test) – Piotr Findeisen Jan 10 '20 at 21:56
  • I currently don't have the ability to upgrade the Presto server - I'm really only in control of the driver/client. But I'm working to see if a server upgrade is possible. If we determine `302-e.3` isn't compatible with Java 11 JDBC clients at all, that would make the version upgrade option more of a necessity. Also thanks for being so responsive Piotr, it's very appreciated. – Fabian Jan 10 '20 at 22:05
  • 302 release is pretty old already, so you will easily find other reasons to upgrade. We can talk more on Presto community slack https://prestosql.io/slack.html. See you there! – Piotr Findeisen Jan 10 '20 at 22:12