7

I'm trying to write a large ResultSet (~1 million rows) to a single file. Is there a preferred/efficient way to do this in Java 1.6?

Madhu

2 Answers

14

That depends on the JDBC driver used. You need to instruct the JDBC driver not to load the entire ResultSet into Java's memory beforehand, but instead to load it row by row on every next() call. Then, inside the ResultSet#next() loop, write the data immediately to the file instead of collecting it in a List or similar.

It's unclear which JDBC driver you're using, but, for example, the MySQL JDBC driver can be instructed to serve the result set on a per-row basis in the following way, as per the MySQL JDBC driver documentation:

ResultSet

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, you need to create a Statement instance in the following manner:

 stmt = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
 stmt.setFetchSize(Integer.MIN_VALUE);

Here's a concrete kickoff example:

try (
    PrintWriter writer = new PrintWriter(new OutputStreamWriter(new BufferedOutputStream(new FileOutputStream("/records.txt")), "UTF-8"));
    Connection connection = dataSource.getConnection();
    Statement statement = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
) {
    statement.setFetchSize(Integer.MIN_VALUE);

    try (ResultSet resultSet = statement.executeQuery("SELECT col1, col2, col3 FROM sometable")) {
        while (resultSet.next()) {
            writer.append(resultSet.getString("col1")).append(",")
                  .append(resultSet.getString("col2")).append(",")
                  .append(resultSet.getString("col3")).println();
        }
    }
}
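One caveat with the loop above: joining columns with bare commas corrupts the CSV as soon as a value itself contains a comma, quote, or newline. A minimal RFC 4180-style escaper could guard against that. This is a hypothetical helper, not part of the original answer; the class and method names are made up:

```java
// Hypothetical helper (not from the answer above): RFC 4180-style quoting
// so that values containing commas, quotes, or newlines don't corrupt the
// CSV produced in the ResultSet loop.
public final class CsvEscaper {
    private CsvEscaper() {}

    public static String escape(String value) {
        if (value == null) {
            return "";                              // render SQL NULLs as empty fields
        }
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            // Wrap in quotes and double any embedded quotes
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```

Inside the loop you would then write, e.g., writer.append(CsvEscaper.escape(resultSet.getString("col1"))) instead of appending the raw value.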

By the way, I'd first check whether the DB has built-in SQL support for this, which can do it much more efficiently. For example, MySQL has a SELECT INTO OUTFILE construct for this.

The SELECT ... INTO OUTFILE 'file_name' form of SELECT writes the selected rows to a file. The file is created on the server host, so you must have the FILE privilege to use this syntax. file_name cannot be an existing file, which among other things prevents files such as /etc/passwd and database tables from being destroyed. As of MySQL 5.1.6, the character_set_filesystem system variable controls the interpretation of the file name.
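If you go that route, the statement can be issued through plain JDBC. Here's a hypothetical sketch, not part of the answer, that merely assembles the query string; the table, column, and path names are illustrative, identifier quoting/escaping is omitted, and it uses Java 8's String.join for brevity:

```java
// Hypothetical helper: builds a SELECT ... INTO OUTFILE statement as
// described in the MySQL docs above. Caller-supplied names are trusted
// as-is (no quoting or escaping); the file is written on the MySQL
// *server* host, not the client.
public final class OutfileQuery {
    private OutfileQuery() {}

    public static String build(String table, String serverPath, String... columns) {
        return "SELECT " + String.join(", ", columns)
                + " FROM " + table
                + " INTO OUTFILE '" + serverPath + "'"
                + " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
                + " LINES TERMINATED BY '\\n'";
    }
}
```

Executing the built string via statement.execute(...) would then write the file server-side, which requires the FILE privilege, and the path must not already exist.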

BalusC
  • @Nathan: that's JDBC driver dependent. The API doc says literally *"Gives the JDBC driver a hint ..."* note, **a hint**. The right value is mentioned in the MySQL documentation which I cited in my answer. – BalusC Aug 25 '11 at 18:32
  • 1
    yes, definitely your answer is right here. i was just wondering why setFetchSize(1) wasn't a good enough hint. maybe i can come up with another question along those lines. – Nathan Hughes Aug 25 '11 at 18:36
  • @Nathan: again, that's JDBC driver dependent. There's nothing in JDBC API which specifies valid/allowed values. You need to consult the JDBC driver implementation specific documentation. – BalusC Aug 25 '11 at 18:40
  • @balusc Thanks for your input. I'm using the Oracle 11g driver. I've tried setFetchSize() with limited performance improvement. I suppose the issue is two-pronged - one to load the large ResultSet and the other to write to the file. On my desktop (3GHz Core 2 Duo, 4 GB RAM@800MHz bus & a 7200RPM HDD), using a FileOutputStream (with or without setFetchSize()) without buffering, this takes about 40 minutes to write. I was wondering if you have any examples of using the NIO package which may help in writing to disk faster due to the DMA access via the JVM. – Madhu Aug 25 '11 at 19:05
  • Btw, the resultant file is about 150MB in size – Madhu Aug 25 '11 at 19:05
  • As explicitly stated in my answer, it depends on the JDBC driver used. My answer was targeted on MySQL as you didn't mention anything about your DB and MySQL is one of few which loads anything in memory. You need to consult the [Oracle JDBC driver documentation](http://download.oracle.com/docs/cd/B28359_01/java.111/b31224/resltset.htm). According to it, it already fetches only 10 rows for each `next()` by default. As to NIO, `FileOutputStream` already uses NIO under the covers when available. You only need to ensure that you use `BufferedOutputStream` as well to have a decent in-memory buffer. – BalusC Aug 25 '11 at 19:17
  • 1
    Related: http://webmoli.com/2009/02/01/jdbc-performance-tuning-with-optimal-fetch-size/ You might want to increase the Oracle default fetch size to e.g. 100 (if memory allows that). And you should really buffer the `FileOutputStream` by `BufferedOutputStream` as shown in my code example. It'll make a huge difference. I've looked around if Oracle doesn't support something like `SELECT INTO OUTFILE`, but it seems that you've to write a proc for that or something. I'm not into Oracle DB, so I can't go in detail, sorry. – BalusC Aug 25 '11 at 19:31
0

From GitHub: https://github.com/OhadR/ohadr.common/blob/master/src/main/java/com/ohadr/common/utils/resultset/ResultSetConverters.java

// Uses org.json.simple.JSONObject (json-simple); any JSON library with a
// similar put()/toJSONString() API would work the same way.
public static void writeResultSetToWriter(ResultSet resultSet, PrintWriter writer) throws SQLException
{
    ResultSetMetaData metadata = resultSet.getMetaData();
    int numColumns = metadata.getColumnCount();
    int numRows = 0;

    while (resultSet.next())                        // iterate rows
    {
        ++numRows;
        JSONObject obj = new JSONObject();          // extends HashMap
        for (int i = 1; i <= numColumns; ++i)       // iterate columns
        {
            String columnName = metadata.getColumnName(i);
            obj.put(columnName, resultSet.getObject(columnName));
        }
        writer.println(obj.toJSONString());

        if (numRows % 1000 == 0)
            writer.flush();
    }
    writer.flush();                                 // flush the final partial batch
}
OhadR