2

I have a method that writes 44 MB worth of data from a ResultSet to a CSV file. However, it is taking about 3.5 minutes to complete. This seems slow for only 44 MB of data. Can anyone see anything slowing down my code?:

public static void convertToCSV(final ResultSet rs) throws SQLException, IOException {
    final BufferedWriter fw = new BufferedWriter(new FileWriter(new File("alert.csv")));
    while (rs.next()) {
        fw.write(rs.getString("FIELD1")+",");
        fw.write(rs.getString("FIELD2")+",");
        fw.write(rs.getString("FIELD3")+",");
        final String clobValue = rs.getString("FIELD4");
        if(clobValue==null)
            fw.write("null,");
        else{
            fw.write("\""+clobValue+"\",");
        }
        final Date date = new Date(rs.getLong("FIELD5"));
        final DateFormat format = new SimpleDateFormat("yyyyMMdd HH:mm:ss");
        format.setTimeZone(TimeZone.getTimeZone("Etc/UTC"));
        final String dateTime[] = format.format(date).split(" ");
        fw.write(dateTime[0]+",");
        fw.write(dateTime[1]);

        fw.write("\n");
    }
    fw.close();
}
John Roberts
  • 5,885
  • 21
  • 70
  • 124
  • 1
    Increasing the fetch size on the Statement you got the ResultSet from may help to an extent - by default it goes back to the database to fetch more data every 10 rows. The trade-off is that if you increase the fetch size you also increase the memory footprint. – JonK Oct 20 '15 at 14:00
  • 1
    Is it really slow, there is a lot of JDBC code involved, you are doing a lot of String concats generating a lot of strings which need to be gc'd. Although the date format isn't thread safe I would only construct it once inside this method and reuse. And probably use a `StringBuilder` to create a string and write it in one go. – M. Deinum Oct 20 '15 at 14:02
  • Have you measured the time if you don't write any data, and move the SimpleDateFormat out of the loop? – Roger Lindsjö Oct 20 '15 at 14:03
  • You can also measure how much time it takes to write dummy data, without having to fetch it from a DB. – Andrew Henle Oct 20 '15 at 14:06

4 Answers4

3

This may depend on the JDBC driver, your fetch strategy, the disk...

But you can certainly avoid creating so much temporary objects:

  1. Move the DateFormat out of the loop.
  2. Avoid creating temporary strings rs.getString("FIELD1")+"," and instead do two write calls on the writer.
  3. Avoid the split operation.
wero
  • 32,544
  • 3
  • 59
  • 84
3

Something like this could be faster

public static void convertToCSV(final ResultSet rs) throws SQLException, IOException {
    final BufferedWriter fw = new BufferedWriter(new FileWriter(new File("alert.csv")));
    final DateFormat format = new SimpleDateFormat("'yyyyMMdd','HH:mm:ss'");
    format.setTimeZone(TimeZone.getTimeZone("Etc/UTC"));

    while (rs.next()) {
      StringBuilder sb = new StringBuilder();
      sb.append(rs.getString("FIELD1")).append(',')
        .append(rs.getString("FIELD2")).append(',')
        .append(rs.getString("FIELD3")).append(',');

        final String clobValue = rs.getString("FIELD4");
        if(clobValue==null)
          sb.append("null,");            
        else{
          sb.append('\"').append(clobValue).append('\"').append(',');
        }
        Date date = new Date(rs.getLong("FIELD5"));
        sb.append(format.format(date)).append('\n');
        fw.write(sb.toString());
    }
    fw.close();
}
  1. The creation of the DateFormat is only done once, outside the loop.
  2. Use a StringBuilder
  3. Avoid temp object creation as much as possible.

You could even reuse the StringBuilder, by moving the creation out of the loop and after writing it do a setLength(0).

M. Deinum
  • 115,695
  • 22
  • 220
  • 224
2

The following should be a bit faster.

fw.write(rs.getString("FIELD1"));
fw.write(',');

Also declare and prepare the SimpleDateFormat before the loop.

final DateFormat format = new SimpleDateFormat("yyyyMMdd','HH:mm:ss");

(Single quotes not needed here.) without split which is expensive.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
1

You can increase the fetch size on the Statement that produced your ResultSet - this will reduce the amount of times that it needs to go back to the database to fetch the next batch of rows (it is set to 10 by default). The downside of this is that it will increase the memory footprint of the ResultSet because it will be holding more data in memory.

Your SimpleDatFormat object is created with the same data on each iteration of the loop - if you move it outside the loop you're only instantiating it once. You can also move the setTimeZone(...) statement outside the loop too.

String concatenation using variable values internally creates a new StringBuilder object each time you do it - you can make further gains by ensuring that you're never doing concatenation within your loop. You can either do this by creating your own StringBuilder manually, putting the whole row's worth of data into it, and then writing the row to the BufferedWriter once (at the probable expense of more memory) or you can put in more write(...) calls to avoid doing the concatenation.

JonK
  • 2,097
  • 2
  • 25
  • 36