2

I've found a bottleneck in my app that keeps growing as data in my files grow (see attached screenshot of VisualVM below).

Below is the getFileContentsAsList code. How can this be made better performance-wise? I've read several posts on efficient File I/O and some have suggested Scanner as a way to efficiently read from a file. I've also tried Apache Commons readFileToString but that's not running fast as well.

The data file that's causing the app to run slower is 8 KB...that doesn't seem too big to me.

I could convert to an embedded database like Apache Derby if that seems like a better route. Ultimately looking for what will help the application run faster (It's a Java 1.7 Swing app BTW).

Here's the code for getFileContentsAsList:

public static List<String> getFileContentsAsList(String filePath) throws IOException {
    if (ReceiptPrinterStringUtils.isNullOrEmpty(filePath)) throw new IllegalArgumentException("File path must not be null or empty");

    Scanner s = null;
    List<String> records = new ArrayList<String>();

    try {
        s = new Scanner(new BufferedReader(new FileReader(filePath)));
        s.useDelimiter(FileDelimiters.RECORD);

        while (s.hasNext()) {
           records.add(s.next());
        }
    } finally {
        if (s != null) {
            s.close();
        }
    }

    return records;
}

Application CPU Hot Spots

Zack Macomber
  • 6,682
  • 14
  • 57
  • 104
  • What is the intent of your app. You may simply discard stop word like in data mining if you don't require them. – Aniket Thakur Sep 06 '13 at 12:45
  • The application is called ReceiptPrinter. It gives users the ability to add addresses, designations and receipts and then print off those receipts on regular 8 1/2 x 11 paper. It's mainly for acknowledging somebody sent a gift for a charity...it's not like a receipt you would get at Walmart – Zack Macomber Sep 06 '13 at 12:48
  • What does "discard stop word" mean? – Zack Macomber Sep 06 '13 at 12:49
  • You may try removing the BufferedReader as it adds an unnecessary copy operation. You may give the ArrayList a higher initial capacity. Further you should profile again with the filtering of JDK classes disabled to see which method called by your method spends most of the time. – Holger Sep 06 '13 at 12:54

3 Answers3

1

The size of an ArrayList is multiplied by 1.5 when necessary. This is O(log(N)). (Doubling was used in Vector.) I would certainly use an O(1) LinkedList here, and BufferedReader.readLine() rather than a Scanner if I was trying to speed it up. It's hard to believe that the time to read one 8k file is seriously a concern. You can read millions of lines in a second.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • What do you think about the screenshot from VisualVM? It looks like to me that it's showing the file I/O as the bottleneck. I may be looking in the wrong place, though... – Zack Macomber Sep 06 '13 at 16:22
  • It does, but you're invoking it 18,425 times. Why? You said you only had to read an 8k file. It looks more like you have to read 18,425 8k files. Sounds like you need a database to me. And the total elapsed time is about three seconds. Is this really a major problem? – user207421 Sep 06 '13 at 22:34
  • That's where I'm confused with what VisualVM is relaying...can't see how this method is getting invoked 18,425 times...either way, I'm thinking of putting the contents into a private static Map variable in the class that has `getFileContents` to reduce the need to read from the disk. I'm trying to avoid creating a database as my code just needs to rely on the Java SE runtime right now. – Zack Macomber Sep 06 '13 at 22:43
  • I suggest you log the stack trace when you enter the method. You are calling it 18,425 times. This is probably the only problem you have. – user207421 Sep 07 '13 at 02:55
  • would you mind checking out my answer below? It's 11:15pm here in the Eastern U.S. an I've got to get some rest. Let me know what you think...I probably should have designed this app with a DB backend at the outset...I may do some refactoring in the future to plug one in. – Zack Macomber Sep 07 '13 at 03:17
1

So, file.io gets to be REAL expensive if you do it a lot...as seen in my screen shot and original code, getFileContentsAsList, which contains file.io calls, gets invoked quite a bit (18.425 times). VisualVM is a real gem of a tool to point out bottlenecks like these!

After contemplating over various ways to improve performance, it dawned on me that possibly the best way is to do file.io calls as little as possible. So, I decided to use private static variables to hold the file contents and to only do file.io in the static initializer and when a file is written to. As my application is (fortunately) not doing excessive writing (but excessive reading), this makes for a much better performing application.

Here's the source for the entire class that contains the getFileContentsAsList method. I took a snapshot of that method and it now runs in 57.2 ms (down from 3116 ms). Also, it was my longest running method and is now my 4th longest running method. The top 5 longest running methods run for a total of 498.8 ms now as opposed to the ones in the original screenshot that ran for a total of 3812.9 ms. That's a percentage decrease of about 85% [100 * (498.8 - 3812.9) / 3812.9].

package com.mbc.receiptprinter.util;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;

import org.apache.commons.io.FileUtils;

import com.mbc.receiptprinter.constant.FileDelimiters;
import com.mbc.receiptprinter.constant.FilePaths;

/*
 * Various File utility functions.  This class uses the Apache Commons FileUtils class.
 */
public class ReceiptPrinterFileUtils {

    private static Map<String, String> fileContents = new HashMap<String, String>();

    private static Map<String, Boolean> fileHasBeenUpdated = new HashMap<String, Boolean>();

    static {
        for (FilePaths fp : FilePaths.values()) {
            File f = new File(fp.getPath());
            try {
                FileUtils.touch(f);
                fileHasBeenUpdated.put(fp.getPath(), false);
                fileContents.put(fp.getPath(), FileUtils.readFileToString(f));
            } catch (IOException e) {
                ReceiptPrinterLogger.logMessage(ReceiptPrinterFileUtils.class, 
                                                Level.SEVERE, 
                                                "IOException while performing FileUtils.touch in static block of ReceiptPrinterFileUtils", e);
            }
        }
    }

    public static String getFileContents(String filePath) throws IOException {
        if (ReceiptPrinterStringUtils.isNullOrEmpty(filePath)) throw new IllegalArgumentException("File path must not be null or empty");
        File f = new File(filePath);
        if (fileHasBeenUpdated.get(filePath)) {
            fileContents.put(filePath, FileUtils.readFileToString(f));
            fileHasBeenUpdated.put(filePath, false);
        }
        return fileContents.get(filePath);
    }

    public static List<String> convertFileContentsToList(String fileContents) {
        List<String> records = new ArrayList<String>();
        if (fileContents.contains(FileDelimiters.RECORD)) {
            records = Arrays.asList(fileContents.split(FileDelimiters.RECORD));
        }
        return records;
    }

    public static void writeStringToFile(String filePath, String data) throws IOException {
        fileHasBeenUpdated.put(filePath, true);
        FileUtils.writeStringToFile(new File(filePath), data);
    }

    public static void writeStringToFile(String filePath, String data, boolean append) throws IOException {
        fileHasBeenUpdated.put(filePath, true);
        FileUtils.writeStringToFile(new File(filePath), data, append);
    }
}
Zack Macomber
  • 6,682
  • 14
  • 57
  • 104
0

ArrayLists have a good performance at reading and also on writing IF the lenth does not change very often. In your application the length changes very often (size is doubled, when it is full and an element is added) and your application needs to copy your array into an new, longer array.

You could use a LinkedList, where new elements are appended and no copy actions are needed. List<String> records = new LinkedList<String>();

Or you could initialize the ArrayList with the approximated finished Number of Words. This will reduce the number of copy actions. List<String> records = new ArrayList<String>(2000);

Simulant
  • 19,190
  • 8
  • 63
  • 98
  • "All the time"? With the default implementation the capacity is doubled when the actual number of elements reaches it. – Hauke Ingmar Schmidt Sep 06 '13 at 13:24
  • The growth is _expontentially_. It happens _not_ very often (starting at default size 10 it would need 11 doublings to store every single byte of the file as entry) and it is performed by a fast, low level native memory copy. Unless proven by measurement I doubt this is a bottleneck here. – Hauke Ingmar Schmidt Sep 06 '13 at 13:56