I had this piece of code:
    while ((line = br.readLine()) != null)
    {
        String Words[] = line.split(" ");
        outputLine = SomeAlgorithm(Words);
        output.write(outputLine);
    }
As you can see, for every line in the input file I read the line, run some algorithm on it that modifies it, and then write the resulting line to an output file.
There are 9k lines in the file, and the entire program took 3 minutes on my machine.
I thought: I'm doing two I/O operations for every run of the algorithm (one read and one write per line), so roughly 18k I/Os in total. Why not collect all the lines into an ArrayList first, then loop through the list and run the algorithm on each line? I could also accumulate every output line in a single String variable and write it all out once at the end of the program.
That way I'd have just 2 big file I/Os for the entire program instead of 18k small ones. I thought this would be faster, so I wrote this:
    List<String> lines = new ArrayList<String>();
    while ((line = br.readLine()) != null)
    {
        lines.add(line); // collect all lines first
    }
    for (String line : lines) {
        String Words[] = line.split(" ");
        bigOutput += SomeAlgorithm(Words); // collect all output
    }
    output.write(bigOutput);
But this version took 7 minutes!
So why is looping through an ArrayList slower than reading a file line by line?
Note: Collecting all the lines with readLine() and writing bigOutput each take only a few seconds, and SomeAlgorithm() is unchanged. So I think the culprit must be the for (String line : lines) loop.
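Update: As mentioned in the comments below, the problem was not the ArrayList traversal but the way the output was accumulated with +=. Because Java strings are immutable, each += copies everything accumulated so far into a new String, so the total work grows quadratically with the output size. Switching to StringBuilder gave a result faster than the original version.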
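For anyone landing here later, this is a minimal sketch of the StringBuilder version described in the update. The real SomeAlgorithm was never posted, so `someAlgorithm` below is a hypothetical stand-in that just reverses the word order of a line, and the class name `OutputAccumulator` and method `processAll` are made up for illustration too.

```java
import java.util.Arrays;
import java.util.List;

class OutputAccumulator {

    // Hypothetical stand-in for SomeAlgorithm (the real one is not shown in the question):
    // reverses the order of the words and terminates the line with '\n'.
    static String someAlgorithm(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (int i = words.length - 1; i >= 0; i--) {
            sb.append(words[i]);
            if (i > 0) {
                sb.append(' ');
            }
        }
        return sb.append('\n').toString();
    }

    // Accumulates all output in one StringBuilder instead of using String +=.
    // append() writes into a growable buffer, so earlier output is not
    // re-copied on every iteration the way it is with +=.
    static String processAll(List<String> lines) {
        StringBuilder bigOutput = new StringBuilder();
        for (String line : lines) {
            String[] words = line.split(" ");
            bigOutput.append(someAlgorithm(words));
        }
        return bigOutput.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b c", "d e");
        System.out.print(processAll(lines));
    }
}
```

The result of `processAll` can be handed to a single `output.write(...)` call exactly as in the second snippet of the question. Only the accumulation step changes.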