3

I am working on a big project with more than 1 million lines of data. The data is divided into files of 20,000 lines each. Each file is read line by line, and some variable x is concatenated to each line. I store the concatenated strings in an ArrayList, and then the ArrayList is saved to an output file line by line.

This takes 3-4 minutes per file. Is there any way to write the entire ArrayList to a file in one go so that it doesn't take that long? Or is there a faster way to do this?

Here is some sample code:

    // Output ArrayList containing the concatenated data
    List<String> outputData = new ArrayList<String>();

    // The data is written to the file (writeLines from Commons IO FileUtils)
    FileUtils.writeLines(outputFile, outputData);

What would be the fastest way to achieve this task?

Sid
Ahmar Ali
  • I would loop through the whole list and append each element to the file – Rugal Jan 10 '14 at 09:46
  • How are you writing to the file *now*? – Moritz Petersen Jan 10 '14 at 09:47
  • What does your writeLines method look like? – Jiri Kusa Jan 10 '14 at 09:48
  • @MoritzPetersen writeLines does the writing part. – Ahmar Ali Jan 10 '14 at 09:48
  • @JiriKusa It is a built-in function of Commons IO, which I think uses a loop – Ahmar Ali Jan 10 '14 at 09:49
  • You are missing the details of how you actually write your file. Whether or not you use a BufferedOutputStream or BufferedWriter can make a huge difference. – Gyro Gearless Jan 10 '14 at 09:50
  • @GyroGearless Output is done through the writeLines function of Commons IO – Ahmar Ali Jan 10 '14 at 09:51
  • Are you sure that writing to disk is the main contributor to the bad performance? How large are the files you are generating (in MB)? – Moritz Petersen Jan 10 '14 at 09:57
  • I'm not really sure what you're doing. If you're not doing any manipulation, then writing between streams is the way to go, as @GyroGearless suggests. If what you mean by "is assigned to some variable x" is that you're basically reading the entire 20,000 lines into a single String and adding that to an array, there are a number of ways you could probably do this more elegantly (avoiding String interning, for example, by using CharBuffer or ByteBuffer instead). – Mikkel Løkke Jan 10 '14 at 10:18

3 Answers

5

Once you have the ArrayList ready, you can use the writeLines method from FileUtils (Apache Commons IO) to write the entire ArrayList in one go.

Have a look at the documentation here and the various writeLines methods that are available.
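
For readers without Commons IO on the classpath, the same "write the whole list in one call" idea can be sketched with the JDK alone. This is only an illustration: it swaps in `java.nio.file.Files.write` in place of `FileUtils.writeLines`, and the class name `WriteListAtOnce`, the helper `writeAll`, and the sample data are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class WriteListAtOnce {

    // Hypothetical helper: writes every element of 'lines' to 'target',
    // one element per line, in a single library call.
    public static void writeAll(Path target, List<String> lines) throws IOException {
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for the concatenated lines from the question.
        List<String> outputData = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            outputData.add("line " + i + " x");
        }

        Path target = Files.createTempFile("output", ".txt");
        writeAll(target, outputData);

        int written = Files.readAllLines(target, StandardCharsets.UTF_8).size();
        System.out.println("Wrote " + written + " lines to " + target);
    }
}
```

Either way the library still loops over the list internally; the gain comes from opening the file once and writing through a buffer.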

JHS
0

A proper solution could be to skip the ArrayList and write each line directly to the file. Keep in mind, though, that disk I/O is far slower than RAM.

Testing like this:

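    import java.io.PrintWriter;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Date;

    // Build one million sample lines in memory first.
    Collection<String> list = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
        // just fill something in:
        list.add("A " + i + " " + new Date() + "!");
    }
    // Time only the write phase.
    long start = System.nanoTime();
    // PrintWriter(String) buffers its output internally.
    PrintWriter out = new PrintWriter("example.out");
    for (String line : list) {
        out.println(line);
    }
    out.close();
    long end = System.nanoTime();
    System.out.println((end - start) / 1000000000D + " sec");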

Prints on my old Dell laptop:

    0.508509454 sec
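
The "skip the ArrayList" idea from the start of this answer might look roughly like the sketch below: read each input line, append the extra value, and write it straight to a buffered output. The class name `StreamAppend`, the method `appendToEachLine`, and the `suffix` parameter (standing in for the question's variable x) are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class StreamAppend {

    // Copies 'in' to 'out' line by line, appending 'suffix' to each line.
    // No intermediate list is built; returns the number of lines written.
    public static long appendToEachLine(Path in, Path out, String suffix) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write(suffix);
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Small sample input so the sketch runs on its own.
        Path in = Files.createTempFile("input", ".txt");
        Files.write(in, Arrays.asList("first", "second", "third"), StandardCharsets.UTF_8);

        Path out = Files.createTempFile("output", ".txt");
        long written = appendToEachLine(in, out, ";x");
        System.out.println("Wrote " + written + " lines to " + out);
    }
}
```

Because both streams are buffered, memory use stays flat regardless of file size, and the file is opened and closed only once.
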
Moritz Petersen
0

At first I was using writeStringToFile to write each line to the file individually, which took ages. Saving all lines in an ArrayList first and then writing the whole list with the writeLines function solved the problem. Now it only takes a second.
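
The difference described above most likely comes from opening and closing the file once per line versus once per file. A rough sketch of the two patterns with JDK calls only (not the Commons IO methods themselves; the class and method names here are hypothetical):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReopenVersusBatch {

    // Slow pattern: the file is opened, appended to, and closed once per line.
    public static void writePerLine(Path target, List<String> lines) throws IOException {
        Files.deleteIfExists(target);
        for (String line : lines) {
            Files.write(target, Collections.singletonList(line), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    // Faster pattern: the file is opened once and all lines go through one buffered writer.
    public static void writeBatch(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            lines.add("line " + i);
        }
        Path a = Files.createTempFile("perline", ".txt");
        Path b = Files.createTempFile("batch", ".txt");

        long t0 = System.nanoTime();
        writePerLine(a, lines);
        long t1 = System.nanoTime();
        writeBatch(b, lines);
        long t2 = System.nanoTime();

        System.out.println("per line: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("batch:    " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods produce the same file contents; only the number of open/close operations differs, and actual timings will depend on the machine and file system.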

Thanks everyone for helping

Ahmar

Ahmar Ali