Firstly, I'm sorry about my English.

I am looking for an efficient way to read a big file in Java. I am writing a log analysis program, and my log files range from 500 MB to 4 GB. I have tried the `FileChannel` class (memory-mapped files), but I could not get good results. Take a look here: http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ314_029.htm

My goal is to read the data into a buffer and then apply a regular expression to it.

The file at `DumpFilePath` is about 4 GB.

public static List<String> anaysis_main(String pattern_string) throws IOException {

    List<String> result = new ArrayList<String>();
    Pattern pattern = Pattern.compile(pattern_string, Pattern.CASE_INSENSITIVE);


    File file = new File(DumpFilePath);

    RandomAccessFile raf = new RandomAccessFile(file,"rw");
    String line = null;
    raf.seek(0);


    int i = 0;

    while((line=raf.readLine())!=null)
    {
        Matcher matcher = pattern.matcher(line);
        while (matcher.find())
        {               
            result.add(matcher.group(1));
        }
    }
    raf.close();

    return result;
}

Any ideas?

kamaci
Hyunwoo Kim
  • `FileChannel` _is_ the way to go; however, mapping is limited in size to something like 2 GB, so you'll have to do "sliding windows" for reading. – fge Jun 15 '13 at 10:26
  • Not sure why this has been downvoted... Are people sneering because "4Gb is tiny compared to what *I* work on!"? Or because the English is a bit creaky (which s/he apologises for)? Grow up. The question is perfectly clear: "How can I apply a regex to a 4Gb file?" – j_random_hacker Jun 15 '13 at 10:31
  • Here is one suggestion: is the file divided into smaller units (e.g. lines) in such a way that the regexes you want to apply never span multiple units? If so then just read a unit at a time into memory, and apply your regexes to each. – j_random_hacker Jun 15 '13 at 10:33
  • Since you say your data is line oriented, why don't you use a plain `BufferedReader`? – fge Jun 15 '13 at 10:45
  • @j_random_hacker `RandomAccessFile raf = new RandomAccessFile(file,"rw"); String line = null; raf.seek(0); String data = ""; int i = 0; while((line=raf.readLine())!=null) { i++; System.out.println(i); //Matcher matcher = pattern.matcher(line); //while (matcher.find()) //{ // result.add(matcher.group(1)); //} } bar_load.setValue(100); raf.close(); frame_load.setVisible(false); return result;` This is my source code, but it still takes a long time. Is this what you meant? – Hyunwoo Kim Jun 15 '13 at 10:56
  • Paste your code in your question, please – fge Jun 15 '13 at 11:01
  • @fge Sorry! I have added the source code to the question. – Hyunwoo Kim Jun 15 '13 at 11:16
  • `result` is where the regular expression matches are stored. – Hyunwoo Kim Jun 15 '13 at 11:25
  • Why is it not effective (in terms of memory usage or CPU usage, for instance)? What problems are you having? – Patrick Jun 15 '13 at 12:08
  • @HyunwooKim: Yes, the code you included in your question is more or less what I intended. What is "a long time"? Bear in mind it takes a while to do anything with a 4Gb file! If it takes less than twice the time needed to copy the file, then I think it's doing pretty well. Otherwise, it might be that `result` is getting very big, meaning that a lot of time will be spent allocating memory. – j_random_hacker Jun 16 '13 at 12:39
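
The "sliding windows" idea from fge's first comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the class name `MappedLogScanner`, the `scan` method, and the `windowSize` parameter are hypothetical, lines are assumed to end in `\n` or `\r\n`, no line may be longer than one window, and bytes are decoded as ISO-8859-1. Each window is cut after its last newline so that no line is split across two mappings.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans a large file through successive read-only mappings.
final class MappedLogScanner {

    // Collects capture group 1 of every match; the pattern must define at least one group.
    static List<String> scan(Path logFile, String patternString, int windowSize) throws IOException {
        List<String> result = new ArrayList<>();
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        try (FileChannel channel = FileChannel.open(logFile, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            long position = 0;
            while (position < fileSize) {
                // A single mapping cannot exceed Integer.MAX_VALUE bytes, hence the windows.
                int length = (int) Math.min(windowSize, fileSize - position);
                MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                boolean lastWindow = position + length == fileSize;
                int end = length;
                if (!lastWindow) {
                    // Stop after the last complete line; the rest is re-read in the next window.
                    end = lastNewline(window, length) + 1;
                    if (end == 0) {
                        throw new IOException("Line longer than window at offset " + position);
                    }
                }
                window.limit(end);
                // Assumption: ISO-8859-1, so every byte maps to exactly one char.
                CharSequence text = StandardCharsets.ISO_8859_1.decode(window);
                matchLines(text, pattern, result);
                position += end;
            }
        }
        return result;
    }

    // Applies the pattern to each line separately by restricting the matcher's region.
    private static void matchLines(CharSequence text, Pattern pattern, List<String> result) {
        Matcher matcher = pattern.matcher(text);
        int textLength = text.length();
        int lineStart = 0;
        while (lineStart < textLength) {
            int lineEnd = lineStart;
            while (lineEnd < textLength && text.charAt(lineEnd) != '\n') {
                lineEnd++;
            }
            // Drop a trailing '\r' so that "\r\n" line endings are not part of the line.
            int contentEnd = (lineEnd > lineStart && text.charAt(lineEnd - 1) == '\r') ? lineEnd - 1 : lineEnd;
            matcher.region(lineStart, contentEnd);
            while (matcher.find()) {
                result.add(matcher.group(1));
            }
            lineStart = lineEnd + 1;
        }
    }

    // Returns the index of the last '\n' in the first `length` bytes, or -1 if there is none.
    private static int lastNewline(MappedByteBuffer buffer, int length) {
        for (int i = length - 1; i >= 0; i--) {
            if (buffer.get(i) == '\n') {
                return i;
            }
        }
        return -1;
    }
}
```

Decoding a window into characters roughly doubles its memory footprint, so a window of a few tens of megabytes is probably a more sensible starting point than one close to the 2 GB mapping limit.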

1 Answer

Can you use a `BufferedReader`? It is documented in the `java.io` package of the Java API.

The code would look something like this:

File file = new File(DumpFilePath);

// Open the file for reading; try-with-resources closes the reader automatically
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String thisLine;
    while ((thisLine = br.readLine()) != null) {

        // Your line-by-line parsing payload here
        // (pattern and result are the variables from the question's method)
        Matcher matcher = pattern.matcher(thisLine);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
    }
} catch (IOException e) {
    System.err.println("Error: " + e);
}
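
For reference, a self-contained variant of the loop above might look like the sketch below. The class name `LogScanner` and the `scan` method are hypothetical, and the ISO-8859-1 charset is an assumption; it is chosen here because it maps every byte to one character, which is close to what `RandomAccessFile.readLine()` does in the question's code. One `Matcher` is reused for all lines instead of creating a new one per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads a log file line by line through a buffered reader.
final class LogScanner {

    // Collects capture group 1 of every match; the pattern must define at least one group.
    static List<String> scan(Path logFile, String patternString) throws IOException {
        List<String> result = new ArrayList<>();
        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        // Assumption: ISO-8859-1 decoding, so unusual bytes in the log never abort the read.
        try (BufferedReader br = Files.newBufferedReader(logFile, StandardCharsets.ISO_8859_1)) {
            Matcher matcher = pattern.matcher("");
            String line;
            while ((line = br.readLine()) != null) {
                matcher.reset(line); // reuse the matcher for the next line
                while (matcher.find()) {
                    result.add(matcher.group(1));
                }
            }
        }
        return result;
    }

    // Usage: java LogScanner <file> <regex-with-one-group>
    public static void main(String[] args) throws IOException {
        List<String> hits = scan(Paths.get(args[0]), args[1]);
        System.out.println(hits.size() + " matches");
    }
}
```

The main difference from the question's code is that `RandomAccessFile.readLine()` reads without buffering, while `BufferedReader` fetches the file in larger chunks, which is likely where most of the time goes on a 4 GB file. If `result` itself grows very large, writing matches out as they are found instead of collecting them in a list may also help.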
Ro Yo Mi