4

Thanks everyone ^_^, the problem is solved: a single line was too big (over 400 MB... I had downloaded a damaged file without realizing it), so it threw an OutOfMemoryError.
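For anyone who hits the same thing, here is a minimal diagnostic sketch (the class name and the threshold are my own choices, not part of my original code) that finds oversized lines by reading character by character, so no whole line is ever held in memory:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Reads character by character and reports suspiciously long "lines",
// so a damaged 400 MB line can never blow up the heap.
public class LineLengthCheck {
    public static void main(String[] args) throws IOException {
        String path = "/home/work/bingo/level.txt";
        long threshold = 1000; // anything longer than this is suspicious
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            long lineNo = 1, length = 0;
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    if (length > threshold)
                        System.out.println("line " + lineNo + " has " + length + " chars");
                    lineNo++;
                    length = 0;
                } else {
                    length++;
                }
            }
            if (length > threshold) // last line may not end with '\n'
                System.out.println("line " + lineNo + " has " + length + " chars");
        }
    }
}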

I want to split a file using Java, but it always throws OutOfMemoryError: Java heap space. I searched the whole Internet, but nothing seems to help :(

PS: the file's size is 600 MB, and it has over 30,000,000 lines; every line is no longer than 100 chars. (You can generate a "level file" like this: { id:0000000001,level:1 id:0000000002,level:2 ... (over 30 million) } — see the sketch below.)
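For example, a quick generator sketch (the file path and the level values are just placeholders) to produce such a test file:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Generates a "level file" of the shape described above: ~30 million short lines.
public class GenLevelFile {
    public static void main(String[] args) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(
                Paths.get("/home/work/bingo/level.txt"), StandardCharsets.UTF_8)) {
            for (int i = 1; i <= 30000000; i++) {
                w.write(String.format("id:%010d,level:%d%n", i, i));
            }
        }
    }
}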

PPS: setting the JVM memory size larger does not work :(

PPPS: I changed to another PC, the problem remains /(ㄒoㄒ)/~~

No matter how large I set -Xms or -Xmx, the output file's size is always the same (and Runtime.getRuntime().totalMemory() does change).
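A quick way to check whether the flags actually take effect (just a sketch that prints both numbers):

// totalMemory() is the heap currently committed to the JVM;
// maxMemory() reflects the -Xmx limit. Comparing both shows
// whether the flag was really picked up.
public class HeapCheck {
    public static void main(String[] args) {
        System.out.println("total = " + Runtime.getRuntime().totalMemory());
        System.out.println("max   = " + Runtime.getRuntime().maxMemory());
    }
}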

Here's the stack trace:

 Heap Size = 2058027008
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuffer.append(StringBuffer.java:306)
        at java.io.BufferedReader.readLine(BufferedReader.java:345)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at com.xiaomi.vip.tools.ptupdate.updator.Spilt.main(Spilt.java:39)
    ...

Here's my code:

package com.updator;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;

public class Spilt {
    public static void main(String[] args) throws Exception {
        long heapSize = Runtime.getRuntime().totalMemory();

        // Print the jvm heap size.
        System.out.println("Heap Size = " + heapSize);

        String mainPath = "/home/work/bingo/";
        File mainFilePath = new File(mainPath);
        FileInputStream inputStream = null;
        FileOutputStream outputStream = null;
        try {
            if (!mainFilePath.exists())
                mainFilePath.mkdir();

            String sourcePath = "/home/work/bingo/level.txt";
            inputStream = new FileInputStream(sourcePath);
            BufferedReader bufferedReader = new BufferedReader(new FileReader(
                    new File(sourcePath)));

            String savePath = mainPath + "tmp/";
            Integer i = 0;
            File file = new File(savePath + "part"
                    + String.format("%0" + 5 + "d", i) + ".txt");
            if (!file.getParentFile().exists())
                file.getParentFile().mkdir();
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            int count = 0, total = 0;
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
                line += '\n';
                outputStream.write(line.getBytes("UTF-8"));
                count++;
                total++;
                if (count > 4000000) {
                    outputStream.flush();
                    outputStream.close();
                    System.gc();
                    count = 0;
                    i++;
                    file = new File(savePath + "part"
                            + String.format("%0" + 5 + "d", i) + ".txt");
                    file.createNewFile();
                    outputStream = new FileOutputStream(file);
                }
            }

            outputStream.close();
            file = new File(mainFilePath + "_SUCCESS");
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            outputStream.write(i.toString().getBytes("UTF-8"));
        } finally {
            if (inputStream != null)
                inputStream.close();
            if (outputStream != null)
                outputStream.close();
        }
    }
}

I think maybe when outputStream.close() is called, the memory is not released?

ACBingo
  • Show the exception and stack trace – AhmadWabbi Jan 04 '17 at 08:04
  • Check out the following link, where this has been answered by others - http://stackoverflow.com/questions/11578123/how-to-increase-java-heap-memory-permanently – FlameHaze Jan 04 '17 at 08:15
  • Why do you use a `Scanner`? You don't need the functionality, a `BufferedReader` would be enough and much less resource-hungry. – piet.t Jan 04 '17 at 08:25
  • @FlameHaze, I tried, but no matter how large I set -Xms or -Xmx, the output file's size is always the same (the Runtime.getRuntime().totalMemory() does change) – ACBingo Jan 04 '17 at 08:28
  • Possible duplicate of [Out of memory using Scanner to read large file into memory](http://stackoverflow.com/questions/18689264/out-of-memory-using-scanner-to-read-large-file-into-memory) – AxelH Jan 04 '17 at 08:30
  • @piet.t I don't think the problem is Scanner (I have changed to BufferedReader, the problem remains :( ) – ACBingo Jan 04 '17 at 08:43
  • @ACBingo While it might not be the root-cause of the problem it is still not optimal - switching to a `BufferedReader` reduced the execution-time by half in my test. As for the problem: how long are the lines in your input-file? – piet.t Jan 04 '17 at 08:57
  • ...and your stacktrace does not seem to match the code you posted - it complains about `StringBuilder.append` but I don't see this in the code you posted, so either the stacktrace is incomplete or your code is! – piet.t Jan 04 '17 at 09:01
  • I think the problem is not in the size of the file, but rather in the size of "a line" in the file. There is at least one very big line in the file that the VM could not handle in the `line.getBytes` call – AhmadWabbi Jan 04 '17 at 09:07
  • Actually, the lines are no longer than 100 chars. I think when outputStream.close() is called, the outputStream's memory is not cleared... – ACBingo Jan 04 '17 at 09:27
  • @ACBingo Okay then.. "outputStream = null;" :-) – Smith Lee Jan 04 '17 at 09:46
  • @AhmadWabbi, thank you very very much, I think I know where the problem is, maybe you are right... wait a moment, then I will post the result – ACBingo Jan 04 '17 at 10:17
  • Well the stack is pretty clear: bufferedReader.readLine throws the OutOfMemoryError. The most straightforward cause to look for is: there is a single line that does not fit into memory. (And you could System.out.println a line count to see which one.) – GPI Jan 04 '17 at 10:20
  • Stop reading by lines. Use a char[]. Since you are using the buffered reader it should be fine. Also, you do not need to call `System.gc`. – matt Jan 04 '17 at 11:09
  • @ACBingo Glad I was of help. I should have added an answer :) – AhmadWabbi Jan 04 '17 at 15:29

2 Answers

3

So you open the original file and create a BufferedReader and a counter for the lines.

char[] buffer = new char[5120];
BufferedReader reader = Files.newBufferedReader(Paths.get(sourcePath), StandardCharsets.UTF_8);
int lineCount = 0;

Now you read into your buffer, and write the characters as they come in.

int read;

BufferedWriter writer = Files.newBufferedWriter(Paths.get(fileName), StandardCharsets.UTF_8);
while((read = reader.read(buffer, 0, 5120))>0){
    int offset = 0;
    for(int i = 0; i<read; i++){
        char c = buffer[i];
        if(c=='\n'){
           lineCount++;
           if(lineCount==maxLineCount){
              //write the range from offset up to and including this newline to the old writer
              writer.write(buffer, offset, i-offset+1);
              writer.close();
              offset=i+1;
              lineCount=0;
              writer = Files.newBufferedWriter(Paths.get(newName), StandardCharsets.UTF_8);
           }
        }
    }
    //write whatever is left of this buffer to the current writer
    writer.write(buffer, offset, read-offset);
}
writer.close();

That should keep the memory usage lower and prevent you from reading too large a line at once. You could go without BufferedWriters and control the memory even more, but I don't think that is necessary.
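For reference, a self-contained sketch of the same approach (the paths, the part-file naming, and the newPartWriter helper are assumptions filled in from the question, not part of the snippet above):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Splits a file into parts of maxLineCount lines each, reading fixed-size
// char buffers so no single line ever has to fit in memory at once.
public class BufferSplit {
    public static void main(String[] args) throws IOException {
        String savePath = "/home/work/bingo/tmp/";
        int maxLineCount = 4000000;
        char[] buffer = new char[5120];
        int part = 0, lineCount = 0, read;
        Files.createDirectories(Paths.get(savePath));
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("/home/work/bingo/level.txt"), StandardCharsets.UTF_8)) {
            BufferedWriter writer = newPartWriter(savePath, part);
            while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
                int offset = 0;
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == '\n' && ++lineCount == maxLineCount) {
                        writer.write(buffer, offset, i - offset + 1); // include the newline
                        writer.close();
                        offset = i + 1;
                        lineCount = 0;
                        writer = newPartWriter(savePath, ++part);
                    }
                }
                writer.write(buffer, offset, read - offset); // rest of this buffer
            }
            writer.close();
        }
    }

    private static BufferedWriter newPartWriter(String savePath, int part) throws IOException {
        return Files.newBufferedWriter(
                Paths.get(savePath + "part" + String.format("%05d", part) + ".txt"),
                StandardCharsets.UTF_8);
    }
}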

matt
  • Why 5120? And why read 5120 at once...? I mean, if one line is only 100 chars long, shouldn't it be worse? – ACBingo Jan 06 '17 at 06:18
  • 5120 is a buffer size, and I just picked it arbitrarily. Since a buffered reader is being used, it doesn't matter, it would even work fine to just read one character at a time. Why do you think it will perform worse for a line that is 100 long? – matt Jan 06 '17 at 06:59
1

I've tested with a large text file (250 MB).

It works well.

You need to add try/catch exception handling for the file streams.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Scanner;

public class MyTest {
    public static void main(String[] args) {
        String mainPath = "/home/work/bingo/";
        File mainFilePath = new File(mainPath);
        FileInputStream inputStream = null;
        FileOutputStream outputStream = null;
        try {
            if (!mainFilePath.exists())
                mainFilePath.mkdir();

            String sourcePath = "/home/work/bingo/level.txt";
            inputStream = new FileInputStream(sourcePath);
            Scanner scanner = new Scanner(inputStream, "UTF-8");

            String savePath = mainPath + "tmp/";
            Integer i = 0;
            File file = new File(savePath + "part" + String.format("%0" + 5 + "d", i) + ".txt");
            if (!file.getParentFile().exists())
                file.getParentFile().mkdir();
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            int count = 0, total = 0;

            while (scanner.hasNextLine()) {
                String line = scanner.nextLine() + "\n";
                outputStream.write(line.getBytes("UTF-8"));
                count++;
                total++;
                if (count > 4000000) {
                    outputStream.flush();
                    outputStream.close();
                    count = 0;
                    i++;
                    file = new File(savePath + "part" + String.format("%0" + 5 + "d", i) + ".txt");
                    file.createNewFile();
                    outputStream = new FileOutputStream(file);
                }
            }

            outputStream.close();
            file = new File(mainFilePath + "_SUCCESS");
            file.createNewFile();
            outputStream = new FileOutputStream(file);
            outputStream.write(i.toString().getBytes("UTF-8"));
        } catch (FileNotFoundException e) {
            System.out.println("ERROR: FileNotFoundException :: " + e.getStackTrace());
        } catch (IOException e) {
            System.out.println("ERROR: IOException :: " + e.getStackTrace());
        } finally {
            if (inputStream != null)
                try {
                    inputStream.close();
                    if (outputStream != null)
                        outputStream.close();

                } catch (IOException e) {
                    e.printStackTrace();
                }
        }
    }
}

If the problem still occurs, change the Java heap memory size with the following command at the shell prompt.

e.g. -Xmx1g: 1 GB heap memory size, MyTest: class name

java -Xmx1g MyTest

Smith Lee
  • I've tried again with a 2 GB text file, but there are no issues. My system environment: Intel i5 / Java 1.7 / 6 GB memory. – Smith Lee Jan 04 '17 at 08:49
  • If your system has a very small amount of memory, decrease the line count number, for example from 4000000 to 400. – Smith Lee Jan 04 '17 at 08:52
  • I tried, the problem remains :( PS: my environment: i7 / Java 1.6 / 16 GB memory, and the output files' total size is the same, too – ACBingo Jan 04 '17 at 08:58
  • Thank you very much... but changing the heap memory does not work... /(ㄒoㄒ)/~~ – ACBingo Jan 04 '17 at 09:29