I want to find out which of two methods I have come up with for concatenating my text files in Java is better. If someone has insight into what goes on at the kernel level that explains the difference between these two ways of writing to a FileChannel, I would greatly appreciate it.

From what I understand from the documentation and other Stack Overflow conversations, allocateDirect allocates space right on the drive and mostly avoids using RAM. I am concerned that the ByteBuffer created with allocateDirect might overflow, or fail to be allocated, if the File infile is large, say 1 GB. At this point in the development of our software I am guaranteed that the File will be no larger than 2 GB, but in the future it might be as big as 10 or 20 GB.

I have observed that the transferFrom loop never iterates more than once, so it seems to succeed in writing the entire infile at once; but I haven't tested it with files bigger than 60 MB. I looped anyway, because the documentation specifies that there is no guarantee of how much will be written at once. With transferFrom only able to accept, on my system, an int32 as its count parameter, I won't be able to ask for more than 2 GB to be transferred at a time... Again, kernel expertise would help me understand.

Thanks in advance for your help!!

Using a ByteBuffer:

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        ByteBuffer buff = ByteBuffer.allocateDirect((int)(infile.length() + sb.length()));
        //write the stringBuffer so it goes in the output file first:
        buff.put(sb.toString().getBytes());

        //create the FileChannels:
        inChan  = new RandomAccessFile(infile,  "r" ).getChannel();
        outChan = new RandomAccessFile(outfile, "rw").getChannel();

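        //read the infile in to the buffer; a single read() may not fill it:
        while (buff.hasRemaining() && inChan.read(buff) != -1) { }

        // prep the buffer:
        buff.flip();

        // write the buffer out to the file via the FileChannel; write() may be partial too:
        while (buff.hasRemaining()) {
            outChan.write(buff);
        }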
        inChan.close();
        outChan.close();
     } catch...etc

}

Using transferTo (or transferFrom):

boolean concatFiles(StringBuffer sb, File infile, File outfile) {

    FileChannel inChan = null, outChan = null;

    try {

        //write the stringBuffer so it goes in the output file first:    
        PrintWriter  fw = new PrintWriter(outfile);
        fw.write(sb.toString());
        fw.flush();
        fw.close();

        // create the channels appropriate for appending:
        outChan = new FileOutputStream(outfile, true).getChannel();
        inChan  = new RandomAccessFile(infile, "r").getChannel();

        long startSize = outfile.length();
        long inFileSize = infile.length();
        long bytesWritten = 0;

        //set the position where we should start appending the data:
        outChan.position(startSize);
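        long startByte = outChan.position();

        while (bytesWritten < inFileSize) {
            // request at most 2GB per call so the int cast cannot overflow:
            int count = (int) Math.min(inFileSize - bytesWritten, Integer.MAX_VALUE);
            bytesWritten += outChan.transferFrom(inChan, startByte, count);
            // transferFrom does not move outChan's position, so advance it by hand:
            startByte = startSize + bytesWritten;
        }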

        inChan.close();
        outChan.close();
    } catch ... etc

}

Tomislav Nakic-Alfirevic
GLaDOS
  • Note: I just changed it to use startByte instead of outChan.position() within the while loop. My previous understanding was that every time outChan.transferFrom ran, outChan.position() would return the position of the latest write. It turned out that, when we used this with large files and it looped more than once, the outChan.position() passed to transferFrom() was returning the original position, not the updated one. So we now use a separate variable to hold the next position. – GLaDOS Oct 18 '11 at 18:03

1 Answer

transferTo() can be far more efficient, as there is less data copying, or none if it can all be done in the kernel. And if it can't all be done in the kernel on your platform, it will still use highly tuned code.

You do need the loop: one day it will iterate more than once, and your code will keep working.
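
For illustration only, here is a minimal sketch of such a loop using transferTo. It is not code from the question: the class name ConcatSketch and the method concat are made up for this example, it assumes Java 7 or later for FileChannel.open, and it assumes the header should be encoded as UTF-8. Both position and count are long values, so the loop is not tied to a 2 GB limit in its arguments.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, named only for this sketch.
final class ConcatSketch {

    /** Writes header, then all of infile, to outfile (replacing any old content). */
    static void concat(String header, Path infile, Path outfile) throws IOException {
        try (FileChannel in = FileChannel.open(infile, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(outfile,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {

            // Write the header first; write() may be partial, so loop until drained.
            // UTF-8 is an assumption of this sketch.
            ByteBuffer head = ByteBuffer.wrap(header.getBytes(StandardCharsets.UTF_8));
            while (head.hasRemaining()) {
                out.write(head);
            }

            // transferTo writes at out's current position and advances it,
            // but it does not move in's position, so track the offset by hand.
            long size = in.size();
            long done = 0;
            while (done < size) {
                long n = in.transferTo(done, size - done, out);
                if (n <= 0) {
                    break; // nothing transferred (e.g. the source shrank); stop rather than spin
                }
                done += n;
            }
        }
    }
}
```

A call such as ConcatSketch.concat(sb.toString(), infile.toPath(), outfile.toPath()) would stand in for either version in the question. Whether the loop body runs once or many times depends on the platform and the file size, which is why the loop is there.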

user207421
  • I see... although allocateDirect allocates space directly on the disk, which is better for JVM memory management, the data is still copied from one place to the other. On the other hand, transferTo/From may actually avoid copying those bytes altogether, letting the kernel decide whether the data needs to be actively copied from source to destination, or whether it can just change its information about the location of the bytes on the disk. Is that correct? – GLaDOS Jun 06 '11 at 17:08
  • @GLaDOS allocateDirect() does not 'allocate space directly on the disk'. You are confusing direct buffers with mapped buffers. – user207421 Oct 18 '11 at 18:08
  • Thanks. So, from what I read after this comment, a mapped buffer is a type of direct buffer representing a memory-mapped region of a file. Looks like I still have a lot to learn. – GLaDOS Oct 18 '11 at 18:45
  • By the way, of course you were right about the "one day it will iterate" and it revealed a misunderstanding on my part about FileChannel position(). It seems that when I read documentation the first time round it doesn't sink in: "This method does not modify this channel's position." – GLaDOS Oct 18 '11 at 18:48
  • ... that is, transferFrom does not modify the position... :-| – GLaDOS Oct 18 '11 at 18:54
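
The two points settled in the comments above can be checked with a small sketch. The class BufferKindsSketch and its method names are hypothetical, chosen only for this example, and it assumes Java 7 or later. It shows that transferFrom leaves the target channel's position where it was, that allocateDirect returns a buffer with no file behind it, and that FileChannel.map returns a buffer backed by a region of the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, named only for this sketch.
final class BufferKindsSketch {

    /** Copies src into dst with one transferFrom call and returns dst's position afterwards. */
    static long positionAfterTransferFrom(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Bytes are written at the given position (0), not at out.position().
            out.transferFrom(in, 0, in.size());
            // Still the position the channel had when it was opened.
            return out.position();
        }
    }

    /** A direct buffer: native memory outside the Java heap, not tied to any file. */
    static ByteBuffer direct(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    /** A mapped buffer: a region of the file itself mapped into memory. */
    static MappedByteBuffer mapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

Both kinds of buffer report isDirect() as true, which is probably where the confusion comes from; only the mapped one has its contents tied to a file.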