
I'm integrating a legacy system that produces a large text file containing all of its instructions as one very long string: 450,000 characters or more.

I need to break this up into lines, one per instruction. Each instruction is preceded by a five-digit code that gives the number of characters in the instruction that follows.

My solution is to write a small Java program that uses a `BufferedReader` to read the file into a `String`, which is then split into lines and saved to a new file.

Any advice on handling this? Will a `BufferedReader` be able to read this into a regular `String`? Am I doing this wrong?

Eric Olsvik
  • `BufferedReader` should be able to read the data. – luiges90 Apr 17 '15 at 13:22
  • I would use `StringBuilder` – brso05 Apr 17 '15 at 13:22
  • Actually, I would process the file in chunks rather than putting it all into a `String` or `StringBuilder` if performance becomes an issue; otherwise I would just load it all into a `StringBuilder`. – brso05 Apr 17 '15 at 13:23
  • A clever solution would use input and output streams, and avoid reading the whole file into memory. – Bathsheba Apr 17 '15 at 13:25
  • @brso05 how would you split the file into chunks? – Eric Olsvik Apr 17 '15 at 13:48
  • @EricOlsvik - As Bathsheba says, use `FileInputStream#read(byte[] b, int off, int len)` and keep reading bytes and converting them to a `String`, part by part, until you reach the end of the input. – TheLostMind Apr 17 '15 at 13:50
  • @EricOlsvik Just read so many bytes, process them, then read some more bytes and process those. You said it is broken up by a five-digit identifier, so just use a regex to split. If there is anything left over, save the part that hasn't been processed and dump the rest before reading more bytes. – brso05 Apr 17 '15 at 13:51
  • The Grep example on the official Java tutorials website (http://docs.oracle.com/javase/7/docs/technotes/guides/io/example/index.html) is very similar to what you are trying to do - it uses NIO, memory mapped files and regular expressions – tonys Apr 17 '15 at 13:52
  • @EricOlsvik First I would try reading the whole file. If there isn't a performance issue, then don't worry about it... – brso05 Apr 17 '15 at 13:52
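
For a file of this size, the read-everything approach raised in the comments is workable: 450,000 characters takes at most about 900 KB as a Java `String`. Below is a minimal sketch of that idea. It walks the length prefixes instead of using a regex, since an instruction could itself contain five consecutive digits. The class name `InMemorySplitter` is hypothetical, and the sketch assumes each header counts characters and does not include its own five digits.

```java
import java.util.ArrayList;
import java.util.List;

class InMemorySplitter {
    private static final int HEADER_LENGTH = 5;

    /** Splits a string of [5-digit length][instruction] records into instructions. */
    static List<String> split(String data) {
        List<String> instructions = new ArrayList<>();
        int pos = 0;
        while (pos < data.length()) {
            // The header holds the character count of the instruction that follows.
            int length = Integer.parseInt(data.substring(pos, pos + HEADER_LENGTH));
            int start = pos + HEADER_LENGTH;
            // substring throws StringIndexOutOfBoundsException if the data is truncated.
            instructions.add(data.substring(start, start + length));
            pos = start + length;
        }
        return instructions;
    }
}
```

The whole file could be loaded with something like `new String(Files.readAllBytes(path), charset)`, where the charset depends on what the legacy system writes.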

1 Answer


Yes. Use a buffered reader.

Work out the max size of an instruction (a five-digit header caps it at 99,999 characters) and create a `char[]` of that size. Then do something like:

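 // read() returns how many chars were actually read, or -1 at end of file;
 // it can return fewer than requested, so check the result
 int n = reader.read(charArray, 0, 5);

 // parse the header
 int lengthOfInstruction = Integer.parseInt(new String(charArray, 0, 5));

 n = reader.read(charArray, 0, lengthOfInstruction);

 String instruction = new String(charArray, 0, lengthOfInstruction);

 // do stuff with the instruction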

Put this in a while loop that terminates when the file ends, i.e. when the header read returns -1.

This might not be the most run-time efficient, but it's probably good enough and will be simple enough to get working.
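
Putting the pieces together, a complete version might look like the sketch below. It is one possible shape, not a definitive implementation. The class name `InstructionSplitter`, the helper `readFully`, the UTF-8 charset, and the command-line arguments are all assumptions; the sketch also assumes the header counts characters, excludes its own five digits, and that the file has no trailing newline after the last instruction.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class InstructionSplitter {
    private static final int HEADER_LENGTH = 5;

    /** Fills buf[0..len). Returns false if the input ended before any char was read. */
    static boolean readFully(Reader in, char[] buf, int len) throws IOException {
        int filled = 0;
        while (filled < len) {
            // read() may deliver fewer chars than requested, so keep asking.
            int n = in.read(buf, filled, len - filled);
            if (n < 0) {
                if (filled == 0) return false;
                throw new EOFException("input ended in the middle of a record");
            }
            filled += n;
        }
        return true;
    }

    /** Writes one instruction per line and returns how many were written. */
    static int split(Reader in, Writer out) throws IOException {
        char[] buf = new char[99_999]; // largest length a five-digit header can express
        int count = 0;
        while (readFully(in, buf, HEADER_LENGTH)) {
            int length = Integer.parseInt(new String(buf, 0, HEADER_LENGTH));
            if (length > 0 && !readFully(in, buf, length)) {
                throw new EOFException("input ended before instruction " + (count + 1));
            }
            out.write(buf, 0, length);
            out.write(System.lineSeparator());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input file, args[1] = output file (assumed; adjust the charset as needed)
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            System.out.println(split(in, out) + " instructions written");
        }
    }
}
```

Because only one instruction is held in memory at a time, this works the same way whether the file is 450,000 characters or far larger.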

Ashley Frieze