
I'm implementing a Huffman coding program. I encode and decode an entire book, so preserving newline characters is very important, and unfortunately something I overlooked. Currently, I use a method that reads the small book into a String and returns it; see below:

private String readFile(String filename) {
    String curLine;
    String toReturn = "";
    try {
        BufferedReader reader = new BufferedReader(new FileReader(filename));
        try {
            while ((curLine = reader.readLine()) != null) {
                toReturn += curLine;
            }
        } catch (IOException e) {
            System.out.println(e);
        }
    } catch (FileNotFoundException e) {
        System.out.println(e);
    }
    return toReturn;
}

This works great for regular characters, but not for things like a newline (I know there is a word for these types of "characters" but I'm blanking on it right now). Anyway, my question is: how can I change the current method to also pick up newline characters, or do I need to do something completely different? I suspect readLine() will do me no good because of this, but I wanted to check here for some input.

Everything else in my program works great, but because newlines are not taken into account, my whole Huffman tree comes out wrong, and I'm sure you understand what happens from there. Any suggestions for how to pick up newlines would be appreciated. Thanks!

WeekendJedi
  • Typically you want byte-oriented IO for this, not character-oriented. Think `FileInputStream`, maybe wrapped in a `BufferedInputStream`. – President James K. Polk Oct 21 '21 at 02:32
  • Generically 'characters' (really character _codes_ in a code like ASCII, EBCDIC, 8859, or Unicode) that aren't 'real' (printable or displayable) characters are widely called [**'control' characters**](https://en.wikipedia.org/wiki/Control_character). Formally the subset concerned with presentation like newline and tab (versus transmission, storage, or other things) are called 'format effectors', but you will not find that term used much. – dave_thompson_085 Oct 21 '21 at 02:55
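
The byte-oriented approach suggested in the comment above could look roughly like the sketch below. It assumes the Huffman frequency table is built over raw byte values, which the question does not confirm; the class name `ByteFrequencies` and the method `countBytes` are hypothetical names chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ByteFrequencies {

    // Counts how often each byte value (0-255) occurs in the stream.
    // Line terminators such as '\n' (10) and '\r' (13) are counted
    // like any other byte, so nothing is silently dropped.
    static long[] countBytes(InputStream in) {
        long[] counts = new long[256];
        try {
            int b;
            while ((b = in.read()) != -1) {
                counts[b]++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // Hypothetical convenience overload for a file on disk.
    static long[] countBytes(String filename) {
        try (InputStream in = new BufferedInputStream(new FileInputStream(filename))) {
            return countBytes(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = countBytes(args[0]);
        System.out.println("newline bytes: " + counts['\n']);
    }
}
```

Because the counts are indexed by byte value, the same table works for any file, not only text.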

3 Answers


You can read all characters in a file, line terminators included, like this (Java 11 or later):

static String readFile(String filename) throws IOException {
    return Files.readString(Path.of(filename));
}

It decodes the file as UTF-8 by default; to use a different encoding, pass a Charset as the second argument of readString().
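
As a minimal sketch of the two-argument form, the example below writes a small temporary file and reads it back with an explicit charset. The class name `ReadWholeFile`, the temp-file name, and the choice of ISO-8859-1 are assumptions made only for the demonstration; it requires Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWholeFile {

    // Reads the whole file, line terminators included, using the given encoding.
    static String readFile(Path path, Charset charset) {
        try {
            return Files.readString(path, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for the book.
        Path tmp = Files.createTempFile("book", ".txt");
        Files.writeString(tmp, "line one\nline two\n", StandardCharsets.ISO_8859_1);

        String text = readFile(tmp, StandardCharsets.ISO_8859_1);
        long newlines = text.chars().filter(c -> c == '\n').count();
        System.out.println("newlines kept: " + newlines);

        Files.delete(tmp);
    }
}
```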

  • Or `readAllBytes` if you want bytes not chars (standard compression algorithms usually work on bytes so they aren't limited to text) – dave_thompson_085 Oct 21 '21 at 03:00
  • This option is only available from Java 11 onwards. For a more complete answer to this question, please see my response! – Shivam Puri Oct 22 '21 at 06:38

As suggested in a comment above, you need a byte-oriented approach for this.

From Java 7 onwards, you can use the Files.readAllBytes(filepath) method. You can then convert the entire file content to a String this way:

new String(Files.readAllBytes(filepath), StandardCharsets.UTF_8); // explicit charset instead of the platform default

From Java 11 onwards, you can also use Files.readString(filepath), which does the same thing more concisely.

You may refer to this blog for more options for reading an entire file into a String, including with BufferedReader: https://howtodoinjava.com/java/io/java-read-file-to-string-examples/
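
If you would rather keep a BufferedReader, one possible adaptation of the method from the question is sketched below. It replaces readLine(), which strips line terminators, with bulk read() calls that copy every character. The class name `KeepNewlines`, the helper `readAll`, the 8192-character buffer, and the UTF-8 encoding are assumptions for illustration; the sketch uses no Java 11 APIs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class KeepNewlines {

    // Copies every character from the reader into a String.
    // Unlike readLine(), read(char[]) does not drop '\n' or '\r'.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Same shape as the readFile method in the question; UTF-8 is assumed.
    static String readFile(String filename) {
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            return readAll(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Using a StringBuilder also avoids the repeated String concatenation in the original loop, which copies the accumulated text on every iteration.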

Shivam Puri

I guess you are looking for Files.toString() from the Guava core library.

String content = Files.toString(new File("sample_file.txt"), Charsets.UTF_8);
Rohith Joseph