
I am trying to write a text compression program in Java using the Huffman algorithm. I have already written the encoding code, and I run my program from cmd like this: `java Main inputFile outputFile`, where the input is a txt file which contains 4 lines:

Hello
My name is
Panagiotis123
Nice to meet you!

I read the txt file line by line, and I want to create a compressed output file which contains the encoded text:

111000110110100000100011011110001110100110100010100001000111111110011110100101010001011110111111101110101111110011101110011000111001001111001011000111010101101100001001001010011110101011110000111001

The input file contains 45 characters and the encoded text is 195 bits long, so the input file should be about 45 bytes and the output file about 195/8, i.e. 25 bytes. I have tried this:

 int b; 
 while ((line2 = br2.readLine()) != null) {
     String a = Encode.encode(line2, hTree);
     for(int k = 0; k < a.length(); k++) {
         b = a.charAt(k);
         fos.write((byte)b);
     }
 }

where `a` is a String which contains the encoded line and `fos` is created like this:

FileOutputStream fos = new FileOutputStream(new File(args[1]));

The input file ends up being 54 bytes and the output file 198 bytes. Obviously it writes the 195 0s and 1s as characters and not as bits.
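To see where the extra size comes from: `fos.write((byte) b)` receives the character codes of `'0'` and `'1'` (48 and 49 in ASCII), so every bit costs a whole byte. The minimal sketch below compares that with the size of the same bits packed 8 per byte, where a partial last byte still counts as one. The class and method names are made up for illustration, and `String.repeat` assumes Java 11 or later.

```java
import java.nio.charset.StandardCharsets;

public class CharVsBits {
    // Size when every '0'/'1' is written as a character: one byte per bit.
    static int bytesAsText(String bits) {
        return bits.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Size when bits are packed 8 per byte, rounding up for a partial last byte.
    static int bytesPacked(String bits) {
        return (bits.length() + 7) / 8;
    }

    public static void main(String[] args) {
        // The byte values fos.write actually receives for '0' and '1'.
        System.out.println((int) '0' + " " + (int) '1'); // 48 49
        String bits = "1".repeat(195); // same length as the encoded example
        System.out.println(bytesAsText(bits) + " bytes as text, "
                + bytesPacked(bits) + " bytes packed"); // 195 bytes as text, 25 bytes packed
    }
}
```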

  1. What can I do?
  2. How long can the String `a` be? If a line has 8 words, the encoded text may be more than 120 bits long. Would it be a problem to store that many bits in it?
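One way to address question 1 is to pack the `0`/`1` characters into real bits before writing them. Below is a minimal sketch, assuming most-significant-bit-first order and zero padding of the last byte; `BitPacker` and `pack` are hypothetical names, not part of the question's code. On question 2: a Java `String`'s length is an `int`, so a 120-character bit string is tiny and not a problem in itself.

```java
public class BitPacker {
    /**
     * Packs a string of '0'/'1' characters into bytes, most significant bit first.
     * A final group shorter than 8 bits is padded with 0s on the right.
     */
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c == '1') {
                // Bit i goes into byte i/8, at position 7 - i%8 (MSB first).
                out[i / 8] |= (byte) (1 << (7 - i % 8));
            } else if (c != '0') {
                throw new IllegalArgumentException("not a bit: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack("1110001101"); // 10 bits -> 2 bytes
        for (byte b : packed) {
            System.out.println(b & 0xFF); // prints 227, then 64
        }
    }
}
```

In the question's loop, `fos.write(BitPacker.pack(a))` would write roughly one eighth as many bytes, but padding each line separately still leaves the decoding problem raised in the comments.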
  • You can use a `DataOutputStream`. This has a method `writeByte` that will enable you to write your bits in chunks of 8. You can read the data back using a `DataInputStream`. Also you should probably be using `byte[]` rather than `String` to store your sequence of 0s and 1s. – Paul Boddington Dec 28 '14 at 15:00
  • First, you'll not be getting far if you independently encode line by line. Second, it's easy enough to transform 8 0/1 chars into an `int` value and use `write(int)` to write it to the output stream. – Marko Topolnik Dec 28 '14 at 15:02
  • OK, thank you! But if the last chunk has only 5 bits instead of 8, can I write just those 5? –  Dec 28 '14 at 15:03
  • No. You have to fill with 0s. – Paul Boddington Dec 28 '14 at 15:04
  • But then, when I decode, my program tries to find words in the encoded text, so filling with 0s would cause a problem. –  Dec 28 '14 at 15:06
  • You can get around that by first writing the number of 0s and 1s in the message as an `int`. – Paul Boddington Dec 28 '14 at 15:06
  • @MarkoTopolnik what do you mean by "not getting far"? How can I read all the text together? Is it better that way? –  Dec 28 '14 at 15:08
  • You should read input as a byte stream because Huffman encoding works on that, not on character data. You can read/write piecemeal, but you must maintain the encoder's state throughout. You can't start over on each new line. – Marko Topolnik Dec 28 '14 at 15:11
  • @MarkoTopolnik sorry for so many questions, but I'm still a beginner. How can I transform 8 0/1 chars into an int? Can I transform fewer than 8 into an int? I'm asking because the last group may not have 8 characters but, for example, 5. –  Dec 28 '14 at 15:16
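The suggestions in the comments above (keep the encoder's state across lines, pad only the final byte, and write the number of bits as an `int` first) can be combined into one small helper. This is a sketch under those assumptions, not a complete Huffman codec: `BitIO`, `BitWriter`, `writeBits`, `finish` and `readBits` are hypothetical names. It buffers the packed bytes in memory because the bit-count header has to be written before the data.

```java
import java.io.*;

public class BitIO {
    /** Collects bits across many writeBits() calls, so no padding is added between lines. */
    static class BitWriter {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private int current = 0;  // bits collected for the byte being built
        private int filled = 0;   // how many bits of 'current' are used
        private int bitCount = 0; // total bits written so far

        void writeBits(String bits) {
            for (int i = 0; i < bits.length(); i++) {
                current = (current << 1) | (bits.charAt(i) == '1' ? 1 : 0);
                filled++;
                bitCount++;
                if (filled == 8) { // a full byte is ready
                    buffer.write(current);
                    current = 0;
                    filled = 0;
                }
            }
        }

        /** Writes the bit count as an int header, then the packed bytes (last byte padded with 0s). */
        void finish(OutputStream out) throws IOException {
            if (filled > 0) {
                buffer.write(current << (8 - filled)); // pad the partial byte on the right
                current = 0;
                filled = 0;
            }
            DataOutputStream dos = new DataOutputStream(out);
            dos.writeInt(bitCount);
            buffer.writeTo(dos);
            dos.flush();
        }
    }

    /** Reads the header written by finish() and returns exactly bitCount '0'/'1' chars. */
    static String readBits(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int bitCount = dis.readInt();
        byte[] data = new byte[(bitCount + 7) / 8];
        dis.readFully(data);
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) { // padding bits after bitCount are ignored
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            sb.append(bit == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        BitWriter w = new BitWriter();
        w.writeBits("11100"); // e.g. the code for one line
        w.writeBits("0110");  // the next line continues in the same byte
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        w.finish(file);
        System.out.println(file.size()); // 6 (4-byte header + 2 data bytes)
        System.out.println(readBits(new ByteArrayInputStream(file.toByteArray()))); // 111000110
    }
}
```

In the question's loop you would call `w.writeBits(Encode.encode(line2, hTree))` for every line and `w.finish(fos)` once after the loop. Note that `readLine()` strips line terminators, so the line breaks are lost unless they are encoded as symbols too, which is one reason to treat the input as a byte stream as suggested above.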

1 Answer


Try this: add zeros until you have complete blocks of 8 bits, then parse and write the string byte by byte.

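while ((line2 = br2.readLine()) != null) {
    String a = Encode.encode(line2, hTree);

    // pad with 0s until the length is a multiple of 8 (full bytes)
    StringBuilder bits = new StringBuilder(a);
    while (bits.length() % 8 != 0)
        bits.append('0');
    for (int i = 0; i < bits.length(); i += 8) {
        String byteString = bits.substring(i, i + 8); // grab 8 bits
        int parsedByte = Integer.parseInt(byteString, 2); // always 0..255
        fos.write(parsedByte); // write one byte
    }
    // note: padding each line separately adds up to 7 extra 0 bits per line,
    // which the decoder has to know how to skip
}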
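To read this output back, each byte has to become exactly 8 characters again; `Integer.toBinaryString` drops leading zeros, so the result needs left padding. This minimal sketch (`ByteToBits` and `toBits` are illustrative names) also shows the catch discussed in the comments: the padding zeros come back too, so the decoder needs to know how many bits each line really had, for example from a length written before the data.

```java
public class ByteToBits {
    /** Turns one byte back into exactly 8 '0'/'1' characters, keeping leading zeros. */
    static String toBits(int b) {
        String s = Integer.toBinaryString(b & 0xFF); // mask handles negative Java bytes
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        String a = "1110001101";
        // Same padding and parsing as in the answer.
        StringBuilder padded = new StringBuilder(a);
        while (padded.length() % 8 != 0) padded.append('0');
        StringBuilder decoded = new StringBuilder();
        for (int i = 0; i < padded.length(); i += 8) {
            int parsedByte = Integer.parseInt(padded.substring(i, i + 8), 2);
            decoded.append(toBits((byte) parsedByte)); // as if read back from the file
        }
        // The 6 padding zeros are still there at the end.
        System.out.println(decoded); // 1110001101000000
    }
}
```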