I'm trying to write compressed data to a file, then read the data back in and decompress it using the GZIP library. I've tried changing all the encodings to StandardCharsets.UTF_8 and ISO-8859-1, and neither has fixed the GZIP format error. I'm wondering if it could possibly have to do with the file I'm reading in. Here's the compression function:

public static byte[] compress(String originalFile, String compressFile) throws IOException {

    // read in data from text file
    // The name of the file to open.
    String fileName = originalFile;

    // This will reference one line at a time
    String line = null;
    String original = "";

    try {
        // FileReader reads text files in the default encoding.
        FileReader fileReader = 
            new FileReader(fileName);

        // Always wrap FileReader in BufferedReader.
        BufferedReader bufferedReader = 
            new BufferedReader(fileReader);

        while((line = bufferedReader.readLine()) != null) {
            original.concat(line);
        }   

        // Always close files.
        bufferedReader.close();         
    }
    catch(FileNotFoundException ex) {
        System.out.println(
            "Unable to open file '" + 
            fileName + "'");                
    }
    catch(IOException ex) {
        System.out.println(
            "Error reading file '" 
            + fileName + "'");                  
        // Or we could just do this: 
        // ex.printStackTrace();
    }


    // create a new output stream for original string
    try (ByteArrayOutputStream out = new ByteArrayOutputStream())
    {
        try (GZIPOutputStream gzip = new GZIPOutputStream(out))
        {
            gzip.write(original.getBytes(StandardCharsets.UTF_8));
        }
        byte[] compressed = out.toByteArray();
        out.close();

        String compressedFileName = compressFile;

        try {
            // Assume default encoding.
            FileWriter fileWriter =
                new FileWriter(compressedFileName);

            // Always wrap FileWriter in BufferedWriter.
            BufferedWriter bufferedWriter =
                new BufferedWriter(fileWriter);

            // Note that write() does not automatically
            // append a newline character.
            String compressedStr = compressed.toString();
            bufferedWriter.write(compressedStr);

            // Always close files.
            bufferedWriter.close();
        }
        catch(IOException ex) {
            System.out.println(
                "Error writing to file '"
                + fileName + "'");
            // Or we could just do this:
            // ex.printStackTrace();
        }
        return compressed;
    }
}

I'm receiving the error on the following line of the decompression function:

GZIPInputStream compressedByteArrayStream = new GZIPInputStream(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));

Decompression Function:

 public static String decompress(String file) throws IOException {

    byte[] compressed = {};
    String s = "";

    File fileName = new File(file);
    FileInputStream fin = null;
    try {
        // create FileInputStream object
        fin = new FileInputStream(fileName);

        // Reads up to certain bytes of data from this input stream into an array of bytes.
        fin.read(compressed);
        //create string from byte array
        s = new String(compressed);
        System.out.println("File content: " + s);
    }
    catch (FileNotFoundException e) {
        System.out.println("File not found" + e);
    }
    catch (IOException ioe) {
        System.out.println("Exception while reading file " + ioe);
    }
    finally {
        // close the streams using close method
        try {
            if (fin != null) {
                fin.close();
            }
        }
        catch (IOException ioe) {
            System.out.println("Error while closing stream: " + ioe);
        }
    }


    // create a new input string for compressed byte array
    GZIPInputStream compressedByteArrayStream = new GZIPInputStream(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
    ByteArrayOutputStream byteOutput = new ByteArrayOutputStream();

    byte[] buffer = new byte[8192];

    // create a string builder and byte reader for the compressed byte array
    BufferedReader decompressionBr = new BufferedReader(new InputStreamReader(compressedByteArrayStream, StandardCharsets.UTF_8));
    StringBuilder decompressionSb = new StringBuilder();

    // write data to decompressed string
    String line1;
    while((line1 = decompressionBr.readLine()) != null) {
        decompressionSb.append(line1);
    }
    decompressionBr.close();

    int len;
    String uncompressedStr = "";
    while((len = compressedByteArrayStream.read(buffer)) > 0) {
        uncompressedStr = byteOutput.toString();
    }

    compressedByteArrayStream.close();  
    return uncompressedStr;
}

Here's the error message that I am receiving:

[B@7852e922
File content: 
java.io.EOFException
    at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:268)
    at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:258)
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:164)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at org.kingswoodoxford.Compression.decompress(Compression.java:136)
    at org.kingswoodoxford.Compression.main(Compression.java:183)

Any suggestions as to how I might be able to fix this?

Trey Taylor
  • is your original file really utf 8? – Kalpesh Soni Dec 03 '15 at 17:37
  • Don't convert to a string. There's no need, and most character sets will not handle all possible bytes. And if you're not using strings, then there's no need to use `Reader` and `Writer`; use `InputStream` and `OutputStream`. – kdgregory Dec 03 '15 at 18:11
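
To illustrate the comment above about not converting to a string, here is a minimal sketch (the class and method names are made up for illustration). It shows two things: calling toString() on a byte[] yields only a type tag and hash such as the [B@7852e922 in the output above, not the data, and pushing GZIP bytes through a UTF-8 String does not give the same bytes back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the question's code.
final class BinaryAsTextDemo {

    // Compresses the UTF-8 bytes of the text into an in-memory GZIP stream.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Decodes the bytes as UTF-8 text, encodes them again, and compares.
    static boolean survivesUtf8RoundTrip(byte[] data) {
        byte[] back = new String(data, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("hello, hello, hello");
        // Prints something starting with "[B@": the array's type and hash.
        System.out.println(compressed.toString());
        // Prints false: the second GZIP magic byte (0x8b) is not valid UTF-8
        // on its own, so decoding replaces it and the data is corrupted.
        System.out.println(survivesUtf8RoundTrip(compressed));
    }
}
```

Writing the byte[] straight to a FileOutputStream avoids both problems.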

1 Answer

When you read the file, you discard the newline at the end of each line.

A more efficient option, which does not do this, is to copy a block (i.e. a char[]) at a time. You can also convert the text as you go rather than creating a String or a byte[].
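
As a small sketch of the difference (the class and method names here are hypothetical, and a StringReader stands in for the file), the first helper joins lines the way a readLine() loop does, and the second copies char[] blocks:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class; StringReader stands in for a FileReader.
final class ReadLineDemo {

    // readLine() returns each line without its terminator,
    // so appending the lines loses every newline.
    static String joinWithReadLine(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    // Copying fixed-size char blocks keeps the text exactly as it was read.
    static String copyBlocks(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try (StringReader reader = new StringReader(text)) {
            for (int len; (len = reader.read(buffer)) > 0; ) {
                sb.append(buffer, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "first\nsecond\n";
        System.out.println(joinWithReadLine(text).equals("firstsecond")); // true
        System.out.println(copyBlocks(text).equals(text));                // true
    }
}
```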

BTW, original.concat(line); returns the concatenated string, which you are discarding, so original stays empty.
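
A minimal sketch of that point, with made-up class and method names: String is immutable, so concat() never changes the receiver and its result has to be assigned (or a StringBuilder used instead).

```java
// Hypothetical demo class illustrating String immutability.
final class ConcatDemo {

    // Mirrors the question's loop body: the result of concat() is dropped.
    static String discarded(String original, String line) {
        original.concat(line); // returns a new String that nobody keeps
        return original;       // unchanged
    }

    // Keeps the result by assigning it back.
    static String assigned(String original, String line) {
        original = original.concat(line);
        return original;
    }

    public static void main(String[] args) {
        System.out.println(discarded("", "abc").isEmpty()); // true
        System.out.println(assigned("", "abc"));            // abc
    }
}
```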

The real problem is that you write to one stream and close a different one. This means that if there is any buffered data at the end of the file (and this is highly likely), the end of the file will be truncated, and when you read it back it will complain that the file is incomplete or throw an EOFException.
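
A rough in-memory sketch of the truncation effect (the class and method names are hypothetical, and ByteArrayOutputStream stands in for the file): if the GZIPOutputStream is never closed or finished, the deflater's buffered data and the GZIP trailer are missing, and reading the result back typically fails with an EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; the byte arrays stand in for files on disk.
final class GzipCloseDemo {

    // Takes the bytes before the GZIP stream is closed: output is truncated.
    static byte[] gzipWithoutClose(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(data);
        return out.toByteArray(); // gzip was never closed or finished
    }

    // try-with-resources closes the GZIP stream before the bytes are taken.
    static byte[] gzipClosed(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.toByteArray();
    }

    // Decompresses with a plain buffer-and-loop copy.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in =
                     new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            for (int len; (len = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, len);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello gzip".getBytes();
        System.out.println(new String(gunzip(gzipClosed(data))));
        try {
            gunzip(gzipWithoutClose(data));
        } catch (IOException e) {
            System.out.println("truncated stream: " + e);
        }
    }
}
```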

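Here is a shorter example:

import java.io.*;
import java.util.zip.GZIPOutputStream;

public static void compress(String originalFile, String compressFile) throws IOException {
    char[] buffer = new char[8192];
    // try-with-resources closes the outermost writer, which flushes any
    // buffered data and lets the GZIP stream write its trailer.
    // FileReader and OutputStreamWriter use the default encoding here.
    try (
            FileReader reader = new FileReader(originalFile);
            Writer writer = new OutputStreamWriter(
                    new GZIPOutputStream(new FileOutputStream(compressFile)));
    ) {
        // Copy a block of chars at a time, so line terminators are preserved.
        for (int len; (len = reader.read(buffer)) > 0; )
            writer.write(buffer, 0, len);
    }
}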

In the decompress method, don't encode binary data as text and expect to get the same data back; it will almost certainly be corrupted. Use a buffer and a loop like I did for compress, i.e. it shouldn't be any more complicated.
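
One way this could look, as a sketch rather than the definitive version: both directions work on bytes only, so nothing is ever decoded into a String on the way through. The class name, the Path-based signatures, and the roundTrip helper are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class: byte-for-byte GZIP compress and decompress.
final class GzipFileRoundTrip {

    static void compress(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            copy(in, out);
        }
    }

    static void decompress(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source));
             OutputStream out = Files.newOutputStream(target)) {
            copy(in, out);
        }
    }

    // The same buffer-and-loop copy serves both directions.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        for (int len; (len = in.read(buffer)) != -1; ) {
            out.write(buffer, 0, len);
        }
    }

    // Writes text to a temp file, compresses, decompresses, and reads it back.
    static String roundTrip(String text) throws IOException {
        Path original = Files.createTempFile("original", ".txt");
        Path compressed = Files.createTempFile("compressed", ".gz");
        Path restored = Files.createTempFile("restored", ".txt");
        try {
            Files.write(original, text.getBytes(StandardCharsets.UTF_8));
            compress(original, compressed);
            decompress(compressed, restored);
            return new String(Files.readAllBytes(restored), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(original);
            Files.deleteIfExists(compressed);
            Files.deleteIfExists(restored);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "line one\nline two\n";
        System.out.println(roundTrip(text).equals(text)); // true
    }
}
```

The only place text encoding matters in this sketch is at the edges, where the String is turned into bytes before compression and back again after decompression.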

Peter Lawrey
  • Thanks, so I believe I fixed the stream closing problem by closing my GZIPOutputStream as well as my ByteArrayOutputStream. Regarding the read of the file, I'm not quite sure how I'm discarding the new line at the end of each line and how I can fix this. – Trey Taylor Dec 03 '15 at 17:25
  • @TreyTaylor `BufferedReader.readLine()` doesn't include the new line characters. See my answer for how you should do it. – Peter Lawrey Dec 03 '15 at 17:27
  • @TreyTaylor thinking about it, you should really learn something here because you appear to have combined nearly every common mistake I can think of. ;) Though you didn't use StringBuffer at least ^_^ – Peter Lawrey Dec 03 '15 at 17:30
  • Sorry, I'm a beginner in Java. But either way I ended up writing a function that read the compressed data into a byte array and then decompressing the returned compressed data and writing out the decompressed string. All in about 60 lines rather than 250 :) – Trey Taylor Dec 03 '15 at 17:51
  • @TreyTaylor An expert is some one who has made all the mistakes before. ;) – Peter Lawrey Dec 03 '15 at 18:06