
I have been banging my head against a wall on this all day. I have a PDF file that we generate. The PDF file looks fine in Acrobat.

I need to encode the file in base64. Using the Apache Commons Codec library, I do this:

```java
String base64buf = Base64.encodeBase64String( m_reportText.getBytes( "UTF-8" ) );
```

As a test I write base64buf out to a file:

```java
Files.write( new File( "report.b64" ).toPath(), base64buf.getBytes( "UTF-8") );
```

Then I convert it back, just to see if it is working:

```java
String encodedName = "report.b64";
String decodedName = "report.pdf";

// Read the base64-encoded file.
byte[] encodedBuffer = Files.readAllBytes( new File( encodedName ).toPath() );

// Decode
byte[] decodedBuffer = Base64.decodeBase64( encodedBuffer );

// Write out decodedBuffer; try-with-resources closes the stream even on error.
try ( FileOutputStream outputStream = new FileOutputStream( decodedName ) ) {
    outputStream.write( decodedBuffer );
}
```

I open report.pdf in Acrobat and it is a blank document. It has the correct number of pages (all are blank).

What am I missing here?

Asked by Mike Dee (edited by mkl)
  • Do I interpret your code correctly in thinking that `m_reportText` is a member variable containing the PDF, and that this member variable is a `String`? If that is the case, you are in trouble anyway, as PDF files are binary files, not text files (even if they partially look textual). And your UTF-8 encoding of the contents of that variable (not of the base64 string, though) will break it even more. – mkl Jul 29 '15 at 09:16
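The damage mkl describes can be reproduced in isolation. The sketch below assumes the `String` was built by decoding the PDF bytes as ISO-8859-1 (one character per byte), then re-encodes it as UTF-8 the way the question's code does. The class and method names are hypothetical and no Apache Commons Codec call is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Corruption {

    // Assumption: binary data was turned into a String via ISO-8859-1,
    // then written back out as UTF-8 (as in m_reportText.getBytes("UTF-8")).
    public static byte[] reencodeAsUtf8(byte[] binary) {
        String asText = new String(binary, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII header followed by high bytes, resembling the start of a PDF.
        byte[] original = {'%', 'P', 'D', 'F', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] mangled = reencodeAsUtf8(original);
        System.out.println("original length: " + original.length);
        System.out.println("after UTF-8:     " + mangled.length);
        System.out.println("identical? " + Arrays.equals(original, mangled));
    }
}
```

Running it prints lengths 8 and 12 and `identical? false`: the ASCII bytes survive, but every byte at or above 0x80 becomes a two-byte UTF-8 sequence, which is enough to wreck the binary streams inside a PDF.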

1 Answer

`m_reportText` is a `String`, and hence contains Unicode text, whereas a PDF is, in general, binary data. Holding binary data in a `String` should really be avoided, as the superfluous conversion in both directions is lossy and error-prone. For a hack, you could try storing and retrieving the PDF bytes as ISO-8859-1, which maps each byte value 0 to 255 to exactly one character and back.
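A minimal sketch of that hack, assuming the `String` was originally created by decoding the PDF bytes as ISO-8859-1. It uses the JDK's `java.util.Base64` in place of Apache Commons Codec (both produce standard base64); `Latin1Hack` and `encodeReport` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Latin1Hack {

    // Encoding back with ISO-8859-1 recovers the original bytes, because
    // that charset maps bytes 0..255 one-to-one onto chars U+0000..U+00FF.
    public static String encodeReport(String reportText) {
        byte[] pdfBytes = reportText.getBytes(StandardCharsets.ISO_8859_1);
        return Base64.getEncoder().encodeToString(pdfBytes);
    }

    public static void main(String[] args) {
        // Every possible byte value, standing in for arbitrary binary PDF content.
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++) {
            original[i] = (byte) i;
        }
        // Assumed origin of the String: bytes decoded as ISO-8859-1 text.
        String reportText = new String(original, StandardCharsets.ISO_8859_1);

        byte[] decoded = Base64.getDecoder().decode(encodeReport(reportText));
        System.out.println("round trip intact? " + Arrays.equals(original, decoded));
    }
}
```

This only works if the same charset is used in both directions; any other charset, UTF-8 included, breaks the one-to-one mapping.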

Better: use a `byte[] m_reportText`.
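Keeping the PDF as raw bytes removes every charset conversion except the harmless one on the base64 text itself. The sketch below is a hypothetical holder class (not the asker's actual code) and again swaps in `java.util.Base64` for Apache Commons Codec; it repeats the question's write/read/decode test with a temporary file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

public class ByteArrayReport {

    // Hypothetical replacement for the String member: the PDF as raw bytes.
    private final byte[] reportBytes;

    public ByteArrayReport(byte[] reportBytes) {
        this.reportBytes = reportBytes;
    }

    public String toBase64() {
        return Base64.getEncoder().encodeToString(reportBytes);
    }

    public static byte[] fromBase64(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n', (byte) 0xE2, (byte) 0xE3};
        ByteArrayReport report = new ByteArrayReport(pdf);

        // Base64 text is pure ASCII, so writing it as US-ASCII (or UTF-8) is safe.
        Path encoded = Files.createTempFile("report", ".b64");
        Files.write(encoded, report.toBase64().getBytes(StandardCharsets.US_ASCII));
        String readBack = new String(Files.readAllBytes(encoded), StandardCharsets.US_ASCII);

        byte[] decoded = fromBase64(readBack);
        System.out.println("round trip intact? " + Arrays.equals(pdf, decoded));
        Files.delete(encoded);
    }
}
```

For the same reason, the `"UTF-8"` in the question's `base64buf.getBytes( "UTF-8")` is harmless; the corruption happens earlier, in `m_reportText.getBytes( "UTF-8" )`.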

Answered by Joop Eggen
  • You and mkl are correct in that `m_reportText` is the PDF. I looked at how `m_reportText` was generated: it is produced in binary form and then turned into text via `toString( "iso-8859-1" )`. Changing the "UTF-8" to "iso-8859-1" in the above code worked. Thanks. – Mike Dee Jul 29 '15 at 16:21
  • @MikeDee That's what Joop mentioned as a hack (it really **is** merely a hack; storing PDFs in a `String` indicates a flawed software architecture), so you should accept it as the correct answer. – mkl Jul 31 '15 at 08:39
  • Even though it works, it does two (ISO-8859-1) conversions, to and from `String`, which could be avoided (and made faster) by using a `byte[]`, maybe obtained from a `ByteArrayOutputStream` used to create the PDF. – Joop Eggen Jul 31 '15 at 08:43
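Joop's last suggestion can be sketched as follows: generate the PDF straight into a `ByteArrayOutputStream` and take `toByteArray()` instead of calling `toString( "iso-8859-1" )`. The generator here is a hypothetical stand-in (`writePdf`), since the asker's PDF library is not named; PDF libraries commonly accept an arbitrary `OutputStream`, but check yours. `java.util.Base64` is used in place of Apache Commons Codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class GenerateToBytes {

    // Hypothetical stand-in for the real report generator: writes a
    // PDF-like header followed by a few binary bytes.
    static void writePdf(OutputStream out) throws IOException {
        out.write("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
        out.write(new byte[] {(byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3});
    }

    // Generate into memory and keep the bytes; no toString(...) step,
    // so no charset conversion happens at all.
    public static byte[] generateReport() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writePdf(buffer);
        } catch (IOException e) {
            // ByteArrayOutputStream itself never throws; only the generator could.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] pdf = generateReport();
        String base64 = Base64.getEncoder().encodeToString(pdf);
        System.out.println(pdf.length + " bytes -> " + base64);
    }
}
```

The resulting `byte[]` can then be stored in the member variable and base64-encoded directly, with no intermediate `String`.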