
At the company where I work, we have a job that retrieves emails, extracts their attachments and saves them to disk. Until now it only had to handle .xml and .txt files, and it worked well.

We use the JavaMail 1.4.4 package. Existing code (simplified; don't mind the missing type checks):

Message message = ...;
Multipart mp = (Multipart)message.getContent();
File file = new File(newFileName);
Part part = mp.getBodyPart(indexWhereIsAttachement);
InputStream inputStream = part.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
BufferedWriter writer = new BufferedWriter(new FileWriter(file));
// method that reads everything from reader and writes it to writer

When the attachment is a .xls file, this doesn't work: it produces a corrupted .xls file. I can't open it with LibreOffice, nor can I open it as an Apache WorkBook in code. The same code works for .xml and .txt.

But if I do this:

...
File file = new File(newFileName);
Part part = mp.getBodyPart(indexWhereIsAttachement);
((MimeBodyPart)part).saveFile(file);

It works fine. Looking at the saveFile() method, it uses a BufferedInput(Output)Stream, so the data is never converted to characters while it is being copied. Is the conversion to characters what causes the issue? What exactly happens that breaks the file?
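To make the suspicion concrete, here is a minimal sketch that reproduces the effect without JavaMail. The class and method names (BinaryCopyDemo, copyAsText, copyAsBytes) are made up for illustration, and it assumes UTF-8 as the charset so the result is deterministic; the original code uses the platform default charset, so the exact damage there depends on the machine. The sample bytes are the 8-byte signature that OLE2 files such as .xls start with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryCopyDemo {

    // Hypothetical helper: copies via Reader/Writer, so every byte sequence is
    // decoded to chars and re-encoded. Assumes UTF-8 instead of the platform default.
    static byte[] copyAsText(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Hypothetical helper: copies raw bytes, with no charset involved at all.
    static byte[] copyAsBytes(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Signature bytes at the start of an OLE2 compound file (used by .xls).
        byte[] xlsHeader = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
        };

        // Bytes that are not valid UTF-8 get replaced during decoding, so the copy differs.
        System.out.println(Arrays.equals(xlsHeader, copyAsText(xlsHeader)));  // false
        // The byte copy is identical to the input.
        System.out.println(Arrays.equals(xlsHeader, copyAsBytes(xlsHeader))); // true
    }
}
```

Text files survive the Reader/Writer path only because their bytes happen to be valid in the charset being used; arbitrary binary data generally is not.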

kozeljko
  • 2
    might be exactly what you think. `.xls` is basically a `.zip` file while `.txt` and `.xml` files are both purely text based and therefore can easily be converted to characters – XtremeBaumer Jan 24 '18 at 13:38
  • 2
    @XtremeBaumer Just a small addendum - I believe `.xlsx` files (newer MS Office format) are zip-based while `.xls` aren't. Old `.xls` are binary files in proprietary MS format. – TheJavaGuy-Ivan Milosavljević Jan 24 '18 at 14:00
  • Yes, my bad. Only `.xlsx` files are zip based. @TheJavaGuy-IvanMilosavljević's comment is correct. Ignore that part of mine – XtremeBaumer Jan 24 '18 at 14:02
  • No wonder that conversion to character is bad. Thanks guys. – kozeljko Jan 24 '18 at 14:04
  • 2
    I would argue that using the constructor of `InputStreamReader` that doesn't explicitly take a charset is almost unequivocally a bug. The "platform default charset" is practically never what you want, and definitely never something you should use implicitly -- always make it explicit. (And similarly for other Reader/Writer classes that use the platform default charset implicitly.) – Daniel Pryden Jan 24 '18 at 14:06
  • I'm having trouble finding an exact duplicate, but this one is close: https://stackoverflow.com/questions/22115246/reliance-on-default-encoding-what-should-i-use-and-why – Daniel Pryden Jan 24 '18 at 14:16
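
For the attachments that really are text, the advice about explicit charsets could look roughly like the sketch below. The names (ExplicitCharsetDemo, transcode) are hypothetical, and the source charset is passed in by the caller; with JavaMail it would have to come from the part's Content-Type, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetDemo {

    // Hypothetical helper: reads text in one explicitly named charset and writes
    // it in another. Nothing depends on the platform default charset.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        try (Reader reader = new InputStreamReader(in, from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo"; // "héllo", written with an escape to avoid source-encoding issues
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1, out, StandardCharsets.UTF_8);

        // The output is the same text, now encoded as UTF-8.
        System.out.println(Arrays.equals(out.toByteArray(), text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

This only applies to content known to be text; binary attachments such as `.xls` should still be copied as bytes.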
