
I'm trying to read a file of MIME type "application/octet-stream" line by line via a Java application running on a Linux PC. Clarification: "application/octet-stream" was the result of running "file -ib file.txt" on Linux.

The file I'm trying to read was created on Windows XP.

I've called my file "file.txt".

On Linux, "cat file.txt" displays the contents. "cat -v", as well as vim, shows the control characters.

When I run code to iterate through it via my Java application (using simple BufferedReader(FileReader) type of code), my output is unexpected.

What approach should I take? I tried converting the file using dos2unix, but to no avail.

EDIT: the input file, when read through vim or "cat -v" is as follows:

[^@S^@y^@s^@t^@e^@m^@]^@^M^@ 

The line simply says "System", but it seems the control characters are rendering the file unreadable via my Java app.

UPDATE: I ran my code using all available Character encodings, and it turns out that the readable CharSets were "x-UTF-16LE-BOM" and "COMPOUND-TEXT". Thanks to everyone for their help.

  • Probably line endings. Linux/Unix use `\n`, Windows uses `\r\n`. Files themselves have no "mime type". mime is something that's wrapped AROUND files to explain what data type(s) is in them. – Marc B Jan 03 '14 at 16:20
  • Use the same encoding to write and read the file and it should work fine... – assylias Jan 03 '14 at 16:21
  • 1) Elaborate on "output is unexpected" and give more on what exactly the file contains, and perhaps some code... 2) What does this have to do with MIME? – TypeIA Jan 03 '14 at 16:21
  • This is most likely an encoding or linebreak issue. On Windows the default linebreak is `\r\n` and the default encoding is `Latin 1`. On most Linux systems the default linebreak is `\n` and the default encoding is `UTF-8`. – Boris the Spider Jan 03 '14 at 16:21
  • I should clarify, the unexpected output is that the lines it reads are all blank, but the input file clearly has text on every line. – cjtightpant Jan 03 '14 at 16:22
  • @dvnrrs, the input file, when read through vim or "cat -v" is as follows: `[^@S^@y^@s^@t^@e^@m^@]^@^M^@` The line simply says "System". – cjtightpant Jan 03 '14 at 16:24
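For what it's worth, the `cat -v` output quoted above is what you would expect if the text were stored as little-endian UTF-16, where each ASCII character is followed by a zero byte (shown as `^@`). Here is a minimal sketch that encodes `[System]` plus a Windows line ending and renders the bytes roughly the way `cat -v` does. The class and method names are made up for illustration, and the rendering only approximates `cat -v`.

```java
import java.nio.charset.StandardCharsets;

public class CaretNotationDemo {

    // Renders bytes roughly the way `cat -v` does: newline and tab pass
    // through, other control bytes become ^X, DEL becomes ^?, and bytes
    // above 127 get an "M-" prefix. (Hypothetical helper, not a real API.)
    static String caretNotation(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte raw : bytes) {
            int b = raw & 0xFF;
            if (b == '\n' || b == '\t') {
                out.append((char) b);
                continue;
            }
            if (b >= 128) {
                out.append("M-");
                b -= 128;
            }
            if (b < 32) {
                out.append('^').append((char) (b + 64));
            } else if (b == 127) {
                out.append("^?");
            } else {
                out.append((char) b);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the file holds "[System]" followed by \r\n in UTF-16LE.
        byte[] encoded = "[System]\r\n".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(caretNotation(encoded));
    }
}
```

The first line this prints has the same shape as the line shown by `cat -v`, which suggests an encoding problem rather than a line-ending problem.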

1 Answer


Looks like the file was written using the UTF-16 encoding. To read this in Java, you'll just need to specify that encoding in your reader:

InputStreamReader reader = new InputStreamReader(
    new FileInputStream(filename), Charset.forName("UTF-16"));
TypeIA
  • I've tried the above, and here are my results: my InputStreamReader no longer recognizes newline characters, so it treats the entire file as one line. In addition, I still cannot read the file using the new encoding. – cjtightpant Jan 03 '14 at 17:41
  • Also, perhaps I was unclear about the MIME type above. What I meant was that when I ran "file -ib file.txt" on Linux, the result was "application/octet-stream". – cjtightpant Jan 03 '14 at 17:44
  • @cjtightpant Are you trying to read the file you converted with `dos2unix`? Don't; that will corrupt a file encoded in this way. – TypeIA Jan 03 '14 at 17:59
  • see my latest update. I used the code you posted above with `x-UTF-16LE-BOM` as the Charset, and the text was readable. Thanks! – cjtightpant Jan 03 '14 at 18:14
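To round this off, here is a minimal sketch of reading such a file line by line, assuming it really is little-endian UTF-16 as the thread concluded. It uses `StandardCharsets.UTF_16LE` and drops a leading byte-order mark by hand if one is present. Java's plain `UTF-16` charset falls back to big-endian when the input has no BOM, which may be why the first attempt saw the whole file as one unreadable line. The class name `Utf16LineReader`, the method `readLines`, and the default path are placeholders, not anything from the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Utf16LineReader {

    // Reads all lines of a file that is assumed to be UTF-16LE.
    // BufferedReader.readLine() strips \n, \r, and \r\n terminators,
    // so Windows line endings need no extra handling.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                Files.newInputStream(path), StandardCharsets.UTF_16LE))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // UTF_16LE does not consume a BOM, so remove it from line one.
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "file.txt" is just the name used in the question.
        Path path = Paths.get(args.length > 0 ? args[0] : "file.txt");
        for (String line : readLines(path)) {
            System.out.println(line);
        }
    }
}
```

Run it against the original file from Windows, not the copy that went through `dos2unix`.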