21

I'm trying to read a binary file from a URLConnection. It works fine when I test it with a text file, but not with binary files. The server sends the file with the following MIME type:

application/octet-stream

But so far nothing seems to work. This is the code that I use to receive the file:

file = File.createTempFile( "tempfile", ".bin");
file.deleteOnExit();

URL url = new URL( "http://somedomain.com/image.gif" );

URLConnection connection = url.openConnection();

BufferedReader input = new BufferedReader( new InputStreamReader( connection.getInputStream() ) );

Writer writer = new OutputStreamWriter( new FileOutputStream( file ) );

int c;

while( ( c = input.read() ) != -1 ) {

   writer.write( (char)c );
}

writer.close();

input.close();
Luke

2 Answers

35

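This is how I do it:

// Read raw bytes; do not wrap the stream in a Reader
InputStream input = connection.getInputStream();
byte[] buffer = new byte[4096];
int n;

OutputStream output = new FileOutputStream( file );
// read() returns the number of bytes read, or -1 at end of stream
while ((n = input.read(buffer)) != -1) 
{
    output.write(buffer, 0, n);
}
output.close();
input.close();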
Flow
ZZ Coder
15

If you are trying to read a binary stream, you should NOT wrap the InputStream in a Reader of any kind. Read the data into a byte array buffer using the InputStream.read(byte[], int, int) method. Then write from the buffer to a FileOutputStream.
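As a rough sketch of that approach, the loop below copies bytes with InputStream.read(byte[], int, int) and never touches a Reader or Writer. The class name `BinaryCopy`, the helper `copy`, and the in-memory sample bytes are made up for illustration; in the question's code the source would be connection.getInputStream() and the sink a FileOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class BinaryCopy {

    // Copies every byte from in to out with no character decoding.
    // Returns the number of bytes copied. (Hypothetical helper, not a JDK API.)
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        // read(byte[], int, int) returns the count of bytes read, or -1 at end of stream
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for connection.getInputStream(); includes bytes >= 0x80
        byte[] sample = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61, (byte) 0x80, (byte) 0xFF, 0x00 };
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(sample), sink);
        System.out.println(copied + " bytes copied, identical: "
                + Arrays.equals(sample, sink.toByteArray()));
    }
}
```

In real code both streams would normally be opened in a try-with-resources block so they are closed even if the copy fails.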

The way you are currently reading/writing the file will convert it into "characters" and back to bytes using your platform's default character encoding. This is liable to mangle binary data.

(There is a charset, ISO-8859-1 (Latin-1), that provides a 1-to-1 lossless mapping between bytes and a subset of the char value space. However, using it is a bad idea even though the mapping works. You would be copying the binary data from byte[] to char[] and back again, which achieves nothing in this context.)
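
To make the difference concrete, here is a small sketch that decodes a few bytes to a String and encodes them back, which is roughly what a Reader/Writer pair does. UTF-8 is used here only as a stand-in for "the platform default", which is an assumption; the actual default varies by system. The class name `CharsetRoundTrip` and the helper `roundTrip` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {

    // Decodes bytes to chars and encodes them back with the same charset.
    // (Hypothetical helper, not a JDK API.)
    public static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // "GIF" followed by two bytes that are not valid UTF-8 on their own
        byte[] data = { 0x47, 0x49, 0x46, (byte) 0x80, (byte) 0xFF };

        // Invalid sequences are replaced with U+FFFD when decoding, so the bytes change
        System.out.println("UTF-8 intact: "
                + Arrays.equals(data, roundTrip(data, StandardCharsets.UTF_8)));

        // ISO-8859-1 maps each byte to exactly one char, so the bytes survive
        System.out.println("ISO-8859-1 intact: "
                + Arrays.equals(data, roundTrip(data, StandardCharsets.ISO_8859_1)));
    }
}
```

Even where the round trip is lossless, it only adds a pointless byte-to-char-to-byte conversion, so copying raw bytes remains the simpler choice.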

Stephen C
  • Or you can try wrapping up your InputStream into BufferedInputStream. – bhups Jul 11 '10 at 06:33
  • @bhups - that is true, but it will only help if you are going to do lots of small reads. If you exclusively do large block reads, a BufferedInputStream will actually reduce throughput a bit. – Stephen C Jul 11 '10 at 06:42
  • This is correct; `InputStreamReader` will transform byte data to UTF-16 character data (in this case, using the default platform encoding, which is a bad idea even for text/plain). A Java char is not an octet as it is in some other languages. – McDowell Jul 11 '10 at 09:25
  • @StephenC, regarding your last (+1 useful) comment - What buffer-size would still be considered as causing "lots of small reads" (by your definition)? In other words, how small "should" the `byte[]` read-buffer be, to justify usage of `BufferedInputStream`? – Bliss Jul 14 '19 at 10:26
  • I can't give you an exact number. It depends on the relative costs of a syscall, the sizes of the buffer and the `byte[]`, and so on. But my real point is to not *assume* that using a buffered stream always makes things faster. – Stephen C Dec 23 '20 at 10:19