I am trying to read a binary file with Python. This is the code I use:

fb = open(Bin_File, "r")
a = numpy.fromfile(fb, dtype=numpy.float32)

However, I get zero values at the end of the array. For example, with nrows=296 and ncols=439, where len(a) should be 296*439, I get zeros for a[-922:]. I know from a trusted piece of R code that these values should be noData (-9999 in this example). Does anybody know why I am getting these nonsense zeros?

P.S.: I am not sure whether it is related or not, but len(a) is actually nrows*ncols+2! I have to get rid of these two using a = a[0:-2] so that I don't get an error when I reshape the array into rows and columns using a_reshape = a.reshape(nrows, ncols).

ahoosh
  • try opening with `"rb"` mode instead of `"r"` ? – Gabriel Jul 28 '14 at 21:04
  • hmmm, you should probably tag this question with the R tag and post your R read commands or the code that actually wrote the file. – Gabriel Jul 28 '14 at 21:06
  • maybe the software that wrote the file adds 2 extra fields above and beyond the raw binary? I know (by default) Fortran 90 adds two blocks that indicate how much data is there. – Gabriel Jul 28 '14 at 21:08
  • @Gabriel Using `"rb"` instead of `"r"` solved all of the problems. The numpy array now totally makes sense. Do you mind moving your comment to an answer so that I can vote it up? – ahoosh Jul 28 '14 at 21:14
  • added answer and some explanation – Gabriel Jul 28 '14 at 21:21

1 Answer

When opening a file for reading as binary you should use the mode "rb" instead of "r".

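Here is some background from the docs. On Linux machines you don't need the "b", but it won't hurt. On Windows machines you must use "rb" for binary files. (This describes Python 2. In Python 3, text mode decodes the bytes to str on every platform, so use "rb" for binary data everywhere.)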
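As a minimal sketch of the binary-mode read, assuming the file is nothing but a flat run of float32 values: the helper name read_float32, the file name demo.bin and the sample values are made up for the illustration and are not from the question.

```python
import os
import tempfile

import numpy as np


def read_float32(path):
    """Read a flat float32 array from a file opened in binary mode."""
    # "rb" asks for the raw bytes, with no newline translation or decoding.
    with open(path, "rb") as fb:
        return np.fromfile(fb, dtype=np.float32)


# Round trip on a throwaway file (hypothetical sample values).
expected = np.array([1.5, -9999.0, 0.0, 26.0], dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    expected.tofile(demo_path)       # write raw float32 bytes
    result = read_float32(demo_path)  # read them back in binary mode
```

If the values that come back differ from what the writer produced, the file mode is the first thing worth checking before suspecting the data.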

Also note that the two extra entries you're getting are a common side effect of Fortran's "unformatted" binary output format. Each write statement in this mode produces a record that is surrounded by two 4-byte blocks.

These blocks are integers that give the number of bytes in the block of unformatted data. For example, [223] [223 bytes of data] [223].
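The record layout described above can be read explicitly rather than sliced off afterwards. This is a rough sketch under several assumptions: the markers are 4-byte integers in the machine's native byte order (marker size can depend on the compiler), and the file holds a single record. The function name read_fortran_record and the in-memory sample record are hypothetical.

```python
import io
import struct

import numpy as np


def read_fortran_record(fb, dtype=np.float32):
    """Read one record laid out as [4-byte length][payload][4-byte length]."""
    # Assumption: 4-byte, native-endian length markers.
    (head,) = struct.unpack("=i", fb.read(4))
    payload = fb.read(head)
    (tail,) = struct.unpack("=i", fb.read(4))
    # The two markers should agree; a mismatch suggests a wrong layout guess.
    if head != tail or len(payload) != head:
        raise ValueError("record markers do not match")
    return np.frombuffer(payload, dtype=dtype)


# Build a fake single-record "file" in memory to exercise the reader.
values = np.array([1.0, -9999.0, 3.5], dtype=np.float32)
marker = struct.pack("=i", values.nbytes)
fake_file = io.BytesIO(marker + values.tobytes() + marker)
record = read_fortran_record(fake_file)
```

With a real file, the object passed in would come from open(Bin_File, "rb"). Reading the markers as integers also explains what shows up when the whole file is read as float32: a byte count such as 296*439*4 reinterpreted as a float32 is a tiny denormal number, one at each end of the array. SciPy also ships scipy.io.FortranFile for this kind of file, which may be worth trying before hand-rolling a reader.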

Gabriel
  • Awesome! It totally worked. Using `"rb"` instead of `"r"` solved the problem with the nonsense zeros. The binary file was created using Fortran, and it still has two more numbers than `ncols*nrows`, as you mentioned. I used `a = a[0:-2]` to get rid of them. – ahoosh Jul 28 '14 at 21:53
  • thanks, you can accept the answer by clicking on the green arrow on the left. – Gabriel Jul 28 '14 at 22:05
  • actually, be careful removing the last two. Fortran will add a 4-byte int at the beginning and one at the end. These indicate how big the data block is and can be used for verification. You probably want `a = a[1:-1]`. – Gabriel Jul 28 '14 at 22:06
  • That's a very good point. I checked the binary file and it seems that there are two very small numbers at `a[0]` and `a[-1]`. Using `a = a[1:-1]` is the way to go. – ahoosh Jul 29 '14 at 16:14