I have a simple binary file that contains 32-bit floats adjacent to each other.

Using Julia, I would like to read each number (i.e. each 32-bit word) and store them sequentially in an array of Float32.

I've tried several approaches from the documentation, but all of them yield impossible values (I am using a binary file with known values as dummy input). It appears that:

  1. Julia is reading the binary file one byte at a time.

  2. Julia is putting each byte into a Uint8 array.

For example, readbytes(f, 4) gives a 4-element array of unsigned 8-bit integers. read(f, Float32, DIM) also gives strange values.

Anyone have any idea how I should proceed?

William

3 Answers

I'm not sure of the best way to read the data in as Float32 directly, but given an array of 4*n bytes (Uint8 values), I'd turn it into an array of n Float32s using reinterpret:

# UInt8 is the current spelling; Julia versions before 0.4 spelled it Uint8
raw = rand(UInt8, 4*10)  # i.e. a vector of 40 bytes
floats = reinterpret(Float32, raw)  # now 10 Float32s over the same bytes

With output:

julia> raw = rand(Uint8, 4*2)
8-element Array{Uint8,1}:
 0xc8
 0xa3
 0xac
 0x12
 0xcd
 0xa2
 0xd3
 0x51

julia> floats = reinterpret(Float32, raw)
2-element Array{Float32,1}:
 1.08951e-27
 1.13621e11
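
As a rough sketch of the same idea on a current Julia 1.x release (an assumption, since the transcript above comes from a much older version), the bytes can be produced from known values so the result is checkable. The names `expected`, `raw`, `view32`, and `floats` are illustrative only. On 1.x, reinterpret returns a view over the bytes, so collect is used here to obtain an ordinary Vector{Float32}.

```julia
# Known values, so the round trip can be verified.
expected = Float32[1.5, -2.0, 3.25]

# Serialize to raw bytes in native byte order, as a binary file would hold them.
buf = IOBuffer()
write(buf, expected)
raw = take!(buf)                     # Vector{UInt8} with 4 * 3 = 12 bytes

view32 = reinterpret(Float32, raw)   # view over the same memory, no copy
floats = collect(view32)             # independent Vector{Float32}
```

This relies on the bytes being in the machine's native byte order, which is what write produces here.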
IainDunning

(EDIT 2020: Outdated, see newest answer.) I found the issue. The correct way of importing binary data in single precision floating point format is read(f, Float32, NUM_VALS), where f is the file stream, Float32 is the data type, and NUM_VALS is the number of words (values or data points) in the binary data file.

It turns out that every call to read(f, [...]) advances the stream position past the data it just read, so the next call starts at the next item in the binary file.

This makes it easy to read the data one value at a time:

f = open("my_file.bin")
first_item = read(f, Float32)
second_item = read(f, Float32)
# etc ...

However, I wanted to load all the data in one line of code. While debugging, I had called read() on the same file stream several times without reopening the file, so the stream was no longer positioned at the start. As a result, when I tried the correct operation, namely read(f, Float32, NUM_VALS), I got unexpected values.
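
The effect of the advancing stream position can be sketched as follows, assuming a current Julia 1.x release, where the three-argument read(f, Float32, NUM_VALS) form is no longer available and read! is used for the bulk read instead. The temporary file and the variable names are placeholders for illustration.

```julia
# Create a small test file holding three known Float32 values.
path, out = mktemp()
write(out, Float32[1.0, 2.0, 3.0])
close(out)

f = open(path)
first_item = read(f, Float32)        # consumes the first 4 bytes
second_item = read(f, Float32)       # consumes the next 4 bytes
pos = position(f)                    # byte offset after two reads

seekstart(f)                         # rewind before reading everything
all_items = Vector{Float32}(undef, 3)
read!(f, all_items)                  # fills the array from the start of the file
close(f)
```

Without the seekstart call, the bulk read would begin at the third value and run out of data before filling the array.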

William

Julia has changed a lot over the past 5 years. read() no longer has a method that takes both a type and a length. reinterpret() now creates a view of the byte array instead of a new array of the desired type. The best way to do this now seems to be to pre-allocate the desired array and fill it with read!:

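io = open("my_file.bin")              # placeholder file name
data = Array{Float32, 1}(undef, 128)  # room for 128 values
read!(io, data)                       # reads exactly length(data) values
close(io)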

This fills data with the Float32 values read from the stream.
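
When the number of values is not known in advance, one possible approach (a sketch, assuming the file contains nothing but packed Float32 values in native byte order) is to derive the count from the file size. The temporary file and the names `n` and `stream` are illustrative.

```julia
# Create a test file holding four known Float32 values.
path, out = mktemp()
write(out, Float32[0.5, 1.5, 2.5, 3.5])
close(out)

# Each Float32 occupies 4 bytes, so the value count follows from the size.
n = div(filesize(path), sizeof(Float32))
data = Vector{Float32}(undef, n)

# The do-block form closes the file even if read! throws.
open(path) do stream
    read!(stream, data)
end
```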

chunjiw