I am trying to update some legacy code that calls np.fromfile. When I search the NumPy source for this function I only find np.core.records.fromfile, yet the docs list np.fromfile. The two take different keyword arguments, which makes me think they are different functions altogether.

My questions are:

1) Where is the source for np.fromfile located?

2) Why are there two different functions under the same name? This gets confusing if you aren't careful, because the two behave differently. Specifically, np.core.records.fromfile raises an error if you try to read more bytes than the file contains, while np.fromfile does not. A minimal example follows.

In [1]: import numpy as np

In [2]: my_bytes = b'\x04\x00\x00\x00\xac\x92\x01\x00\xb2\x91\x01'

In [3]: with open('test_file.itf', 'wb') as f:
            f.write(my_bytes)

In [4]: with open('test_file.itf', 'rb') as f:
            result = np.fromfile(f, 'int32', 5)

In [5]: result
Out [5]: 

In [6]: with open('test_file.itf', 'rb') as f:
            result = np.core.records.fromfile(f, 'int32', 5)
ValueError: Not enough bytes left in file for specified shape and type
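
If the stricter behaviour is what the legacy code should have, one option is a thin wrapper that checks how many items were actually read. This is only a sketch: `fromfile_strict` is a hypothetical helper, not part of NumPy, and it assumes np.fromfile returns a shorter array on a short file, as the session above suggests. The dtype is written as '<i4' (little-endian int32) so the result does not depend on the platform.

```python
import os
import tempfile

import numpy as np


def fromfile_strict(f, dtype, count):
    """Hypothetical helper: read exactly `count` items or raise ValueError."""
    arr = np.fromfile(f, dtype=dtype, count=count)
    if arr.size != count:
        raise ValueError(
            f"requested {count} items of {np.dtype(dtype)}, got {arr.size}"
        )
    return arr


# Same 11 bytes as in the example above: two full int32 values plus 3 stray bytes.
my_bytes = b'\x04\x00\x00\x00\xac\x92\x01\x00\xb2\x91\x01'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_file.itf")
    with open(path, "wb") as f:
        f.write(my_bytes)

    # Asking for 2 items succeeds, since the file holds 2 complete int32 values.
    with open(path, "rb") as f:
        two_items = fromfile_strict(f, "<i4", 2)

    # Asking for 5 items is a short read, which the wrapper turns into an error.
    with open(path, "rb") as f:
        try:
            fromfile_strict(f, "<i4", 5)
            short_read_detected = False
        except ValueError:
            short_read_detected = True
```

The wrapper keeps np.fromfile's speed for the normal case and only adds a size check afterwards.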
Grr
  • Yes, you are linking to the source code for `np.core.records.fromfile`, which is specialized for dealing with record arrays. – juanpa.arrivillaga Mar 20 '17 at 18:25
  • @juanpa.arrivillaga I am aware of that. Unfortunately this is the only implemented declaration of fromfile I can find in the source. I would like to know where I can find the source for `np.fromfile` – Grr Mar 20 '17 at 18:31
  • The essence of the `records` version is to create a recipient array of correct dtype and shape, and use binary file `readinto` method to load the bytes into its `.data` attribute. – hpaulj Mar 20 '17 at 19:25
  • The `core.records` stuff is a `numpy` backwater. `structured arrays` without the `records` overlay are more common, often produced by `genfromtxt` from `csv` files. – hpaulj Mar 20 '17 at 19:57

1 Answer

If you use help on np.fromfile you will find something very... helpful:

Help on built-in function fromfile in module numpy.core.multiarray:

fromfile(...)
    fromfile(file, dtype=float, count=-1, sep='')

    Construct an array from data in a text or binary file.

    A highly efficient way of reading binary data with a known data-type,
    as well as parsing simply formatted text files.  Data written using the
    `tofile` method can be read using this function.

As far as I can tell, this is implemented in C and can be found here.

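If you are trying to save and load binary data and you control how it is written, you shouldn't use np.fromfile anymore. A raw file written with tofile stores no dtype, shape, or byte-order information, so the reader has to know all of that in advance. You should use np.save and np.load, which use a platform-independent binary format (.npy) that records this metadata.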
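
A minimal sketch of the difference, using temporary files and a made-up example array: the raw tofile/fromfile round trip comes back flat and needs the dtype supplied by hand, while np.save/np.load restores dtype and shape from the file itself.

```python
import os
import tempfile

import numpy as np

# Example data, chosen only for illustration.
data = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "data.bin")
    npy_path = os.path.join(tmp, "data.npy")

    # Raw round trip: only the bytes are stored, so the dtype must be
    # passed back in and the original shape is lost.
    data.tofile(raw_path)
    raw = np.fromfile(raw_path, dtype=np.int32)

    # .npy round trip: dtype and shape are read from the file header.
    np.save(npy_path, data)
    restored = np.load(npy_path)

print(raw.shape)       # flat, one-dimensional
print(restored.shape)  # same shape as `data`
```

For existing headerless files, such as the legacy format in the question, np.fromfile is still the tool to use; the advice applies to data you write yourself.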

juanpa.arrivillaga
  • And to understand that `C` link you need to dig further and find the implementation for `PyArray_FromFile`. `numpy` builtins are a mystery to mere mortals. – hpaulj Mar 20 '17 at 19:50
  • @hpaulj heh, yeah that is so typical for numpy. `array_some_function` written in C is essentially a wrapper around `PyArray_SomeFunction` written in C.... good luck tracking down `PyArray_SomeFunction` – juanpa.arrivillaga Mar 20 '17 at 19:53