1

I have some time data that starts at T0 and runs to T1 in steps of dt. Because dt is small, the data is large, and it is currently stored as a numpy array, which uses a lot of memory. A more efficient approach would be to store only T0, T1 and dt, e.g. using a generator. However, generators don't work with many things, e.g. numpy functions, arithmetic and plotting. I want something that works like a generator, i.e. stores only the 3 necessary values, and produces a numpy array whenever one is needed by some function.

Is there an existing object that works like this, i.e. one that stores the necessary data (3 values) like a generator and then returns or represents itself as a numpy array when used in functions or arithmetic? The memory would then only be used within the scope of the function and be released when the array falls out of scope.

EDIT with solution: I created an implementation of what I wanted. Copying generators turned out to be tricky, see [here](https://stackoverflow.com/questions/21315207/deep-copying-a-generator-in-python/21315536#21315536), so I instead store the start, stop and step and create and return generators or numpy arrays as required.

The code for this is as follows:

import numpy as _np

class frange():
    """
    Return an object that can be used to create a generator or an array
    of floats from start (inclusive) to stop (exclusive) by step. 
    This object stores the start, stop, step and length of
    the data. Uses less memory than storing a large array.

    Example
    -------
    An example of how to use this class to generate some data is
    as follows for some time data between 0 and 2 in steps of
    1e-3 (0.001)::

        $ time = frange(0, 2, 1e-3)

        $ print(len(time)) # prints length of frange, just like an array or list

        $ generator = time.get_generator() # gets a generator instance
        $ for i in generator: # iterates through printing each element
        $     print(i)

        $ array = time.get_array() # gets an array instance
        $ newarray = 5 * array # multiplies array by 5


    """
    def __init__(self, start, stop, step):
        """
        Initialises frange class instance. Sets start, stop, step and
        len properties.

        Parameters
        ----------
        start : float
            starting point
        stop : float
            stopping point 
        step : float
            stepping interval
        """
        self._slice = slice(start, stop, step)
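        # numpy.arange returns ceil((stop - start) / step) samples, so the
        # length can be computed without allocating the whole array
        self.len = max(0, int(_np.ceil((stop - start) / step)))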

    def get_generator(self):
        """
        Returns a generator for the frange object instance.

        Returns
        -------
        gen : generator
            A generator that yields successive samples from start (inclusive)
            to stop (exclusive) in step steps.
        """
        s = self._slice
        gen = drange(s.start, s.stop, s.step) # initialises the generator
        return gen

    def get_array(self):
        """
        Returns a numpy array containing the values from start (inclusive)
        to stop (exclusive) in step steps.

        Returns
        -------
        array : ndarray
            Array of values from start (inclusive)
            to stop (exclusive) in step steps.
        """
        s = self._slice        
        array = _np.arange(s.start, s.stop, s.step)
        return array

    def __len__(self):
        return self.len      

def drange(start, stop, step):
    """
    A generator that yields successive samples from start (inclusive)
    to stop (exclusive) in step intervals.

    Parameters
    ----------
    start : float
        starting point
    stop : float
        stopping point 
    step : float
        stepping interval

    Yields
    ------
    x : float
        next sample
    """
    if step == 0:
        raise ZeroDivisionError("Step must be non-zero")
    # numpy.arange returns ceil((stop - start) / step) samples; computing each
    # one as start + i * step avoids accumulating rounding error from x += step
    n = max(0, int(_np.ceil((stop - start) / step)))
    for i in range(n):
        yield start + i * step
SomeRandomPhysicist
  • [This](https://stackoverflow.com/questions/367565/how-do-i-build-a-numpy-array-from-a-generator) answer might address your question. – Tom Wyllie Jun 30 '17 at 20:47
  • If you're using python 3, then you already get this by using `range`. However, that's only for ints. It gets more complicated if you want to do something like this with floats. – Dunes Jun 30 '17 at 21:09
  • Indeed, I am wanting some functionality similar to range, but for floats and that works in numpy functions. – SomeRandomPhysicist Jun 30 '17 at 21:16
  • You should create a [mcve] and bring that code into the question. This is so the question can stand on its own in case that link at some point in the future dies. – Bugs Jun 30 '17 at 23:46

3 Answers

3

Python already has a class that stores start, stop and step attributes, a slice:

In [523]: s = slice(0, 1, .1)

np.lib.index_tricks has a class that can expand slices. In this case it uses arange:

In [524]: np.r_[s]
Out[524]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])
In [525]: np.arange(s.start, s.stop, s.step)
Out[525]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])
In [526]: np.arange(0, 1, .1)
Out[526]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

slice just stores its attributes; any calculation is done by the code that uses it. np.r_ uses this to invoke np.linspace instead of np.arange if the step value is imaginary.

In [527]: np.r_[slice(0,1,11j)]
Out[527]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

I don't see how the generators discussed in the other answer are an improvement over running arange or linspace on the fly.

If you are developing your own indexing classes, it will be worth your while to study the index_tricks.py file.
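
To tie this back to the memory concern in the question, here is a minimal sketch of keeping only the slice around and expanding it inside the function that needs the values. The helper name `mean_over` and the sample numbers are made up for illustration.

```python
import numpy as np

def mean_over(spec):
    """Expand the stored slice into an array only inside this call."""
    values = np.r_[spec]      # temporary array built from start, stop, step
    return values.mean()      # `values` can be freed once the function returns

time_spec = slice(0.0, 2.0, 0.5)   # only three numbers are kept around
result = mean_over(time_spec)
print(result)
```

Only the slice object persists between calls; the expanded array exists just while `mean_over` runs.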

hpaulj
  • I didn't know about slices, they're definitely a possible solution. Although they have the downside that they are not iterable. – SomeRandomPhysicist Jun 30 '17 at 23:22
  • By itself a `slice` is not iterable. It's a way of storing these 3 values. Normally the python interpreter translates `x[1:3:2]` to `x.__getitem__(slice(1,3,2))`. In other words, it's a way of storing the elements of a `1:3:2` expression. – hpaulj Jul 01 '17 at 01:11
1

Every generator can be turned to a list or a numpy array with list comprehension, like so:

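import numpy as np

def mygenerator(T0, T1, dt):
    # yield before incrementing so that T0 is included and T1 is excluded
    while T0 < T1:
        yield T0
        T0 += dt

def gen2numpy(gen):
    # np.array does not iterate over a generator, so build a list first
    return np.array([item for item in gen])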

I used a very naive generator that assumes a lot about T0, T1 and dt, but that is because you provided no code that would help with those assumptions... Either way, look at how I turned a generator into a numpy array.
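
As an alternative to the list comprehension, NumPy's `np.fromiter` can build a one-dimensional array straight from an iterator. A minimal sketch, in which `sample_times` is a hypothetical generator written for illustration rather than code from the question:

```python
import numpy as np

def sample_times(t0, t1, dt):
    """Hypothetical generator: yields t0, t0 + dt, ... while below t1."""
    i = 0
    while t0 + i * dt < t1:
        yield t0 + i * dt
        i += 1

# np.fromiter consumes the iterator directly; the dtype must be given
times = np.fromiter(sample_times(0.0, 1.0, 0.25), dtype=float)
print(times)
```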

If you are looking for a single object that can do both, define a class and use those functions as its methods.

But be careful: iterating over the generator, even partially, and only then generating the list will give only a partial list or an error. If you have to iterate before creating the list, I would recommend creating 2 identical (but separate) generators, one for iteration and the other in case a complete list is needed later...
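
The pitfall can be seen in a small sketch. The factory `make_gen` is a hypothetical helper, assumed here only to show that each call hands back an independent generator:

```python
import numpy as np

def make_gen(t0, t1, dt):
    """Hypothetical factory: every call returns a fresh generator."""
    i = 0
    while t0 + i * dt < t1:
        yield t0 + i * dt
        i += 1

gen = make_gen(0.0, 1.0, 0.25)
first = next(gen)                  # consumes the first sample
partial = np.array(list(gen))      # only the samples that are left

# a second, separate generator still produces the complete sequence
full = np.array(list(make_gen(0.0, 1.0, 0.25)))
print(first, partial, full)
```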

Ofer Sadan
  • I planned to create a class that implements exactly this, however I wanted to check there was no in-built class that provided this functionality in a more robust and efficient manner. I assume it is the case that this doesn't exist? – SomeRandomPhysicist Jun 30 '17 at 21:05
  • Not that I know of, but as you can see, turning a generator into a list is a one-liner either way and it is as efficient as possible. I have added a warning to my answer as well... – Ofer Sadan Jun 30 '17 at 21:11
  • Ahh, I see the issue, maybe a solution would be having a protected private copy of the iterator that the user cannot access and each time the class is accessed returning a fresh iterator for the user to use. – SomeRandomPhysicist Jun 30 '17 at 21:14
  • I created a minimal implementation of this [here](https://gist.github.com/AshleySetter/73d8adfea4a919db4fd3c6c60342f609). Copying generators turned out to be tricky, see [here](https://stackoverflow.com/questions/21315207/deep-copying-a-generator-in-python/21315536#21315536), so I instead store the start, stop and step and create and return generators or numpy arrays as required. – SomeRandomPhysicist Jun 30 '17 at 22:02
1

A slight improvement over using raw slices would be to create a class that can be coerced to array with __array__:

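import numpy as np

class range_array(object):
    def __init__(self, *args):
        self._slice = slice(*args)

    def __array__(self, dtype=None, copy=None):
        # NumPy may pass dtype (and, in NumPy 2, copy) when coercing
        s = self._slice
        return np.arange(s.start, s.stop, s.step, dtype=dtype)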

Which means that code like this will work:

a = range_array(T0, T1, dt)
res = np.dot(a, a)
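
A self-contained way to check the coercion is sketched below. `LazyRange` is a hypothetical stand-in for the class above, and the sample values are arbitrary. It assumes that `__array__` accepts an optional `dtype`, since NumPy may pass one when it coerces the object.

```python
import numpy as np

class LazyRange:
    """Hypothetical stand-in: stores start/stop/step, expands on demand."""
    def __init__(self, start, stop, step):
        self._slice = slice(start, stop, step)

    def __array__(self, dtype=None, copy=None):
        # dtype and copy are accepted because NumPy may supply them
        s = self._slice
        return np.arange(s.start, s.stop, s.step, dtype=dtype)

a = LazyRange(0.0, 2.0, 0.5)
as_array = np.asarray(a)   # explicit coercion through __array__
res = np.dot(a, a)         # implicit coercion inside a NumPy function
print(as_array, res)
```

Plain operators such as `5 * a` are not covered by `__array__` alone, which is what the mixin-based version addresses.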

You could go a bit further and implement __array_ufunc__ in numpy 1.13:

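class range_array(np.lib.mixins.NDArrayOperatorsMixin):
    def __init__(self, *args):
        sl = slice(*args)
        self._start = sl.start
        self._stop = sl.stop
        self._step = sl.step

    def __array__(self, dtype=None, copy=None):
        return np.arange(self._start, self._stop, self._step, dtype=dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # special case np.add(range, 1) to just add to stop and start, etc;
        # as a fallback, expand any range_array inputs and apply the ufunc
        inputs = tuple(np.asarray(x) if isinstance(x, range_array) else x
                       for x in inputs)
        return getattr(ufunc, method)(*inputs, **kwargs)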
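
One way the special case mentioned in the comment might look is sketched below. The class name `LazyRange` and the choice to special-case only `np.add` with a real scalar are assumptions made for illustration; every other ufunc call falls back to an ordinary array.

```python
import numbers
import numpy as np

class LazyRange(np.lib.mixins.NDArrayOperatorsMixin):
    """Hypothetical lazy range: adding a scalar keeps it lazy."""
    def __init__(self, start, stop, step):
        self.start, self.stop, self.step = start, stop, step

    def __array__(self, dtype=None, copy=None):
        return np.arange(self.start, self.stop, self.step, dtype=dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if ufunc is np.add and method == "__call__" and not kwargs:
            others = [x for x in inputs if not isinstance(x, LazyRange)]
            if len(others) == 1 and isinstance(others[0], numbers.Real):
                c = others[0]
                # shift the end points instead of allocating an array
                return LazyRange(self.start + c, self.stop + c, self.step)
        # general case: expand LazyRange inputs and apply the ufunc
        arrays = tuple(np.asarray(x) if isinstance(x, LazyRange) else x
                       for x in inputs)
        return getattr(ufunc, method)(*arrays, **kwargs)

r = LazyRange(0.0, 2.0, 0.5)
shifted = r + 1    # still a LazyRange, no array allocated yet
scaled = 5 * r     # no special case, so this is a regular ndarray
print(np.asarray(shifted), scaled)
```

Shifting the end points can change the sample count in floating-point edge cases, so a more careful version might store the length explicitly.
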
Eric
  • I couldn't get your __array__(self) method to work. When I perform it I get the following: – SomeRandomPhysicist Jul 01 '17 at 01:45
  • ```In [9]: class range_array(object): ...: def __init__(self, *args): ...: self._slice = slice(*args) ...: ...: def __array__(self): ...: s = self._slice ...: return np.arange(s.start, s.stop, s.step) ...: In [10]: a = range_array(0, 10, 0.1) ...: res = np.dot(a, a) ...:``` – SomeRandomPhysicist Jul 01 '17 at 01:45
  • ```--------------------------------------------------------------------------- TypeError Traceback (most recent call last) in () 1 a = range_array(0, 10, 0.1) ----> 2 res = np.dot(a, a) 3``` – SomeRandomPhysicist Jul 01 '17 at 01:46