
Python list variables are references, so two names can refer to the same list object and I can do the following:

a = []
b = a

b.append(1)

>>> print a, b
[1] [1]

How can I accomplish this behavior with numpy? The problem is that numpy's `append` creates and returns a new array, so the two names end up referring to different objects. That is:

a = np.array([])
b = a

b = np.append(b, 1)
>>> print a, b
[] [1.]
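
A quick check (just an illustrative snippet) shows that after the call the two names no longer refer to the same object:

import numpy as np

a = np.array([])
b = a
print(a is b)        # True: both names refer to the same array object

b = np.append(b, 1)  # np.append allocates and returns a brand-new array
print(a is b)        # False: the alias is gone and a is still empty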

EDIT: What I'm trying to accomplish:

I have a large text file that I'm parsing with `re`. Depending on a marker in the file, I want to switch which array I'm appending to. For example:

x = np.array([])
y = np.array([])

with open("./data.txt", "r") as f:
    for line in f:
        if re.match('x values', line):
            print "reading x values"
            array = x
        elif re.match('y', line):
            print "reading y values"
            array = y
        else:
            values = re.match("^\s+((?:[0-9.E+-]+\s*)*)", line)
            if values:
                np.append(array, values.groups()[0].split())
Ben
  • You can't do that. Numpy arrays are allocated consecutively in memory, so they need to be reallocated if you want to resize them. Appending to them is inherently inefficient. Can you give a bit more context of your problem? – Sven Marnach Apr 25 '16 at 18:34
  • Possible duplicate of [How to extend an array in-place in Numpy?](http://stackoverflow.com/questions/13215525/how-to-extend-an-array-in-place-in-numpy) – wnnmaw Apr 25 '16 at 18:36
  • @SvenMarnach Sure, edited now. – Ben Apr 25 '16 at 18:37
  • @hansatz Using NumPy arrays in this way is really inefficient. Collect the data in lists, and build a Numpy array from the lists once you are done reading the file. – Sven Marnach Apr 25 '16 at 18:42
  • @SvenMarnach okay, that's good to know about the efficiency. I'll approach it in that way then – Ben Apr 25 '16 at 18:43
  • Are you aware that the last `np.append(...)` doesn't do anything for you, because you don't assign the return to anything? – hpaulj Apr 25 '16 at 19:21
  • @SvenMarnach If you could expand a bit more on the difference in efficiency, and why the code would have a quadratic runtime using numpy arrays as opposed to lists, I'd like to accept that answer. – Ben Apr 25 '16 at 19:35
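
To make the efficiency point in the comments concrete (a rough sketch; the size n is arbitrary): every np.append call allocates a fresh array and copies all existing elements into it, so building an n-element array this way copies on the order of n²/2 elements in total, whereas list.append is amortized constant time and a single np.array(...) at the end copies each element once.

import numpy as np

n = 10000

# quadratic: every np.append reallocates and copies the whole array so far
arr = np.array([])
for i in range(n):
    arr = np.append(arr, i)   # copies i existing elements on iteration i

# roughly linear: grow a plain list, convert to an array once at the end
lst = []
for i in range(n):
    lst.append(i)
arr2 = np.array(lst)

Timing the two loops (for example with timeit) shows the gap widening quickly as n grows, which is why the comments recommend collecting the data in plain lists and converting once at the end.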

3 Answers


Based on your updated question, it looks like you can handily solve the problem by keeping a dictionary of numpy arrays:

x = np.array([])
y = np.array([])
Arrays = {"x": x, "y": y}

with open("./data.txt", "r") as f:
    for line in f:
        if re.match('x values', line):
            print "reading x values"
            key = "x"
        elif re.match('y', line):
            print "reading y values"
            key = "y"
        else:
            values = re.match("^\s+((?:[0-9.E+-]+\s*)*)", line)
            if values:
                Arrays[key] = np.append(Arrays[key], values.groups()[0].split())

As Sven Marnach points out in the comments both here and on your question, this is an inefficient use of numpy arrays, because every `np.append` call copies the whole array.

A better approach (again, as Sven points out) would be:

Arrays = {"x": [], "y": []}

with open("./data.txt", "r") as f:
    for line in f:
        if re.match('x values', line):
            print "reading x values"
            key = "x"
        elif re.match('y', line):
            print "reading y values"
            key = "y"
        else:
            values = re.match("^\s+((?:[0-9.E+-]+\s*)*)", line)
            if values:
                Arrays[key].append(values.groups()[0].split())

Arrays = {key: np.array(Arrays[key]) for key in Arrays}
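
One caveat with the list-based version (assuming the matched rows contain only numeric text and all rows have the same length): split() produces strings, so np.array(...) yields string arrays unless a numeric dtype is requested, e.g.:

Arrays = {key: np.array(Arrays[key], dtype=float) for key in Arrays}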
wnnmaw
  • This code will have quadratic runtime, so it's not really advisable to do it this way. – Sven Marnach Apr 25 '16 at 18:42
  • Is there a specific reason to use a dictionary over linked objects, i.e. `b = a; b.append(1)`, when using Python lists? – Ben Apr 25 '16 at 18:50
  • @hansatz, I find the dictionary to be more readable, clearer code. There are plenty of folks out there who don't understand the implications of `a = b` when both are mutable, so I would suggest avoiding a method which relies on that behavior – wnnmaw Apr 25 '16 at 18:53
  • @hansatz, that and a dictionary is easily expandable to more than just two items – wnnmaw Apr 25 '16 at 18:54

So the simple switch over to list append could be written as:

x, y = [], []
with open("./data.txt", "r") as f:
    for line in f:
        if re.match('x values', line):
            print "reading x values"
            alist = x
        elif re.match('y', line):
            print "reading y values"
            alist = y
        else:
            values = re.match("^\s+((?:[0-9.E+-]+\s*)*)", line)
            if values:
                alist.append(values.groups()[0].split())

Now both `x` and `y` will be lists of lists. If the sublists are all the same size, you could do

x_array = np.array(x)

to get a 2d array. But if the sublists differ in size, this will produce a 1d array with `dtype=object`, which is little more than a list with array overhead. For example:

In [98]: np.concatenate([[1,2,3],[1,2]])
Out[98]: array([1, 2, 3, 1, 2])

In [99]: np.array([[1,2,3],[1,2]])
Out[99]: array([[1, 2, 3], [1, 2]], dtype=object)

In [100]: np.array([[1,2,3],[1,2,4]])
Out[100]: 
array([[1, 2, 3],
       [1, 2, 4]])
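
If the sublists do differ in size, one option (a sketch, assuming the entries are numeric strings as in the question) is to convert each row to a float array individually and, when a flat result is acceptable, join them with np.concatenate:

# one 1d float array per row; works whether or not the rows match in length
x_rows = [np.array(row, dtype=float) for row in x]

# or flatten everything into a single 1d float array
x_flat = np.concatenate(x_rows)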

I don't expect much time difference between using these two global variables and the dictionary-of-lists `{"x": [], "y": []}` approach. Global variables are kept in a dictionary as well.

The real issue is whether you collect the intermediate values in lists or arrays.

hpaulj

Have a look at `numpy.hstack`:

http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.hstack.html

import numpy as np
a = np.arange(0, 10, 1)
b = np.array([5])
np.hstack((a,b))

Returns array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5])
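
Note, though, that np.hstack (like np.append) allocates and returns a new array rather than growing one in place, so the result still has to be bound back to a name; a minimal illustration:

a = np.array([])
a = np.hstack((a, [1.0]))   # hstack returns a fresh array; nothing is modified in place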

some_weired_user