185

Suppose I have a numpy array:

data = np.array([[1,1,1],[2,2,2],[3,3,3]])

and I have a corresponding "vector:"

vector = np.array([1,2,3])

How do I operate on data along each row to either subtract or divide so the result is:

sub_result = [[0,0,0], [0,0,0], [0,0,0]]
div_result = [[1,1,1], [1,1,1], [1,1,1]]

Long story short: How do I perform an operation on each row of a 2D array with a 1D array of scalars that correspond to each row?

BFTM
  • if I have a 3x1000 array and try to divide by a 3-elt array, I get an error. Why doesn't this just work? – eric Dec 16 '21 at 18:08

6 Answers

260

Here you go. You just need to use None (or alternatively np.newaxis) combined with broadcasting:

In [6]: data - vector[:,None]
Out[6]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [7]: data / vector[:,None]
Out[7]:
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
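To see why the extra axis matters, it can help to print the shapes involved. The following is only an illustrative sketch; the 3x1000 array `wide` is a made-up example and not part of the original question.

```python
import numpy as np

data = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
vector = np.array([1, 2, 3])

# Indexing with None adds a trailing axis of length 1.
column = vector[:, None]           # same as vector[:, np.newaxis]
print(vector.shape, column.shape)  # (3,) (3, 1)

sub_result = data - column
div_result = data / column         # true division gives floats in Python 3

# Hypothetical non-square data: 3 rows, 1000 columns.
wide = np.ones((3, 1000))
scaled = wide / column             # each row is divided by its own scalar
print(scaled.shape)                # (3, 1000)
```

The (3, 1) column is stretched along the second axis, so the same expression works for any number of columns as long as the number of rows matches the length of vector.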
JoshAdel
  • [here](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#numpy.newaxis) is the doc. – sazary Apr 08 '15 at 04:10
  • a [visual example](https://www.w3resource.com/python-exercises/numpy/python-numpy-exercise-96.php) – PlsWork May 06 '19 at 12:33
  • @user108569 using the latest version of numpy (1.18.1), `None` still works equivalently to `np.newaxis`. I'm not sure what your setup is, or the exact issue you are experiencing, but the answer is still valid. – JoshAdel Feb 19 '20 at 17:40
  • Using reshape as suggested by another answer might be better because jax also has a reshape function. I believe it doesn't have numpy-like slicing though. – Reza Roboubi Apr 21 '23 at 05:12
19

As has been mentioned, slicing with None or with np.newaxis is a great way to do this. Another alternative is to use transposes and broadcasting, as in

(data.T - vector).T

and

(data.T / vector).T

For higher dimensional arrays you may want to use the swapaxes method of NumPy arrays or the NumPy rollaxis function. There really are a lot of ways to do this.

For a fuller explanation of broadcasting, see http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
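For the higher-dimensional case, here is a minimal sketch. It uses np.moveaxis rather than the swapaxes or rollaxis calls mentioned above, since the NumPy documentation recommends moveaxis over rollaxis. The array `cube` and its shape are invented for illustration.

```python
import numpy as np

# Hypothetical 3-D example: one scalar per entry along axis 1.
cube = np.ones((2, 3, 4))
vector = np.array([1.0, 2.0, 3.0])

# Option 1: move axis 1 to the end, broadcast, then move it back.
moved = np.moveaxis(cube, 1, -1)              # shape (2, 4, 3)
result_a = np.moveaxis(moved / vector, -1, 1)

# Option 2: index the vector so its length lines up with axis 1.
result_b = cube / vector[None, :, None]       # vector viewed as (1, 3, 1)

print(result_a.shape)                         # (2, 3, 4)
print(np.allclose(result_a, result_b))        # True
```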

IanH
9

A Pythonic way to do this is:

np.divide(data.T, vector).T

Transposing lines the rows of data up with vector for broadcasting, and the second transpose restores the original layout. In Python 3, both np.divide and the / operator perform true division, so the results are floating point even for integer inputs.

# NOTE: the number of rows in data must match the length of vector

shantanu pathak
  • Note: This does not do what the OP is requesting. The end result is array([[1., 0.5, 0.33333333], [2., 1., 0.66666667], [3. , 1.5, 1. ]]). It might be 'Pythonic' but it's incorrect. – Mark Cramer Nov 21 '19 at 16:06
  • @MarkCramer Thank you. I have corrected my answer to provide the right result. – shantanu pathak Nov 21 '19 at 16:14
  • This is correct but no longer pythonic. It is bizarre how numpy is so often elegant but then with things like this it is like...huh? – eric Dec 16 '21 at 18:09
9

Adding to the answer of stackoverflowuser2010, in the general case you can just use

data = np.array([[1,1,1],[2,2,2],[3,3,3]])

vector = np.array([1,2,3])

data / vector.reshape(-1,1)

This turns your vector into a column vector of shape (3, 1), which lets you do the elementwise operations you want. At least to me, this is the most intuitive way to go about it, and since NumPy will (in most cases) just return a view of the same underlying memory for the reshape, it's efficient too.
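As a quick check of the view claim, the sketch below uses np.shares_memory on a small contiguous array. This is an illustration only; non-contiguous inputs may force reshape to copy.

```python
import numpy as np

vector = np.array([1, 2, 3])

# -1 lets NumPy infer the length, so this works for any 1D vector.
column = vector.reshape(-1, 1)

print(column.shape)                      # (3, 1)
print(np.shares_memory(vector, column))  # True for this contiguous array
```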

meow
  • This should be the accepted answer. Creating a column vector with `.reshape(-1,1)` is the most intuitive way to use broadcasting. – Paul Rougieux Apr 25 '20 at 07:05
  • It should be the accepted answer because jax also has a reshape function. I believe it doesn't have numpy-like slicing though. – Reza Roboubi Apr 21 '23 at 05:10
4

JoshAdel's solution uses np.newaxis to add a dimension. An alternative is to use reshape() to align the dimensions in preparation for broadcasting.

data = np.array([[1,1,1],[2,2,2],[3,3,3]])
vector = np.array([1,2,3])

data
# array([[1, 1, 1],
#        [2, 2, 2],
#        [3, 3, 3]])
vector
# array([1, 2, 3])

data.shape
# (3, 3)
vector.shape
# (3,)

data / vector.reshape((3,1))
# array([[1., 1., 1.],
#        [1., 1., 1.],
#        [1., 1., 1.]])

Performing the reshape() allows the dimensions to line up for broadcasting:

data:            3 x 3
vector:              3
vector reshaped: 3 x 1

Note that data/vector runs without error (here, because data is square), but it doesn't give the answer you want. It divides each column of data (instead of each row) by the corresponding element of vector. It's what you would get if you explicitly reshaped vector to be 1x3 instead of 3x1.

data / vector
# array([[1.        , 0.5       , 0.33333333],
#        [2.        , 1.        , 0.66666667],
#        [3.        , 1.5       , 1.        ]])
data / vector.reshape((1,3))
# array([[1.        , 0.5       , 0.33333333],
#        [2.        , 1.        , 0.66666667],
#        [3.        , 1.5       , 1.        ]])
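When data is not square, the plain data / vector form fails instead of silently dividing along the wrong axis. The sketch below shows this on a hypothetical 3x4 array (the values are invented; only the shapes matter).

```python
import numpy as np

# Hypothetical non-square data: 3 rows, 4 columns.
data = np.arange(12).reshape(3, 4)
vector = np.array([1, 2, 3])

try:
    data / vector              # trailing axes are 4 and 3: incompatible
    raised = False
except ValueError as exc:
    raised = True
    print("broadcast error:", exc)

# Reshaping to (3, 1) lines the vector up with the rows instead.
per_row = data / vector.reshape((3, 1))
print(per_row.shape)           # (3, 4)
```

Broadcasting compares shapes from the last axis backwards, which is why a (3,) vector is matched against the columns unless it is reshaped.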
stackoverflowuser2010
3

The key is to reshape the vector from shape (3,) to (3,1) to divide each row by its own element, or to (1,3) to divide each column by its own element. Because data.shape and the reshaped vector's shape differ, NumPy broadcasts the vector to (3,3) and performs the division element-wise.

In[1]: data/vector.reshape(-1,1)
Out[1]:
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In[2]: data/vector.reshape(1,-1)
Out[2]:
array([[1.        , 0.5       , 0.33333333],
       [2.        , 1.        , 0.66666667],
       [3.        , 1.5       , 1.        ]])

Similarly, keepdims=True keeps the reduced axis, so the sums broadcast against x directly:

x = np.arange(9).reshape(3,3)
x
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

x/np.sum(x, axis=0, keepdims=True)
array([[0.        , 0.08333333, 0.13333333],
       [0.33333333, 0.33333333, 0.33333333],
       [0.66666667, 0.58333333, 0.53333333]])

x/np.sum(x, axis=1, keepdims=True)
array([[0.        , 0.33333333, 0.66666667],
       [0.25      , 0.33333333, 0.41666667],
       [0.28571429, 0.33333333, 0.38095238]])

print(np.sum(x, axis=0).shape)
print(np.sum(x, axis=1).shape)
print(np.sum(x, axis=0, keepdims=True).shape)
print(np.sum(x, axis=1, keepdims=True).shape)
(3,)
(3,)
(1, 3)
(3, 1)
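The keepdims pattern above can be wrapped in a small helper. This is a sketch only: `normalize_rows` is a hypothetical name, and rows that sum to zero would produce nan or inf, which the function does not guard against.

```python
import numpy as np

def normalize_rows(x):
    """Divide each row of a 2D array by that row's sum (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    row_sums = x.sum(axis=1, keepdims=True)  # shape (n_rows, 1)
    return x / row_sums                      # broadcasts across the columns

x = np.arange(9).reshape(3, 3)
normalized = normalize_rows(x)
print(normalized.sum(axis=1))  # each row now sums to 1, up to rounding
```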