
I have a 2D NumPy array. I'm trying to compute the similarities between its rows and store them in a similarities array. Is this possible without a loop? Thanks for your time!

# ratings.shape = (943, 1682)

arri = np.zeros(943)
arri = np.where(arri == 0)[0]

arrj = np.zeros(943)
arrj = np.where(arrj ==0)[0]

similarities = np.zeros((ratings.shape[0], ratings.shape[0]))

similarities[arri, arrj] = np.abs(ratings[arri]-ratings[arrj])

I want to build a 2D array similarities such that similarities[i, j] is the difference between row i and row j of ratings.

ValueError: shape mismatch: value array of shape (943,1682) could not be broadcast to indexing result of shape (943,)

(Screenshot of the error: https://i.stack.imgur.com/gtst9.png)

diepitus
    I want to make a 2D array `similarities` such that similarities[i, j] is the difference between row i and row j of `ratings`. – diepitus Jun 08 '21 at 16:59

1 Answer


The problem is how NumPy pairs up the indices when you index a two-dimensional array with two index arrays: it does not visit every (row, column) combination, only the pairs formed element by element.
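A minimal sketch of that pairing behaviour on a small array. The array a and the names rows and cols are made up for illustration and do not come from the question:

```python
import numpy

a = numpy.arange(9).reshape(3, 3)
rows = numpy.array([0, 1, 2])
cols = numpy.array([0, 1, 2])

# Two 1-D index arrays are paired element by element:
# (0, 0), (1, 1), (2, 2), so only the diagonal is selected.
paired = a[rows, cols]
print(paired)

# To reach every (row, col) combination, the index arrays
# have to broadcast against each other: (3, 1) with (1, 3).
grid = a[rows[:, None], cols[None, :]]
print(grid.shape)
```

The first lookup returns three elements, the second a full 3x3 selection.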


First some setup:

import numpy

ratings = numpy.arange(1, 6)

indicesX = numpy.indices((ratings.shape[0],1))[0]
indicesY = numpy.indices((ratings.shape[0],1))[0]

ratings: [1 2 3 4 5]

indicesX: [[0][1][2][3][4]]

indicesY: [[0][1][2][3][4]]


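Now let's see what your approach produces (the code below subtracts ratings[0] rather than ratings[indicesY], because ratings[i] - ratings[i] would always be 0 and the written cells would not be visible):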

similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))
similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[0])

similarities:

[[0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 2. 0. 0.]
 [0. 0. 0. 3. 0.]
 [0. 0. 0. 0. 4.]]

As you can see, only the diagonal is written. NumPy pairs indicesX[i] with indicesY[i], so the assignment behaves basically like the following loop:

for i in range(5):
    similarities[indicesX[i], indicesY[i]] = numpy.abs(ratings[i]-ratings[0])

similarities:

[[0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 2. 0. 0.]
 [0. 0. 0. 3. 0.]
 [0. 0. 0. 0. 4.]]

Now instead we need indices like the following to iterate through the entire array:

indicesX = [0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4]
indicesY = [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]

We do that as follows:

# Reshape indicesX from (x,1) to (x,). That's important for numpy.tile().
indicesX = indicesX.reshape(indicesX.shape[0])
indicesX = numpy.tile(indicesX, ratings.shape[0])

indicesY = numpy.repeat(indicesY, ratings.shape[0])

indicesX: [0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4]

indicesY: [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4]
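The same two index lists can also be produced in one step. This is a sketch that assumes numpy.indices is called on the full (n, n) shape. The names n, row_grid and col_grid are illustrative, and n = 5 mirrors the example above:

```python
import numpy

n = 5

# numpy.indices((n, n)) returns two (n, n) grids:
# the first holds each cell's row index, the second its column index.
row_grid, col_grid = numpy.indices((n, n))

# Flattening in row-major order gives the repeated and tiled patterns.
indicesY = row_grid.ravel()  # 0 0 0 0 0 1 1 1 1 1 ...
indicesX = col_grid.ravel()  # 0 1 2 3 4 0 1 2 3 4 ...

print(indicesX)
print(indicesY)
```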

Perfect! Now just call similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY]) again and we see:

similarities:

[[0. 1. 2. 3. 4.]
 [1. 0. 1. 2. 3.]
 [2. 1. 0. 1. 2.]
 [3. 2. 1. 0. 1.]
 [4. 3. 2. 1. 0.]]
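For a 1-D ratings array like this one, the same matrix can also be obtained without building index arrays, by letting broadcasting form all pairs. A minimal sketch of that alternative, not part of the approach above:

```python
import numpy

ratings = numpy.arange(1, 6)

# ratings[:, None] has shape (5, 1) and ratings[None, :] has shape (1, 5);
# subtracting them broadcasts to shape (5, 5), one cell per (i, j) pair.
similarities = numpy.abs(ratings[:, None] - ratings[None, :])
print(similarities)
```

Unlike the version with numpy.zeros, the result here keeps the integer dtype of ratings.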

Here is the whole code again:

import numpy

ratings = numpy.arange(1, 6)

indicesX = numpy.indices((ratings.shape[0],1))[0]
indicesY = numpy.indices((ratings.shape[0],1))[0]

similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))

indicesX = indicesX.reshape(indicesX.shape[0])
indicesX = numpy.tile(indicesX, ratings.shape[0])

indicesY = numpy.repeat(indicesY, ratings.shape[0])

similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY])
print(similarities)
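The ratings in the question are 2-D, so the difference between two rows is itself a vector and has to be collapsed to one number per (i, j) pair. That is also why the original assignment raised the shape mismatch. The sketch below assumes the sum of absolute differences as that reduction, which the question does not specify, and uses a small made-up ratings array instead of the real (943, 1682) data:

```python
import numpy

# Hypothetical small stand-in for the (943, 1682) ratings matrix.
ratings = numpy.array([[1, 2, 3],
                       [1, 2, 5],
                       [4, 0, 3]])

# Shape (3, 1, 3) minus shape (1, 3, 3) broadcasts to (3, 3, 3):
# diffs[i, j] is the element-wise absolute difference of rows i and j.
diffs = numpy.abs(ratings[:, None, :] - ratings[None, :, :])

# Assumption: collapse each difference vector by summing it.
similarities = diffs.sum(axis=2)
print(similarities)
```

Note that the intermediate array has one entry per (i, j, column) triple. For the sizes in the question that is 943 * 943 * 1682, roughly 1.5 billion values, so processing the rows in chunks may be more practical than a single broadcast.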

PS

You commented on your own post to improve it. When you want to improve your question, you should edit it instead of commenting on it.

Habetuz