9

I'm trying to create a pivot table from a Numpy array in python. I've done a lot of research but I cannot find a straight forward solution. I know you can do it with Pandas but I'm having trouble installing it - but there must be a way of doing it without Pandas. My Numpy array is

[[ 4057     8  1374]
 [ 4057     9   759]
 [ 4057    11    96]
 ..., 
 [89205    16   146]
 [89205    17   154]
 [89205    18   244]]

I need a pivot table where the rows are the first column, the columns are the second column and the values are the third column. Help please!

Thanks

user1893354
  • 5,778
  • 12
  • 46
  • 83

1 Answers1

17

I think this is what you want:

data = np.array([[ 4057,     8,  1374],
                 [ 4057,     9,   759],
                 [ 4057,    11,    96],
                 [89205,    16,   146],
                 [89205,    17,   154],
                 [89205,    18,   244]])

rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)

pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
pivot_table[row_pos, col_pos] = data[:, 2]

>>> pivot_table
array([[1374,  759,   96,    0,    0,    0],
       [   0,    0,    0,  146,  154,  244]])
>>> rows
array([ 4057, 89205])
>>> cols
array([ 8,  9, 11, 16, 17, 18])

There are some limitations to this approach, the main being that, if you have repeated entries for a same row/column combination, they will not be added together, but only one (possibly the last) will be kept. If you want to add them all together, although a little convoluted, you could abuse scipy's sparse module:

data = np.array([[ 4057,     8,  1374],
                 [ 4057,     9,   759],
                 [ 4057,    11,    96],
                 [89205,    16,   146],
                 [89205,    17,   154],
                 [89205,    18,   244],
                 [ 4057,    11,     4]])

rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)

pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
pivot_table[row_pos, col_pos] = data[:, 2]
>>> pivot_table # the element at [0, 2] should be 100!!!
array([[1374,  759,    4,    0,    0,    0],
       [   0,    0,    0,  146,  154,  244]])

import scipy.sparse as sps
pivot_table = sps.coo_matrix((data[:, 2], (row_pos, col_pos)),
                             shape=(len(rows), len(cols))).A
>>> pivot_table # now repeated elements are added together
array([[1374,  759,  100,    0,    0,    0],
       [   0,    0,    0,  146,  154,  244]])
Jaime
  • 65,696
  • 17
  • 124
  • 159
  • Fortunately my data is such that there shouldn't ever be a repeated entry like you mentioned – user1893354 Jun 10 '13 at 18:19
  • 2
    I am late to the party, but the thing to do in 2022 is just toss the data in a pandas dataframe and call the pivot method: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html – derchambers Apr 27 '22 at 15:20