
I have a numpy array cols2:

print(type(cols2))
print(cols2.shape)
<class 'numpy.ndarray'>
(97, 2)

I was trying to get the first column of this 2D NumPy array using the first snippet below, but I got a 1D vector instead of the single column of data I wanted. The second snippet seems to give me what I want, but I am confused about what the extra brackets around the zero are doing.

print(type(cols2[:,0]))
print(cols2[:,0].shape)
<class 'numpy.ndarray'>
(97,)

print(type(cols2[:,[0]]))
print(cols2[:,[0]].shape)
<class 'numpy.ndarray'>
(97, 1)
cs95
Pumpkin C

2 Answers

cols2[:, 0] indexes the second axis with an integer, which drops that axis and gives you a 1D vector of length 97. cols2[:, [0]] indexes it with a list, which keeps the axis and gives you a 2D sub-array of shape (97, 1). The square brackets [] around the 0 make all the difference here.

import numpy as np

v = np.arange(6).reshape(-1, 2)

v[:, 0]
array([0, 2, 4])

v[:, [0]]
array([[0],
       [2],
       [4]])

The fundamental difference is the extra dimension in the latter result (as you've noted). This is intended behaviour, implemented in numpy.ndarray.__getitem__ and __setitem__ and described in the NumPy documentation.
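If you already have the 1D result and want a column again, the dropped axis can be put back explicitly. The following is a minimal sketch using the same example array v; the names col_1d, col_a, col_b and col_c are only illustrative.

```python
import numpy as np

v = np.arange(6).reshape(-1, 2)   # same example array, shape (3, 2)

col_1d = v[:, 0]                  # integer index: axis 1 is dropped, shape (3,)

# Three equivalent ways to turn the 1D result back into a (3, 1) column
col_a = col_1d[:, np.newaxis]           # insert a new axis while indexing
col_b = col_1d.reshape(-1, 1)           # reshape, letting NumPy infer the row count
col_c = np.expand_dims(col_1d, axis=1)  # add an axis at position 1

print(col_1d.shape, col_a.shape, col_b.shape, col_c.shape)
```

All three produce the same values and shape as v[:, [0]].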

You can also use cols2[:, 0:1], a slice of width one, which likewise returns a column of shape (97, 1).

v[:, 0:1]
array([[0],
       [2],
       [4]])
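The slice and the list give the same shape and values, but they differ in one way that may matter if you modify the result: basic slicing returns a view of the original data, while indexing with a list returns a copy. A small sketch of that difference (by_slice and by_list are just illustrative names):

```python
import numpy as np

v = np.arange(6).reshape(-1, 2)

by_slice = v[:, 0:1]   # basic slicing: a view onto v
by_list = v[:, [0]]    # list (advanced) indexing: an independent copy

print(np.shares_memory(v, by_slice))  # True
print(np.shares_memory(v, by_list))   # False

by_slice[0, 0] = 99    # writes through to v
by_list[1, 0] = -1     # only changes the copy

print(v[0, 0])         # 99
print(v[1, 0])         # 2
```

If you need an independent column from a slice, calling .copy() on it is one option.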

For more information, look at the notes on Advanced Indexing in the NumPy docs.

cs95
The extra square brackets around 0 in cols2[:, [0]] give the result an extra dimension compared with cols2[:, 0].

This becomes clearer when you print the results of your code:

import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

A.shape        # (3, 2)
A[:, 0].shape  # (3,)
A[:, 0]        # array([1, 3, 5])

A[:, [0]]

# array([[1],
#        [3],
#        [5]])

The shape of an n-D NumPy array is a tuple of exactly n integers. A 1D array therefore has a shape made of a single integer, and there is no concept of "rows" or "columns" for a 1D array.
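One way to check this for yourself is to compare ndim with the length of shape, and to try transposing the 1D result. This is only a small illustration using the array A from above; the variable names are not special.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

vec = A[:, 0]      # 1D, shape (3,)
col = A[:, [0]]    # 2D, shape (3, 1)

# The number of dimensions always equals the length of the shape tuple
for name, arr in [("A", A), ("vec", vec), ("col", col)]:
    print(name, arr.ndim, arr.shape, arr.ndim == len(arr.shape))

# Transposing a 1D array does nothing: there is no row/column orientation
print(vec.T.shape)   # (3,)
# Transposing the 2D column really does turn it into a row
print(col.T.shape)   # (1, 3)
```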

You should resist the urge to think of NumPy arrays as having rows and columns, and instead think of them as having dimensions and a shape. This is a fundamental difference between a regular NumPy array (numpy.ndarray, which is what numpy.array creates) and numpy.matrix, which is always 2D. In almost all cases, a regular array is sufficient.
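To see the difference concretely, here is a short comparison of the same integer index on an array and on a matrix. It is meant only as an illustration; the NumPy documentation no longer recommends numpy.matrix for new code, and M is just a name chosen for this example.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
M = np.asmatrix(A)      # interpret the same data as a numpy.matrix

print(A[:, 0].shape)    # (3,)   the array drops the indexed axis
print(M[:, 0].shape)    # (3, 1) the matrix always stays 2D
```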

jpp