2

I was reading about Python slicing, but I wasn't able to figure out this expression:

clf.predict_proba(X_test)[:,1]

Then I tried to test it myself with a simple list.

a = [2,4,6,7,7,8]

>>> a[:,1]
-----> TypeError: list indices must be integers or slices, not tuple

Celius Stingher
martian_rover

3 Answers

3

Hi @martian_rover! a[:,1] is the syntax used to slice a 2-dimensional NumPy array. For example, take the data

a = [[1,2,3,4,5], [6,5,3,2,6]]

which, once converted with np.array(a), has 2 rows and 5 columns:

[[1, 2, 3, 4, 5],
 [6, 5, 3, 2, 6]]

Then a[:,1] follows the pattern a[row_start:row_end, column_index]

and gives array([2, 5]): every row, taking the element at column index 1.
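
A minimal sketch of this example, assuming NumPy is installed. The names arr, second_column and from_column_1 are introduced here only for illustration:

```python
import numpy as np

# The nested list from the answer, converted to a 2-D array (2 rows, 5 columns).
a = [[1, 2, 3, 4, 5], [6, 5, 3, 2, 6]]
arr = np.array(a)

second_column = arr[:, 1]    # every row, column index 1
from_column_1 = arr[:, 1:]   # every row, columns from index 1 onward

print(arr.shape)       # (2, 5)
print(second_column)   # [2 5]
print(from_column_1)
```

Note that the slice is applied to arr, not to the plain list a; indexing a itself with a[:, 1] raises the TypeError shown in the question.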

Sohaib Anwaar
  • Careful @martian_rover! The definition you are using is not a NumPy array but a nested list. Slicing it the way your example does will raise `TypeError` – Celius Stingher Dec 11 '20 at 13:45
1

There is a big difference between working with a NumPy array, such as the result of clf.predict_proba(X_test), and working with a list:

As mentioned in the comments, a list accepts only a single index or slice, not comma-separated values, because of its structure. A NumPy array, which may be n-dimensional, can be indexed with comma-separated values, one per axis, to select rows and columns, just as pd.DataFrame.iloc[] does.

import numpy as np

arr_example = np.array([[1,1],[2,3],[4,3]])

ex_list = [[1,1],[2,3],[4,3]]

But what do these actually look like? A list is 1-dimensional (flat), even when its elements are themselves lists, whereas this array is 2-dimensional.

1 arr_example has 3 rows and 2 columns:

array([[1, 1],
       [2, 3],
       [4, 3]])  

2 ex_list:

[[1,1],[2,3],[4,3]]

To reach an inner value of the nested list, each index must go in its own pair of brackets after the first slice, as you can see in the example below:

arr_example[:1,0] # arr_example[rows,columns]
ex_list[:1][0][0]

In arr_example we select the rows from the start up to, but not including, position 1 (therefore only the first row) and the first column (position 0). In the list, [:1] gives a list holding only the first inner list, the first [0] takes that inner list, and the second [0] takes its first element. Given the structure of the data and how the slicing works, the following outputs make sense:

array([1])
1
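
As a rough illustration of the difference, here is one way to pull out a whole column from each structure. It assumes NumPy is installed; the names rows, arr, col_from_array and col_from_list are made up for this sketch:

```python
import numpy as np

rows = [[1, 1], [2, 3], [4, 3]]
arr = np.array(rows)

# NumPy: a single indexing operation with a (rows, columns) tuple.
col_from_array = arr[:, 1]

# Plain list: tuple indexing is not supported, so visit each inner
# list and index it separately.
col_from_list = [row[1] for row in rows]

print(col_from_array)  # [1 3 3]
print(col_from_list)   # [1, 3, 3]
```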
Celius Stingher
  • So, if I understood you, you are saying that n-dimensional NumPy array slicing and pandas DataFrame slicing with loc and iloc are done the same way, with [:,1]? – martian_rover Dec 11 '20 at 13:17
  • In the sense that the first value selects the rows and the second one the columns, yes. I added a thorough explanation. – Celius Stingher Dec 11 '20 at 13:18
0

x[:,1] is translated by the interpreter into

x.__getitem__((slice(None), 1))

That is, it calls the __getitem__ method of the x object, passing it (in this case) a tuple argument. The : is translated into a slice object. It's the comma in the indexing that creates a tuple.
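
One way to see this for yourself is a tiny class whose __getitem__ simply returns the key it receives. ShowKey is a hypothetical helper written for this sketch, not part of Python or NumPy:

```python
class ShowKey:
    """Hypothetical helper: hands back whatever key the brackets pass in."""

    def __getitem__(self, key):
        return key


x = ShowKey()

key = x[:, 1]        # the comma makes a tuple, the colon makes a slice
single = x[1:3]      # no comma, so the key is a lone slice object

print(key)     # (slice(None, None, None), 1)
print(single)  # slice(1, 3, None)
```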

If x is a numpy array, indexing with a tuple makes sense (subject to its own rules). But as your error indicates, indexing with a tuple does not work for a list. The error says what's allowed.

So while python syntax allows this kind of indexing in general, the details are class dependent.

For a 2d array, [:,1] means select the 2nd column.
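
Applied to the expression in the question, a small sketch could look like this. It assumes clf is a scikit-learn-style classifier whose predict_proba returns one row per sample and one column per class; the numbers below are made up to stand in for that output:

```python
import numpy as np

# Made-up values standing in for clf.predict_proba(X_test):
# one row per sample, one column per class.
proba = np.array([[0.9, 0.1],
                  [0.3, 0.7],
                  [0.6, 0.4]])

positive = proba[:, 1]   # second column: one value per sample

print(positive)          # [0.1 0.7 0.4]
print(positive.shape)    # (3,)
```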

hpaulj