I have 4 arrays. Array X is a 2D array that contains examples (each has 3 features):

X = array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])

Array Y contains labels for examples in Array X:

Y = array([11, 44, 77, 22, 77, 22, 22])

Arrays L and R contain subsets of the labels:

L = array([11, 44])
R = array([77, 22])

I want to slice both X and Y according to the labels in L and R. So the output should be:

XL = array([[1, 2, 3], [4, 5, 6]])
XR = array([[7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
YL = array([11, 44])
YR = array([77, 22, 77, 22, 22])

I know I can do something like the following to extract the rows I want based on a single value:

Y[Y==i]
X[Y[Y==i], :] 

However, i here is a single value, whereas in my question it is another array (e.g., L or R). I want an efficient way to do this in Python 3. Any hints?

sareem
  • what sort of label is `44` and `77`? Do you mean to use it as `X[3, 3]` and `X[6, 6]`? – Ma0 Oct 22 '18 at 07:43
  • This is not what is called slicing. It's boolean indexing, which is quite different... Unless your labels are organized in a very particular way, you just can't use mere slicing. – Julien Oct 22 '18 at 07:44
  • @Ev. Kounis these values correspond to some sort of classes. There might be other labels like 55, 66, 99, etc. – sareem Oct 22 '18 at 07:47
  • @Julien I see! so is there a way to do what I want using the boolean indexing? – sareem Oct 22 '18 at 07:49
  • Yes and it looks like what you just did, if you are not happy with your code, you need to share it and explain what exactly you are unhappy with... – Julien Oct 22 '18 at 07:50
  • I didn't do it! I need a way to do it. – sareem Oct 22 '18 at 07:51
  • `for i in L: X[Y[Y==i]]` + some `np.vstack, hstack etc`... – Julien Oct 22 '18 at 07:54
  • @Julien this will not work since the values of L are not indices: IndexError: index 11 is out of bounds for axis 0 with size 7 – sareem Oct 22 '18 at 08:01
  • Just copied your own typo: use `X[Y==i]`... – Julien Oct 22 '18 at 08:02
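
The loop suggested in the comments, with the corrected `X[Y==i]` indexing, could look roughly like the sketch below. The helper name `rows_for_labels` is made up for illustration and is not part of NumPy. Note that stacking one label at a time groups the rows by label, so the result is not necessarily in the original row order shown in the question.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
              [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Y = np.array([11, 44, 77, 22, 77, 22, 22])
L = np.array([11, 44])
R = np.array([77, 22])


def rows_for_labels(X, Y, labels):
    """Hypothetical helper: stack X[Y == i] for each label i, in label order."""
    return np.vstack([X[Y == i] for i in labels])


XL = rows_for_labels(X, Y, L)
XR = rows_for_labels(X, Y, R)             # rows for 77 first, then rows for 22
YL = np.hstack([Y[Y == i] for i in L])
YR = np.hstack([Y[Y == i] for i in R])    # labels grouped the same way as XR

print(XR)
print(YR)
```

If the original row order matters, a single boolean mask over Y avoids the regrouping.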

2 Answers

This is how you would normally do it:

from numpy import array

X = array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Y = array([11, 44, 77, 22, 77, 22, 22])

L = array([11, 44])
R = array([77, 22])

XL = array([x for x, y in zip(X, Y) if y in L])
XR = array([x for x, y in zip(X, Y) if y in R])
YL = array([y for y in Y if y in L])
YR = array([y for y in Y if y in R])

# Output
# XL = array([[1, 2, 3], [4, 5, 6]])
# XR = array([[7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
# YL = array([11, 44])
# YR = array([77, 22, 77, 22, 22])

Hope this helps :)

gripep
  • while this is "clean and convenient" it is probably quite slow on big data, since it's looping in python heavily instead of using numpy vectorization... – Julien Oct 22 '18 at 08:15
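
One way to check the performance concern raised in the comment is to time the list comprehension against a boolean mask built with `np.isin` on a larger input. The sketch below uses made-up random data; the function names `select_loop` and `select_mask` and the array sizes are assumptions for illustration, and the actual timings will depend on the machine and the data.

```python
import timeit

import numpy as np

# Synthetic data for the comparison (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
X_big = rng.integers(0, 1000, size=(5_000, 3))
Y_big = rng.integers(0, 100, size=5_000)
L = np.array([11, 44])


def select_loop(X, Y, labels):
    """Row-by-row selection in a Python list comprehension."""
    return np.array([x for x, y in zip(X, Y) if y in labels])


def select_mask(X, Y, labels):
    """Vectorized selection with a boolean mask."""
    return X[np.isin(Y, labels)]


# Both approaches should pick the same rows in the same order.
assert np.array_equal(select_loop(X_big, Y_big, L), select_mask(X_big, Y_big, L))

t_loop = timeit.timeit(lambda: select_loop(X_big, Y_big, L), number=3)
t_mask = timeit.timeit(lambda: select_mask(X_big, Y_big, L), number=3)
print(f"loop: {t_loop:.4f} s, mask: {t_mask:.4f} s")
```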

Using np.isin:

import numpy as np

X = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Y = np.asarray([11, 44, 77, 22, 77, 22, 22])

L = np.asarray([11, 44])
R = np.asarray([77, 22])

mask_L = np.isin(Y, L)
mask_R = np.isin(Y, R)

print(X[mask_L,:])  # output: array([[1, 2, 3], [4, 5, 6]])

print(X[mask_R,:])  # output: array([[ 7,  8,  9], [10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]])

print(Y[mask_L])  # output: array([11, 44])

print(Y[mask_R])  # output: array([77, 22, 77, 22, 22])
cheersmate
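
The mask-based approach can be wrapped in a small function so the same mask is reused for X and Y. The name `split_by_labels` is hypothetical, not a NumPy function. This is a minimal sketch that assumes every label not in the given set belongs to the other side; in the question's data that holds because L and R together cover all labels in Y.

```python
import numpy as np


def split_by_labels(X, Y, labels):
    """Hypothetical helper: split rows of X and entries of Y by label membership.

    Returns (X_in, Y_in, X_rest, Y_rest), preserving the original row order.
    """
    mask = np.isin(Y, labels)   # True where the label is one of `labels`
    return X[mask], Y[mask], X[~mask], Y[~mask]


X = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
                [13, 14, 15], [16, 17, 18], [19, 20, 21]])
Y = np.asarray([11, 44, 77, 22, 77, 22, 22])
L = np.asarray([11, 44])

XL, YL, XR, YR = split_by_labels(X, Y, L)
print(XL)
print(YR)
```

If Y may contain labels that are in neither L nor R, build a second mask with `np.isin(Y, R)` instead of using the complement.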