How to get multi-dimension specific data samples on the basis of list element?

Question

I need to evaluate my model's performance with limited training data, so I am randomly selecting a fraction p of the original training data. Assume p is 0.2 in this case. Here are some initial lines of code:

p = 0.2
# Cast to int so that range() accepts it; data.shape = (100, 50, 50, 3)
data_samples = int(data.shape[0] * p)

# for randomly selecting data
import random
random.seed(1234)
filter_indices=[random.randrange(0, data.shape[0]) for _ in range(data_samples)]

This gives me a list of random indices ranging between 0 and the total data size.

Now I want to get the samples from 'data' at the positions listed in filter_indices, keeping all of their dimensions. How can I do that effectively and efficiently?

You should be able to index `data` like: `out = data[filter_indices]`. Also note that you can use numpy's random module to streamline your code. Consider: `N = data.shape[0]; filter_indices = np.random.choice(N, size=int(N * p))`. — Chrysophylaxs, Apr 21 '23 at 16:43
@Chrysophylaxs thank you so much for your comment. Sorry for late reply. How can we use seed with `filter_indices = np.random.choice(N, size=int(N * p))`. — Ahmad, Apr 23 '23 at 04:09
@Chrysophylaxs I used `np.random.seed(0)` right before `filter_indices = np.random.choice(N, size=int(N * p))`. I believe it's working. Thank you so much for your help. If you want to post your comment in a formal answer, please go ahead. I will mark it correct. — Ahmad, Apr 23 '23 at 04:36

score 1 · Accepted Answer · answered Apr 23 '23 at 09:41

You can use numpy's integer array indexing to use your generated list of indices directly as an index. When the list is used on its own, it selects along the first axis and the trailing dimensions are automatically tacked on to the result! Smaller example:

import numpy as np

# Your data goes here
data = np.arange(90).reshape(10, 3, 3)

N = data.shape[0]
p = 0.2

# Generating random indices
n_samples = int(N * p)
np.random.seed(0)
filter_indices = np.random.choice(N, size=n_samples)

# Indexing magic:
out = data[filter_indices]

Note above that I've used numpy's built-in random module to streamline your code a little bit via np.random.choice.

Results:

>>> filter_indices
array([5, 0])
>>> out
array([[[45, 46, 47],
        [48, 49, 50],
        [51, 52, 53]],

       [[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]]])
>>> out.shape
(2, 3, 3)

out consists of exactly the 2 subarrays of shape (3, 3) found in data at indices 5 and 0, so the result has shape (2, 3, 3) instead of (10, 3, 3).
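
One thing to be aware of: np.random.choice samples with replacement by default, so the same index can be drawn more than once. If the goal is a subset of distinct training samples, passing replace=False avoids duplicates. Below is a minimal sketch of that variant, assuming an array with the shape from the question; the placeholder data, the seed value and the use of numpy's newer Generator API (np.random.default_rng) are my own choices for illustration, not part of the original code.

```python
import numpy as np

# Placeholder data with the shape mentioned in the question (assumption).
data = np.arange(100 * 50 * 50 * 3).reshape(100, 50, 50, 3)

N = data.shape[0]
p = 0.2
n_samples = round(N * p)  # number of samples to keep

# Seeded generator for reproducibility; the seed value is arbitrary.
rng = np.random.default_rng(1234)

# replace=False guarantees that every index is drawn at most once.
filter_indices = rng.choice(N, size=n_samples, replace=False)

# Integer array indexing along the first axis keeps all trailing dimensions.
out = data[filter_indices]

print(out.shape)
```

Here out.shape is (20, 50, 50, 3), and data[filter_indices] selects the same elements as the more explicit data[filter_indices, :, :, :].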
