What's the most efficient way to split up a Numpy ndarray using percentage?

Question

Hi I'm new to Python & Numpy and I'd like to ask what is the most efficient way to split a ndarray into 3 parts: 20%, 60% and 20%

    import numpy as np
    row_indices = np.random.permutation(10)

Let's assume the ndarray has 10 items: [7 9 3 1 2 4 5 6 0 8] The expected results are the ndarray separated into 3 parts like part1, part2 and part3.
part1: [7 9]
part2: [3 1 2 4 5]
part3: [0 8]

score 2 · Accepted Answer · answered Apr 13 '19 at 06:33

Here's one way -

# data array
In [85]: a = np.array([7, 9, 3, 1, 2, 4, 5, 6, 0, 8])

# percentages (ratios) array
In [86]: p = np.array([0.2,0.6,0.2]) # must sum upto 1

In [87]: np.split(a,(len(a)*p[:-1].cumsum()).astype(int))
Out[87]: [array([7, 9]), array([3, 1, 2, 4, 5, 6]), array([0, 8])]

Alternative to np.split :

np.split could be slower when working with large data, so, we could alternatively use a loop there -

split_idx = np.r_[0,(len(a)*p.cumsum()).astype(int)]
out = [a[i:j] for (i,j) in zip(split_idx[:-1],split_idx[1:])]

score 2 · Answer 2 · answered Apr 13 '19 at 07:45

I normally just go for the most obvious solution, although there are much fancier ways to do the same. It takes a second to implement and doesn't even require debugging (since it's extremely simple)

part1 = [a[i, ...] for i in range(int(a.shape[0] * 0.2))]
part2 = [a[i, ...] for i in range(int(a.shape[0] * 0.2), int(len(a) * 0.6))]
part3 = [a[i, ...] for i in range(int(a.shape[0] * 0.6), len(a))]

A few things to notice though

This is rounded and therefore you could get something which is only roughly a 20-60-20 split
You get back a list of element so you might have to re-numpyfy them with np.asarray()
You can use this method for indexing multiple objects (e.g. labels and inputs) for the same elements
If you get the indices once before the splits (indices = list(range(a.shape[0]))) you could also shuffle them thus taking care of data shuffling at the same time

What's the most efficient way to split up a Numpy ndarray using percentage?

2 Answers2