
I'm trying to find an efficient way to explicitly exclude a list of columns from a numpy array. I'm aware that in R, placing a minus sign before c() lets you specify which columns of a data frame to leave out.

I have tried putting "~" before the list of column indices I do not want to include, but that did not work.

For example, I'm going to generate an array with sklearn:

from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=size, n_features=12, centers=2,
                  cluster_std=10, random_state=2)

I want some way to say that I want to keep all the columns of X except columns 10, 3 and 9.

How can I achieve that without doing this:

X[:, [6,7,4,2,0,1,11,8,5]]
  • There is a `np.delete` function. But under the covers it will construct an indexing list (or boolean mask) like what you have. And it can't reorder the columns as your example does. What you show is the most efficient way - there's no way around making a copy of a large portion of `X`. Constructing the indexing list, however you do it, takes little time by comparison. – hpaulj Jun 26 '19 at 19:27
  • @hpaulj I know that for my example it is not a big deal to specify the columns to keep, but I'm thinking of a case where I have an array with more than 1000 columns and want to keep only 150 of them. Would it be necessary to explicitly enumerate the 850 columns to be kept? –  Jun 26 '19 at 22:57
  • An alternative to enumerating the columns is to use a boolean mask. Start with `np.ones(1000, bool)` and set 150 of them to `False`. You can then index with that, or use `np.where(mask)` to get the equivalent enumerated columns. – hpaulj Jun 26 '19 at 23:19
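
The two suggestions in the comments above (`np.delete` and a boolean mask) can be sketched roughly as follows. The small `np.arange` array is only a stand-in for `X`, and the names `exclude`, `mask`, `kept_delete`, `kept_mask` and `keep_idx` are made up for illustration:

```python
import numpy as np

# Stand-in for X: 2 rows, 12 columns (illustrative data, not the make_blobs output)
X = np.arange(24).reshape(2, 12)
exclude = [10, 3, 9]  # columns to leave out

# Option 1: np.delete along axis=1 returns a copy without those columns
kept_delete = np.delete(X, exclude, axis=1)

# Option 2: boolean mask, True for every column to keep
mask = np.ones(X.shape[1], dtype=bool)
mask[exclude] = False
kept_mask = X[:, mask]

# The equivalent enumerated column indices, if an index list is needed
keep_idx = np.where(mask)[0]
```

Both options keep the remaining columns in their original order, so neither reproduces the reordering in `X[:, [6,7,4,2,0,1,11,8,5]]`; if a particular order matters, the index list still has to be built explicitly.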
