Consider the following data:

61  1  1 15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71

The first three columns are year, month and day.
The remaining 12 columns are average windspeeds in knots at 12 locations in a country on that day.

What I want to do is drop the 2nd and 3rd columns (indices 1 and 2) so that I get the following data:

61  15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71

The following works, but I don't like it because it won't scale if the data had many columns (i.e. many locations).

import numpy as np
data = np.loadtxt('wind.data')
data_nomonth_noday = data[:,[0,3,4,5,6,7,8,9,10,11,12,13,14]]

Is it possible to achieve this without enumerating the column numbers? Can I achieve it with slicing?

2020

4 Answers

You can easily generate the indexing array with r_.

In [165]: np.r_[0,3:15]                                                                  
Out[165]: array([ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Under the covers, it's just doing:

In [166]: np.concatenate([[0],np.arange(3,15)])                                          
Out[166]: array([ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

np.delete, while convenient, ends up with a similar amount of work. Depending on the deletion index it will either concatenate pieces, or construct a selection mask.

Regardless of the method, the result is a new array, with a copy of the required data (not a view).
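The copy behaviour can be checked directly. This is a minimal sketch that uses a made-up 3x15 array as a stand-in for the loaded wind data (the array contents are an assumption, not the real file):

```python
import numpy as np

# Stand-in for the loaded data: 3 rows, 15 columns (illustrative values only).
data = np.arange(45.0).reshape(3, 15)

# Selection with an index array (advanced indexing) returns a new array.
picked = data[:, np.r_[0, 3:15]]

# np.delete also returns a new array.
deleted = np.delete(data, [1, 2], axis=1)

print(np.shares_memory(data, picked))   # False
print(np.shares_memory(data, deleted))  # False

# Writing to the result leaves the original untouched.
picked[0, 0] = -1.0
print(data[0, 0])  # 0.0
```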

loadtxt accepts a usecols parameter that takes a similar column index array.
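As a rough sketch of the usecols route, the unwanted columns can be skipped at load time. Here the three sample rows from the question are read from an in-memory string instead of 'wind.data', purely so the example is self-contained:

```python
import io
import numpy as np

# The three sample rows from the question, held in memory
# in place of the real 'wind.data' file.
sample = """\
61  1  1 15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71
"""

# Column 0 (year) plus columns 3..14 (the 12 locations).
cols = [0, *range(3, 15)]

data = np.loadtxt(io.StringIO(sample), usecols=cols)
print(data.shape)  # (3, 13)
```

With this approach the month and day columns are never stored in the array, so no second array has to be built afterwards.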

hpaulj
  • Yes, `data_nomonth_noday = data[:,np.r_[0,3:15]]` does it. – 2020 Dec 26 '19 at 01:20
  • Considering the data is huge, it would be great if I could just get a view without those 2 columns, instead of creating a copy of the data without them. Is it possible to achieve this without creating a new copy of the data? – 2020 Dec 26 '19 at 03:15
  • Do you think we are suggesting the copy solution just to be ornery!! A view is possible only if the original data can be accessed in a regular pattern, using just `shape` and `strides`. `[1,2]` is such a pattern, `[0, 3, 4,...]` is not. There's a hole in the middle that can't be described with the simple `slice` syntax. – hpaulj Dec 26 '19 at 04:08
  • Someone else suggested that you extract two arrays, the `data[:,0]` column and the `data[:,3:]` columns. Since that first column appears to be integers (though it will be floats after `loadtxt`), it probably isn't being used in the same way as the float columns. What's the reason for keeping it in the same array? Those two arrays will be views. – hpaulj Dec 26 '19 at 04:13
  • Thanks for the information regarding when a view is possible and when it is not. I didn't know that until now. – 2020 Dec 26 '19 at 16:03
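The two-array suggestion from the comments might look like the sketch below. The 3x15 array is a made-up stand-in for the loaded data, and the names `year` and `wind` are illustrative only:

```python
import numpy as np

# Stand-in for the loaded data (illustrative values only).
data = np.arange(45.0).reshape(3, 15)

# Both selections are plain slices, so NumPy can describe them
# with shape and strides alone and returns views.
year = data[:, 0]   # first column
wind = data[:, 3:]  # the 12 location columns

print(np.shares_memory(data, year))  # True
print(np.shares_memory(data, wind))  # True

# A view writes through to the original array.
wind[0, 0] = -1.0
print(data[0, 3])  # -1.0
```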

You can use np.delete for that, passing a slice object as the parameter that specifies what to remove:

>>> np.delete(data, slice(1, 3), 1)
array([[61.  , 15.04, 14.96, 13.17,  9.29, 13.96,  9.87, 13.67, 10.25,
        10.83, 12.58, 18.5 , 15.04],
       [61.  , 14.71, 16.88, 10.83,  6.5 , 12.62,  7.67, 11.5 , 10.04,
         9.79,  9.67, 17.54, 13.83],
       [61.  , 18.5 , 16.88, 12.33, 10.13, 11.17,  6.17, 11.25,  8.04,
         8.5 ,  7.67, 12.75, 12.71]])

When you use slicing notation, under the hood you basically pass a slice object. Indeed a[1:3] is equivalent to a[slice(1,3)].

Furthermore, the third argument (1) specifies the axis along which to delete. Since we want to remove columns, which lie along the second axis, we pass 1.
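Because the slice names only the columns to drop, the same call works no matter how many location columns follow. A small sketch, where `drop_month_day` is a hypothetical helper name and the arrays are made-up stand-ins:

```python
import numpy as np

def drop_month_day(arr):
    """Return a copy of arr without columns 1 and 2 (month and day)."""
    return np.delete(arr, slice(1, 3), axis=1)

# Stand-ins with 12 and 100 location columns (illustrative values only).
small = np.arange(45.0).reshape(3, 15)
large = np.arange(309.0).reshape(3, 103)

print(drop_month_day(small).shape)  # (3, 13)
print(drop_month_day(large).shape)  # (3, 101)

# Same result as selecting the kept columns with an index array.
print(np.array_equal(drop_month_day(small), small[:, np.r_[0, 3:15]]))  # True
```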

Willem Van Onsem

This should work:

import numpy as np
data = np.loadtxt('wind.data')
data_nomonth_noday = np.zeros((data.shape[0],data.shape[1]-2))
data_nomonth_noday[:,0] = data[:,0]
data_nomonth_noday[:,1:] = data[:,3:]

In my opinion, this is more readable, flexible, and intuitive than some of the other possible ways of doing this.

If a is your numpy array and you want to drop columns 1 and 2, you can do that in a single line:

import numpy as np

delete_cols = [1,2] # list of column numbers to delete
a[:, sorted(set(range(a.shape[-1])) - set(delete_cols))]  # sorted() keeps the original column order; set order is not guaranteed

Some Explanation

What you need here is proper indexing of the array a.

# list_of_column_numbers = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
a[:, list_of_column_numbers]

You can make the list_of_column_numbers in one of the following ways:

# Method-1: Direct Declaration
list_of_column_numbers = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

# Method-2A: Using Set and Dropping Columns not Needed
# a.shape[-1] = 15
delete_cols = [1,2] # list of column numbers to delete
list_of_column_numbers = sorted(set(range(a.shape[-1])) - set(delete_cols))  # sorted: set order is not guaranteed

# Method-2B: Make list of column numbers
# a.shape[-1] = 15
list_of_column_numbers = [0] + np.arange(3,a.shape[-1]).tolist()
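A possible variation on Method-2A, not part of the original answer, is np.setdiff1d, which returns the remaining column numbers already sorted. The array below is a made-up stand-in for `a`:

```python
import numpy as np

# Stand-in for the array a (illustrative values only).
a = np.arange(45.0).reshape(3, 15)

delete_cols = [1, 2]  # column numbers to drop

# Sorted, unique column numbers that are not in delete_cols.
keep = np.setdiff1d(np.arange(a.shape[-1]), delete_cols)

result = a[:, keep]
print(keep.tolist())  # [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(result.shape)   # (3, 13)
```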
CypherX