
I have a datatable Frame created as:

import datatable as dt
from datatable import f

comidas_gen_dt = dt.Frame({
    'country':list('ABCDE'),
    'id':[1,2,3,4,5],
    'egg':[10,20,30,5,40],
    'veg':[30,40,10,3,5],
    'fork':[5,10,2,1,9],
    'beef':[90,50,20,None,4]})

I have created a custom function to select a list of required columns from a frame DT:

def pydt_select_cols(DT, *cols):
    # Collect the column names into a list and select them from the frame
    return DT[:, list(cols)]

The recommended syntax for removing columns from DT is:

DT[:, f[:].remove([f.a, f.b, f.c])]

Following that syntax, I've created another custom function to set aside a list of columns:

def pydt_remove_cols(DT, *rmcols):
    dt_cols = [*rmcols]
    return DT[:, f[:].remove(dt_cols)]

I'm executing the function as

pydt_remove_cols(comidas_gen_dt, 'id', 'country', 'egg')

and it's throwing the error

TypeError: Computed columns cannot be used in .remove()

Could you please help me fix this?

Pasha
myamulla_ciencia

1 Answer


Removing columns (or rows) from a Frame is easy: take any syntax that you would normally use to select those columns, and then prepend the Python del keyword.

Thus, if you want to delete columns 'id', 'country', and 'egg', run

>>> del comidas_gen_dt[:, ['id','country','egg']]
>>> comidas_gen_dt
   | veg  fork  beef
-- + ---  ----  ----
 0 |  30     5    90
 1 |  40    10    50
 2 |  10     2    20
 3 |   3     1    NA
 4 |   5     9     4

[5 rows x 3 columns]

If you want to keep the original frame unmodified and get a new frame with some of the columns removed, the easiest way is to copy the frame first and then apply the del operation to the copy:

>>> DT = comidas_gen_dt.copy()
>>> del DT[:, columns_to_remove]

(Note that .copy() makes a shallow copy, so its cost is typically negligible.)
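
The copy-then-delete pattern can also be wrapped in a helper in the style of the functions from the question. This is only a minimal sketch: the name pydt_drop_cols is hypothetical (it does not appear in the original post), and it assumes DT is a datatable Frame and that every name passed in is an existing column.

```python
def pydt_drop_cols(DT, *rmcols):
    """Return a new frame without the named columns; DT itself is not modified.

    Hypothetical helper: assumes DT is a datatable Frame and that each
    entry of rmcols is the name of an existing column.
    """
    # Work on a copy so the caller's frame keeps all of its columns
    out = DT.copy()
    # Same del syntax as above, with the varargs tuple turned into a list
    del out[:, list(rmcols)]
    return out
```

Called as pydt_drop_cols(comidas_gen_dt, 'id', 'country', 'egg'), this should return a frame with only the veg, fork and beef columns, while comidas_gen_dt keeps all six.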

You can also use the f[:].remove() approach. It's a bit strange that it didn't work the way you've written it, but going from a list of strings to a list of f-symbols is quite straightforward:

def pydt_remove_cols(DT, *rmcols):
    return DT[:, f[:].remove([f[col] for col in rmcols])]

Here I use the fact that f.A is the same as f["A"], where the inner string "A" might as well be replaced with any variable.

Pasha
Yes, I'm aware of the first approach. I now have full clarification on using .remove() with multiple columns. Thanks again for your quick responses. – myamulla_ciencia May 22 '20 at 17:48