0

I have this code in R where I am using data.table, and I have the intention to translate it into Python with datatable. It creates columns with the value of each existing column divided by the mean of the total. Kind of normalization.

dataset[ , paste0( cols, suffix) := lapply( .SD,  function(x){ x/mean(x, na.rm=TRUE)} ), 
         by= col_A, 
         .SDcols= cols]
  • 1
    You might also find this useful https://datatable.readthedocs.io/en/latest/manual/comparison_with_rdatatable.html#modify-several-columns-and-keep-others-unchanged – langtang Feb 05 '22 at 19:13
  • this page may be useful for column assignment : https://datatable.readthedocs.io/en/latest/manual/transform_data.html#column-assignment – sammywemmy Feb 11 '22 at 07:37

2 Answers2

2

According to the documentation, update function and del operator operate in-place. This may be done in a loop also if there are many columns

DT[:, update(y_suffix = f.y/dt.mean(f.y), v_suffix = f.v/dt.mean(f.v)), by("x")]

-output

enter image description here

data

from datatable import dt, f, g, by, update

DT = dt.Frame(x = ["b"]*3 + ["a"]*3 + ["c"]*3,
              y = [1, 3, 6] * 3,
              v = range(1, 10))
akrun
  • 874,273
  • 37
  • 540
  • 662
1
from datatable import f,by,update,dt

dataset=dt.Frame({'col_A':[0,0,1,1], 'col_B':[1,2,3,4], 'col_C':[5,6,7,8]})
cols = dataset[:,[int,float]].names
dataset[:, update(**{col+'_norm': f[col]/dt.mean(f[col]) for col in cols if col!='col_A'}), by(f.col_A)]
langtang
  • 22,248
  • 1
  • 12
  • 27