Data.table in R into datatable in Python translation problem (normalization)

Question

I have this R code that uses data.table, and I want to translate it into Python with datatable. For each column in cols, it creates a new column holding the original value divided by that column's mean within its col_A group (NAs are ignored when computing the mean). It is a kind of normalization.

dataset[ , paste0( cols, suffix) := lapply( .SD,  function(x){ x/mean(x, na.rm=TRUE)} ), 
         by= col_A, 
         .SDcols= cols]

You might also find this useful https://datatable.readthedocs.io/en/latest/manual/comparison_with_rdatatable.html#modify-several-columns-and-keep-others-unchanged — langtang, Feb 05 '22 at 19:13
this page may be useful for column assignment : https://datatable.readthedocs.io/en/latest/manual/transform_data.html#column-assignment — sammywemmy, Feb 11 '22 at 07:37

score 2 · Answer 1 · answered Feb 05 '22 at 18:34

According to the documentation, the update() function and the del operator work in place. If there are many columns, the same assignment can also be built in a loop.

DT[:, update(y_suffix = f.y/dt.mean(f.y), v_suffix = f.v/dt.mean(f.v)), by("x")]

-output

data

from datatable import dt, f, g, by, update

DT = dt.Frame(x = ["b"]*3 + ["a"]*3 + ["c"]*3,
              y = [1, 3, 6] * 3,
              v = range(1, 10))

langtang · Accepted Answer · 2022-02-06T00:20:40.380

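from datatable import f, by, update, dt

dataset = dt.Frame({'col_A': [0, 0, 1, 1], 'col_B': [1, 2, 3, 4], 'col_C': [5, 6, 7, 8]})
# Names of all int and float columns (this includes the grouping column col_A)
cols = dataset[:, [int, float]].names
# Add <col>_norm = value / group mean for every numeric column except col_A, in place
dataset[:, update(**{col + '_norm': f[col] / dt.mean(f[col]) for col in cols if col != 'col_A'}), by(f.col_A)]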
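
To sanity-check a translation like the one above, it can help to spell out what the R expression computes without any library. The following is only a sketch in plain Python, not datatable code: the helper name normalize_by_group and the list-of-dicts layout are made up for illustration, and None stands in for NA. It reuses the same small col_A / col_B / col_C data.

```python
from collections import defaultdict

def normalize_by_group(rows, group_col, cols, suffix="_norm"):
    """Hypothetical helper: add col+suffix = value / group mean, skipping None."""
    # Collect the non-missing values for every (group, column) pair.
    values = defaultdict(list)
    for row in rows:
        for col in cols:
            if row[col] is not None:
                values[(row[group_col], col)].append(row[col])
    # Per-group means, analogous to mean(x, na.rm=TRUE) with by=col_A in R.
    # Note: a group mean of 0 raises ZeroDivisionError below, unlike R.
    means = {key: sum(v) / len(v) for key, v in values.items()}
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for col in cols:
            x = row[col]
            key = (row[group_col], col)
            new_row[col + suffix] = None if x is None else x / means[key]
        out.append(new_row)
    return out

rows = [
    {"col_A": 0, "col_B": 1, "col_C": 5},
    {"col_A": 0, "col_B": 2, "col_C": 6},
    {"col_A": 1, "col_B": 3, "col_C": 7},
    {"col_A": 1, "col_B": 4, "col_C": 8},
]
result = normalize_by_group(rows, "col_A", ["col_B", "col_C"])
for row in result:
    print(row)
```

Within each col_A group the new *_norm columns average to 1, which is a quick property to compare against the datatable result.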

edited Feb 06 '22 at 00:20

answered Feb 05 '22 at 18:59

