You can do what you want, but just because you can do something doesn't mean it's a good idea. Any solution that requires `eval()` is probably more complicated than it needs to be, and it introduces serious risks if you don't have complete control over the data going in.

Having said that, this script shows both a naive approach that doesn't read expressions from a table and the approach you suggest, which I strongly recommend you don't use; try to find a better way to achieve what you need:
```python
from io import StringIO
import re

import datatable as dt

csv1 = """A,B
1,2
3,4
5,6"""

csv2 = """NAME,EXPR
A_GREATER_THAN_B, A>B
A_GREATER_THAN_10, A>10
B_GREATER_THAN_5, B>5"""


def naive():
    # naive approach: every derived column is written out by hand
    d = dt.fread(StringIO(csv1))
    d['A_GREATER_THAN_B'] = d[:, dt.f.A > dt.f.B]
    d['A_GREATER_THAN_10'] = d[:, dt.f.A > 10]
    d['B_GREATER_THAN_5'] = d[:, dt.f.B > 5]
    print(d)


def update_with_expressions(d, expressions):
    for n in range(expressions.nrows):
        col = expressions[n, :][0, 'NAME']
        # prefix every identifier with dt.f., e.g. "A>10" -> "dt.f.A>10";
        # \w* also keeps column names containing digits or underscores intact
        expr = re.sub(r'\b([A-Za-z_]\w*)\b', r'dt.f.\1', expressions[n, :][0, 'EXPR'])
        # here's hoping that expression is trustworthy...
        d[col] = d[:, eval(expr)]


def fancy():
    # fancy, risky approach: derived columns come from a table of expressions
    d = dt.fread(StringIO(csv1))
    update_with_expressions(d, dt.fread(StringIO(csv2)))
    print(d)


if __name__ == '__main__':
    naive()
    fancy()
```
Result (showing you get the same result from either approach):
```
   |     A      B  A_GREATER_THAN_B  A_GREATER_THAN_10  B_GREATER_THAN_5
   | int32  int32             bool8              bool8             bool8
-- + -----  -----  ----------------  -----------------  ----------------
 0 |     1      2                 0                  0                 0
 1 |     3      4                 0                  0                 0
 2 |     5      6                 0                  0                 1
[3 rows x 5 columns]

   |     A      B  A_GREATER_THAN_B  A_GREATER_THAN_10  B_GREATER_THAN_5
   | int32  int32             bool8              bool8             bool8
-- + -----  -----  ----------------  -----------------  ----------------
 0 |     1      2                 0                  0                 0
 1 |     3      4                 0                  0                 0
 2 |     5      6                 0                  0                 1
[3 rows x 5 columns]
```
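If you do need expressions from a table, one possible "better way" is to never hand the string to `eval()` at all. The sketch below is an assumption on my part, not anything datatable provides: it parses each EXPR with Python's `ast` module, accepts only a small whitelist (names, numeric constants, `+ - * /`, unary minus and single comparisons), and builds the result with the `operator` module. The names `build_expression` and `resolve_name` are hypothetical. Because names are resolved through a callback, the same function can build a datatable expression or, as in the demo, a plain Python value.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected rather than executed.
_COMPARE_OPS = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg}


def build_expression(text, resolve_name):
    """Evaluate a tiny expression language without eval().

    Bare names are looked up with resolve_name(name); numbers are used as-is.
    Raises ValueError for anything outside the whitelist above."""
    tree = ast.parse(text.strip(), mode="eval")

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Name):
            return resolve_name(node.id)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _COMPARE_OPS):
            op = _COMPARE_OPS[type(node.ops[0])]
            return op(visit(node.left), visit(node.comparators[0]))
        # function calls, attribute access, subscripts, etc. all end up here
        raise ValueError(f"unsupported syntax in {text!r}: {type(node).__name__}")

    return visit(tree)


if __name__ == "__main__":
    # Resolve names against one plain row instead of datatable columns.
    row = {"A": 5, "B": 6}
    for text in ["A>B", "A>10", " B>5"]:
        print(text.strip(), build_expression(text, row.__getitem__))
```

With datatable, the `eval(expr)` line could become something like `d[col] = d[:, build_expression(raw_expr, lambda name: dt.f[name])]`, where `raw_expr` is the untouched EXPR string (no regex needed), assuming `dt.f[name]` selects a column by name the way `dt.f.A` does. A string such as `__import__('os').getcwd()` then raises `ValueError` instead of running.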
Note: if someone knows of a nicer way to iterate over rows in a `datatable.Frame`, please leave a comment, because I'm not a fan of this part:

```python
for n in range(expressions.nrows):
    col = expressions[n, :][0, 'NAME']
```
Note that `StringIO` is only imported so the .csv data can be embedded in the code; if you read actual files with `dt.fread()`, you wouldn't need it.