I'm trying to find the average value by item type, and then set a variable y that takes the value:
- missing value if x = 0,
- y = x. Then round values that are too big or too small:
- if x > 0, x > mu, then set y = mu
- if x < 0, x < mu, then set y = mu
This is what I've tried, which doesn't produce the desired result below:
tab2:([]items:`a`b`a`a`b; x:-6 8 0 -3 5)
tabsum: select mu:avg x by items from tab2;
tab2: update y:x from tab2
tab2: update y:mu from tab2 where x > 0 and x > mu / get error after running above step
tab2: update y:mu from tab2 where x < 0 and x <= mu
Desired result:
items x mu y
a -6 -3.0 -3.0
b 8 6.5 6.5
a 0 -3.0 NaN
a -3 -3.0 -3.0
b 5 6.5 5
Related to this data:
i) Is NaN the appropriate type for missing values in kdb? (NA
is different to NaN
or NULL
in R for example). I'm guessing so based on what I've read so far.
ii) Is there more efficient code to get the mu column inside tab2
? Making another table and merging I'm guessing isn't efficient (still learning basics of kdb)
iii) If I just run
tab2:([]items:`a`b`a`a`b; x:-6 8 0 -3 5)
tabsum: select mu:avg x by items from tab2;
tab2: update y:mu from tab2 where x > 0 and x > mu
I get:
items x mu y
1 a -6 -3.0 NaN
2 b 8 6.5 6.5
3 a 0 -3.0 NaN
4 a -3 -3.0 NaN
5 b 5 6.5 6.5
Line 5 doesn't make sense to me. Why is y = 6.5 if x < mu? I expect y =5 for that row. Clearly my understanding of what's going on is wrong.
iv) How can I get the desired result (the code I've posted doesn't work properly)