Suppose I have the following data frame:
  a  b c
1 2  2 3
2 5  4 4
3 1  7 4
4 1  9 4
5 2 14 0
6 9 10 6
I would like to discretize the data in column b and replace each value in that column with the mean of the range (bin) it falls into. The expected result could look as follows:
  a  b c
1 2  3 3
2 5  3 4
3 1  8 4
4 1  8 4
5 2 12 0
6 9 12 6
I came across functions such as discretize from the arules package:
res <- discretize(df$b, method = "frequency", breaks = 3)
which I suppose could solve the problem, but I could not work out how to write the bin means back into df.
Edit
Thanks to the solutions given in the comments, I was able to get a satisfactory distribution of the original data across the ranges. I also tested it on df$b <- iris$Petal.Length
(@alistaire's solution):
# 8 quantile break points give 7 bins; each value is replaced by its bin mean
df$b <- ave(df$b, cut(df$b, quantile(df$b, seq(0, 1, length.out = 8)),
                      include.lowest = TRUE), FUN = mean)
With the following results:
hist(df$b)$count
24 20 0 0 22 0 21 21 23 0 19
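For the small example data frame from the top of the question, the same approach with three bins can be sketched as follows. This is only a minimal sketch: it assumes quantile-based break points (R's default quantile type) and rebuilds the example df by hand; the variable names breaks and bins are just illustrative.

```r
# Example data from the top of the question
df <- data.frame(a = c(2, 5, 1, 1, 2, 9),
                 b = c(2, 4, 7, 9, 14, 10),
                 c = c(3, 4, 4, 4, 0, 6))

# Three bins: break points at the 0, 1/3, 2/3 and 1 quantiles of b
breaks <- quantile(df$b, probs = seq(0, 1, length.out = 4))

# Assign each value to a bin; include.lowest keeps the minimum in the first bin
bins <- cut(df$b, breaks = breaks, include.lowest = TRUE)

# Replace each value of b with the mean of its bin
df$b <- ave(df$b, bins, FUN = mean)
df$b
```

One caveat I am aware of: if the column has many repeated values, two quantiles can coincide and cut() will then complain that the breaks are not unique.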
If someone knows another way of discretizing a column of a data frame, I would appreciate it (especially a discretization that splits the data into ranges with an equal number of instances).
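To make the "equal number of instances" requirement concrete, here is a rough sketch of what I have in mind, assuming bins are assigned by rank rather than by value. The helper name equal_count_means and its argument n_bins are hypothetical, not from any package.

```r
# Hypothetical helper: split x into n_bins bins of (nearly) equal size by rank
# and replace each value with the mean of its bin
equal_count_means <- function(x, n_bins) {
  # Break ties by position so that every value gets a distinct rank 1..n
  r <- rank(x, ties.method = "first")
  # Map ranks 1..n onto bin ids 1..n_bins
  bin <- ceiling(r * n_bins / length(x))
  ave(x, bin, FUN = mean)
}

# Column b from the example data frame
b <- c(2, 4, 7, 9, 14, 10)
res <- equal_count_means(b, 3)
res
```

The trade-off of ranking this way is that identical values can land in different bins (and so get different means) when a bin boundary falls inside a run of ties, which value-based breaks from cut() would never do.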