I'm using the cut() function to split my data into equal-width groups spanning the min/max range. Here is an example of the code I am using:
# sample data frame, used to identify the initial groups
testdf <- data.frame(a = c(1:100), b = rnorm(100))
# split into groups based on ranges
k <- 20 # number of groups
# split into k groups, keeping only the integer group code
testdf$groupCode <- cut(testdf$b, breaks = k, labels = FALSE)
# store the group as a factor whose levels are the intervals
testdf$group <- cut(testdf$b, breaks = k)
head(testdf)
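When breaks is a single number, cut() computes the break points internally and does not return them, so only the interval labels are kept. One option is to build the break vector yourself and pass it to cut(). This sketch assumes the rule used by base R's cut.default: k + 1 equally spaced points from min(b) to max(b), with the two outer points moved out by 1/1000 of the range. The name brks is hypothetical.

```r
set.seed(1)  # reproducible example data
testdf <- data.frame(a = 1:100, b = rnorm(100))
k <- 20

# Equal-width breaks over the data range (assumed to match cut.default's rule)
rx <- range(testdf$b)
dx <- diff(rx)
brks <- seq.int(rx[1], rx[2], length.out = k + 1)
brks[c(1, k + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)

# Cutting with the explicit vector should give the same codes as breaks = k
testdf$groupCode <- cut(testdf$b, breaks = brks, labels = FALSE)
identical(testdf$groupCode, cut(testdf$b, breaks = k, labels = FALSE))
```

Because brks is an ordinary numeric vector, it can be stored and reused on other data.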
I want to use the groupings identified here to split up another data frame, but I'm not sure how to use the factor to do this. I think my code should be structured roughly as follows:
# this is the data I want to categorize based on previous groupings
datadf <- data.frame(a = c(1:100), b = rnorm(100))
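# placeholder: should return the testdf group code for each value of x
getGroupCode <- function(x) rep(NA_integer_, length(x))
datadf$groupCode <- getGroupCode(datadf$b)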
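A minimal sketch of this step, assuming the break points that cut(testdf$b, breaks = k) uses are rebuilt as a numeric vector (brks, a hypothetical name) following base R's cut.default rule, and then passed to cut() for the second data frame. Values of datadf$b outside the range of testdf$b get NA instead of being forced into an end group.

```r
set.seed(1)
testdf <- data.frame(a = 1:100, b = rnorm(100))
datadf <- data.frame(a = 1:100, b = rnorm(100))
k <- 20

# Rebuild the breaks cut(testdf$b, breaks = k) uses (assumed cut.default rule)
rx <- range(testdf$b)
dx <- diff(rx)
brks <- seq.int(rx[1], rx[2], length.out = k + 1)
brks[c(1, k + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)

# Apply the same breaks to the second data frame
datadf$groupCode <- cut(datadf$b, breaks = brks, labels = FALSE)
datadf$group <- cut(datadf$b, breaks = brks)
head(datadf)
```

findInterval(datadf$b, brks, left.open = TRUE) returns the same codes for values inside the range, but 0 or k + 1 instead of NA for values outside it.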
I can see that the factor is structured as follows, but I don't know how to use it properly:
testdf$group[0]
factor(0)
20 Levels: (-2.15,-1.91] (-1.91,-1.67] (-1.67,-1.44] (-1.44,-1.2] ... (2.34,2.58]
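The interval bounds are only stored as text in the factor levels. If the original break vector is not available, a sketch like this can recover approximate bounds by parsing levels(). Note that cut() formats the labels with dig.lab = 3 significant digits by default, so these bounds are rounded and may misclassify values very close to a boundary. The names lower, upper and bounds are hypothetical.

```r
set.seed(1)
testdf <- data.frame(a = 1:100, b = rnorm(100))
testdf$group <- cut(testdf$b, breaks = 20)

# Levels look like "(-2.15,-1.91]": drop the brackets, then split on the comma
lv <- levels(testdf$group)
parts <- strsplit(gsub("[][()]", "", lv), ",", fixed = TRUE)
lower <- as.numeric(vapply(parts, `[`, character(1), 1))
upper <- as.numeric(vapply(parts, `[`, character(1), 2))

# One row per group: its label and its (rounded) lower and upper bounds
bounds <- data.frame(level = lv, lower = lower, upper = upper)
head(bounds)
```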
Here are two approaches I have been experimenting with, neither of which works:
# get the code of the group closest to a number
nearestCode <- function(number, groups) {
  return(which(abs(groups - number) == min(abs(groups - number))))
}
nearestCode(7, testdf$group[0])
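nearestCode() cannot work as written because subtracting a number from a factor is not meaningful in R (Ops.factor warns and returns NA), and testdf$group[0] is an empty factor in any case. A sketch of the same nearest-value idea on plain numbers, assuming each group is represented by the numeric midpoint of its interval and that the breaks follow base R's cut.default rule (both are assumptions; cut() does not return midpoints):

```r
set.seed(1)
testdf <- data.frame(a = 1:100, b = rnorm(100))
k <- 20

# Equal-width breaks, assumed to match what cut(testdf$b, breaks = k) uses
rx <- range(testdf$b)
dx <- diff(rx)
brks <- seq.int(rx[1], rx[2], length.out = k + 1)
brks[c(1, k + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)

# Numeric midpoint of each interval stands in for the group
mids <- (head(brks, -1) + tail(brks, -1)) / 2

# Index of the midpoint closest to `number` (the first one on a tie)
nearestCode <- function(number, groups) {
  which.min(abs(groups - number))
}

nearestCode(0.5, mids)
```

Unlike cut(), this never returns NA: a value outside the original range is assigned to the first or last group.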
I have also been experimenting with the which() function:
which(7, testdf$group[0])
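which() takes a single logical vector and returns the positions that are TRUE, so which(7, testdf$group[0]) fails with "argument to 'which' is not logical". A which()-based lookup can instead test every interval (lower, upper] against the value. This sketch assumes the breaks are available as a numeric vector (rebuilt here with base R's cut.default rule); the helper whichGroup is hypothetical.

```r
set.seed(1)
testdf <- data.frame(a = 1:100, b = rnorm(100))
k <- 20

# Equal-width breaks, assumed to match what cut(testdf$b, breaks = k) uses
rx <- range(testdf$b)
dx <- diff(rx)
brks <- seq.int(rx[1], rx[2], length.out = k + 1)
brks[c(1, k + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)

# Hypothetical helper: index of the interval (lower, upper] containing x, or NA
whichGroup <- function(x, breaks) {
  hit <- which(x > head(breaks, -1) & x <= tail(breaks, -1))
  if (length(hit) == 0) NA_integer_ else hit
}

whichGroup(0.5, brks)
sapply(c(-10, 0, 10), whichGroup, breaks = brks)
```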
What is the best way to identify the groupings in one data frame and apply them to another?