I have this dataset in R:
library(stringr)
set.seed(999)
col1 = sample.int(5, 100, replace = TRUE)
col2 = sample.int(5, 100, replace = TRUE)
col3 = sample.int(5, 100, replace = TRUE)
col4 = sample.int(5, 100, replace = TRUE)
col5 = sample.int(5, 100, replace = TRUE)
col6 = sample.int(5, 100, replace = TRUE)
col7 = sample.int(5, 100, replace = TRUE)
col8 = sample.int(5, 100, replace = TRUE)
col9 = sample.int(5, 100, replace = TRUE)
col10 = sample.int(5, 100, replace = TRUE)
d = data.frame(
  id = 1:10,
  seq = c(
    paste(col1, collapse = ""), paste(col2, collapse = ""),
    paste(col3, collapse = ""), paste(col4, collapse = ""),
    paste(col5, collapse = ""), paste(col6, collapse = ""),
    paste(col7, collapse = ""), paste(col8, collapse = ""),
    paste(col9, collapse = ""), paste(col10, collapse = "")
  )
)
For each row, I would like to create new variables:
- d$most_common: the most common element in each row
- d$second_most_common: the second most common element in each row
- d$third_most_common: the third most common element in each row
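To make the intended output concrete, here is a minimal base R sketch of what these three columns would contain. The helper name top_n_chars and the small toy data frame are made up for illustration and are not part of the original data. The sketch assumes each row's seq is a string whose individual characters are the elements to count, and it does not define what should happen with ties.

```r
# Hypothetical helper (not from the original post): returns the n most
# frequent characters of a single string, most frequent first.
# If the string has fewer than n distinct characters, the rest are NA.
top_n_chars <- function(s, n = 3) {
  chars <- strsplit(as.character(s), "", fixed = TRUE)[[1]]  # split into single characters
  counts <- sort(table(chars), decreasing = TRUE)            # frequency table, largest first
  names(counts)[seq_len(n)]
}

# Made-up toy data with the same layout as d (an id and one string per row)
toy <- data.frame(
  id = 1:2,
  seq = c("1122233334", "5555444332"),
  stringsAsFactors = FALSE
)

# One row of results per input row, one column per rank
top3 <- t(sapply(toy$seq, top_n_chars, USE.NAMES = FALSE))
toy$most_common        <- top3[, 1]
toy$second_most_common <- top3[, 2]
toy$third_most_common  <- top3[, 3]
toy
```

For the first toy string the counts are 3 (four times), 2 (three times), 1 (twice) and 4 (once), so the three new columns would hold "3", "2" and "1".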
I tried to do this with the following function (taken from "Find the most frequent value by row"):
rowMode <- function(x, ties = NULL, include.na = FALSE) {
  # input checks data
  if ( !(is.matrix(x) | is.data.frame(x)) ) {
    stop("Your data is not a matrix or a data.frame.")
  }
  # input checks ties method
  if ( !is.null(ties) && !(ties %in% c("random", "first", "last")) ) {
    stop("Your ties method is not one of 'random', 'first' or 'last'.")
  }
  # set ties method to 'random' if not specified
  if ( is.null(ties) ) ties <- "random"
  # create row frequency table
  rft <- table(c(row(x)), unlist(x), useNA = c("no","ifany")[1L + include.na])
  # get the mode for each row
  colnames(rft)[max.col(rft, ties.method = ties)]
}
rowMode(d[1,1])
This gave me an error:
Error in rowMode(d[1, 1]) : Your data is not a matrix or a data.frame.
This is a bit confusing, since "d" is a data.frame.
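A likely cause, shown as a small sketch on a made-up toy data frame (not the original d): selecting a single cell with [row, column] drops the result to a plain vector, so the is.data.frame() check inside rowMode fails even though the parent object is a data.frame.

```r
# Made-up toy data frame with the same layout as d
toy <- data.frame(id = 1:2, seq = c("1122", "3344"), stringsAsFactors = FALSE)

single_cell <- toy[1, 1]                # one cell: dropped to a plain vector
kept_frame  <- toy[1, 1, drop = FALSE]  # drop = FALSE keeps the data.frame
whole_row   <- toy[1, ]                 # a full row with several columns stays a data.frame

is.data.frame(single_cell)
is.data.frame(kept_frame)
is.data.frame(whole_row)
```

Note that passing a data.frame would get past the check, but rowMode appears to tabulate whole cell values, so it would presumably treat each long string as a single value rather than counting the characters inside it.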
Is there an easier way to do this?
Thank you!