Recently, I have found that I am using the following pattern over and over again. The process is:
- cross-tabulate numeric variable by factor using
table
- create data frame from created table
- add original numeric values to data frame (from row names (!))
- remove row names
- reorder columns of aggregated data frame
In R, it looks like this:
# Sample data
df <- data.frame(x = round(runif(100), 1),
y = factor(ifelse(runif(100) > .5, 1, 0),
labels = c('failure', 'success'))
)
# Get frequencies
dfSummary <- as.data.frame.matrix(table(df$x, df$y))
# Add column of original values from rownames
dfSummary$x <- as.numeric(rownames(dfSummary))
# Remove rownames
rownames(dfSummary) <- NULL
# Reorder columns
dfSummary <- dfSummary[, c(3, 1, 2)]
Is there anything more elegant in R, preferably using base functions? I know I can use sql to do this in single command - I think that it has to be possible to achieve similar behavior in R.
sqldf solution:
library(sqldf)
dfSummary <- sqldf("select
x,
sum(y = 'failure') as failure,
sum(y = 'success') as success
from df group by x")