
In R, a vector cannot contain different types: every element must be, e.g., an integer, or every element must be a character string. This sometimes gives me headaches, e.g. when I want to add a margin (a totals row) to a data.frame and need some columns to be numeric and others to be character.
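To make the coercion concrete, here is a minimal sketch in base R. The names `mixed` and `mixed_list` are illustrative only and are not part of the example further down. It shows that `c()` promotes every element to the most general type present, whereas `list()` lets each element keep its own type.

```r
# Atomic vectors hold a single type, so c() coerces to the most general one.
mixed <- c("All", 1.5, 2L)
class(mixed)               # "character"

# A list keeps each element's own type.
mixed_list <- list("All", 1.5, 2L)
sapply(mixed_list, class)  # "character" "numeric" "integer"
```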

Below is a reproducible example:

# dummy data.frame
set.seed(42)
test <- data.frame("name"=sample(letters[1:4], 10, replace=TRUE),
                   "val1" = runif(10,2,5),
                   "val2"=rnorm(10,10,5),
                   "Status"=sample(c("In progres", "Done"), 10, replace=TRUE),
                   stringsAsFactors = FALSE)

# check that e.g. "val1" is indeed numeric
is.numeric(test$val1)
# TRUE
# create column sums for my margin.
tmpSums <- colSums(test[,c(2:3)])
# Are the sums numeric?
is.numeric(tmpSums[1])
#TRUE
# So add the margin
test2 <- rbind(test, c("All", tmpSums, "Mixed"))
# is it still numeric?
is.numeric(test2$val1)
#FALSE
# DAMN. Because the vector `c("All", tmpSums, "Mixed")` contains strings,
# the whole vector is coerced to character. And when doing the rbind,
# the columns of the original data.frame are coerced to character as well.

# my current workaround is to convert back to numeric,
# but this seems convoluted: converting back and forth.
valColoumns <- grepl("val", names(test2))
test2[valColoumns] <- lapply(test2[valColoumns], as.numeric)
is.numeric(test2$val1)
# TRUE. Finally, it works.

Is there an easier or better way?

MichaelChirico
Andreas

2 Answers


Use a list as the second argument to rbind, like:

test2 <- rbind(test, c("All", unname(as.list(tmpSums)), "Mixed"))

Here the second argument to rbind is a list, stripped of the conflicting names that would otherwise cause rbind to fail:

c("All", unname(as.list(tmpSums)), "Mixed")
#[[1]]
#[1] "All"
# 
#[[2]]
#[1] 37.70092
#
#[[3]]
#[1] 91.82716
#
#[[4]]
#[1] "Mixed"
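
As a quick check, the following sketch rebuilds the data from the question so that it runs on its own, appends the margin as an unnamed list, and then inspects the column classes. It only uses the objects already named in the question (`test`, `tmpSums`, `test2`).

```r
# Rebuild the question's data so this snippet is self-contained.
set.seed(42)
test <- data.frame(name = sample(letters[1:4], 10, replace = TRUE),
                   val1 = runif(10, 2, 5),
                   val2 = rnorm(10, 10, 5),
                   Status = sample(c("In progres", "Done"), 10, replace = TRUE),
                   stringsAsFactors = FALSE)
tmpSums <- colSums(test[, 2:3])

# Append the margin as an unnamed list so each column keeps its own type.
test2 <- rbind(test, c("All", unname(as.list(tmpSums)), "Mixed"))

# Inspect the class of every column after the rbind.
sapply(test2, class)
is.numeric(test2$val1)
```

No conversion back to numeric should be needed afterwards, because the list never forces the sums into character in the first place.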
thelatemail
  • Not really worth a separate answer, but `rbind` for `data.table`s has a `use.names` argument which would allow you to skip `unname`. Matter of taste. – MichaelChirico Feb 24 '16 at 03:36
  • thanks. I did find the unname bit myself - but never thought to use a list. Thanks. – Andreas Feb 24 '16 at 13:00

Here is an option using data.table. We convert the data.frame to a data.table (setDT(test)), get the sums of the numeric columns using lapply, concatenate (c) them with the values that should fill the other columns, place the result and the original in a list, and use rbindlist:

library(data.table)
rAll <-  setDT(test)[, c(name="All", lapply(.SD, sum), 
              Status="Mixed"), .SDcols= val1:val2]
rbindlist(list(test, rAll))

If we need to make it a bit more automatic, we can detect the numeric columns instead of naming them:

i1 <- sapply(test, is.numeric)
v1 <- setNames(list("All", "Mixed"), setdiff(names(test),
                      names(test)[i1]))
rAll <-  setDT(test)[, c(v1, lapply(.SD, sum)), 
                 .SDcols=i1][, names(test), with=FALSE]
rbindlist(list(test, rAll))
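
For comparison, the same "automatic" idea can be sketched in base R without data.table. The helper `add_margin` and its `labels` argument are hypothetical names introduced here for illustration, not part of either answer. The sketch assumes the non-numeric columns should be filled with the supplied labels in column order.

```r
# Hypothetical helper (not from the answers): append a totals row to a data.frame.
add_margin <- function(df, labels) {
  is_num <- sapply(df, is.numeric)            # columns that get summed
  stopifnot(length(labels) == sum(!is_num))   # one label per non-numeric column
  margin <- vector("list", ncol(df))          # unnamed list, one slot per column
  margin[is_num] <- lapply(df[is_num], sum)   # sums stay numeric
  margin[!is_num] <- as.list(labels)          # labels stay character
  rbind(df, margin)                           # matched to columns by position
}

# Small usage example with made-up data.
df <- data.frame(name = c("a", "b"), val1 = c(1, 2), val2 = c(10, 20),
                 Status = c("Done", "Done"), stringsAsFactors = FALSE)
out <- add_margin(df, c("All", "Mixed"))
sapply(out, class)
```

Because the margin is built as an unnamed list, it relies on the same mechanism as the list-based rbind above and keeps the numeric columns numeric.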
akrun
  • Sweet. This is a great answer. I have to mark thelatemail's answer as accepted because it is a bit closer (using data.frame), but this is arguably an answer I'll use just as much, and maybe learned more from. – Andreas Feb 24 '16 at 13:01
  • @Andreas Thanks for the feedback. Yes, you should mark thelatemail's answer as it is a great one with the original idea. – akrun Feb 24 '16 at 13:05