I have a column of estimated conversion rates that looks like this:

Type  Conversion
A      90
B      84
C      85-90
D      60-70

I need to create a new column that takes the midpoint whenever the conversion is given as a range, and keeps the value unchanged otherwise. Something like this:

Type   Conversion
A      90
B      84
C      87.5
D      65

How can I do this in R?

chattrat423

2 Answers

I would do this as follows:

library(data.table)
DF <- data.frame(Type = LETTERS[1:4],
                 Conversion = c(90, 84, "85-90", "60-70"),
                 stringsAsFactors = FALSE)

setDT(DF)[ , Conversion := sapply(strsplit(Conversion, split = "-"),
                                  function(x) mean(as.numeric(x)))]
> DF
   Type Conversion
1:    A         90
2:    B         84
3:    C       87.5
4:    D         65

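This relies on knowing the structure of your data, e.g., that there are no extraneous hyphens anywhere in Conversion. If the data is too big to check by hand, you can tabulate the number of hyphens per entry before overwriting the column, e.g., with DF[ , table(nchar(gsub("[^-]", "", Conversion)))]; every entry should contain zero or one hyphen.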
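
To make the hyphen check concrete, here is a minimal base R sketch run on the example values from the question. The variable name hyphens is only illustrative; the idea is the same gsub/nchar count as above.

```r
Conversion <- c("90", "84", "85-90", "60-70")

# Delete every character that is not a hyphen, then count what is left
hyphens <- nchar(gsub("[^-]", "", Conversion))

# Tabulate how many entries have 0, 1, 2, ... hyphens
table(hyphens)

# Any entry with more than one hyphen deserves a closer look
Conversion[hyphens > 1]
```

On this sample, two entries have no hyphen and two have exactly one, so the last line returns an empty vector.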

Of course, it's just as easy in base R:

DF$Conversion <- sapply(strsplit(DF$Conversion, split = "-"),
                        function(x) mean(as.numeric(x)))
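
Since the question asks for a new column rather than an overwrite, the same idea could be sketched as follows. The helper range_midpoint and the column name ConversionMid are hypothetical names chosen for illustration, not part of the original answer.

```r
DF <- data.frame(Type = LETTERS[1:4],
                 Conversion = c("90", "84", "85-90", "60-70"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split one entry on "-" and average the numeric pieces.
# A single number has one piece, so its mean is the number itself.
range_midpoint <- function(s) {
  mean(as.numeric(strsplit(s, "-", fixed = TRUE)[[1]]))
}

# vapply guarantees exactly one numeric value per entry;
# the original Conversion column is left untouched
DF$ConversionMid <- vapply(DF$Conversion, range_midpoint, numeric(1),
                           USE.NAMES = FALSE)
DF
```

Using vapply instead of sapply is a design choice here: it fails loudly if any entry does not reduce to a single number.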
MichaelChirico

Good answer by Michael. Here is what I came up with:

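library(magrittr)
# stringsAsFactors = FALSE keeps Conversion as character; strsplit() fails on factors (the default in R < 4.0)
x <- data.frame(Type = LETTERS[1:4],
                Conversion = c("90", "84", "85-90", "60-70"),
                stringsAsFactors = FALSE)
# Split each entry on "-", then average the numeric pieces
x$Conversion <- strsplit(x$Conversion, "-") %>%
  sapply(function(v) v %>% as.numeric %>% mean)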
Ellis Valentiner