
I am trying to split the Awards column in a data frame, but the split returns a different number of pieces for each row. How do I bind the result back to the original data frame?

SAMPLE DF:

        Name   Value     Awards
1       A1      NA      3 wins.
2       A2      1000    NA
3       A3      NA      2 wins.
4       A4      1999    1 win
5       A5      8178569 5 wins & 4 nominations.

EXPECTED RESULT:

        Name   Value     Awards                 AwardsNum  Cat
1       A1      NA      3 wins.                 3          A
2       A2      1000    NA                      NA         NA
3       A3      NA      2 wins.                 2          A
4       A4      1999    1 win                   1          A
5       A5      8178569 5 wins & 4 nominations. 9          C

So basically I need to split Awards, sum every number that precedes "wins" or "nominations", and then assign a category (Cat) based on that sum and a range of values.

I have the following:

  strsplit(DF$Awards, " ")
  cbind(DF, strsplit(DF$Awards, " "))

Error in data.frame(c("3", "wins."), "N/A", c("2", "wins."), c("1", "win." : 
arguments imply differing number of rows: 2, 1, 5

UPDATE: The categories are: A for NA or no awards and nominations, B for a total between 1 and 5, and C otherwise.

I need to experiment with the boundary between B and C, since I need to make sure the ratio between B and C is no more than 5:1.
E B
  • What determines the different categories? For example, how would we know that one row should be category "A" versus category "C"? – jdobres Oct 09 '16 at 20:11

2 Answers


The solution is to use a regular expression to match all numbers. Then you can sum them and assign categories.

library(stringr)

df_new <- sapply(DF$Awards, function(x){
    # get all numbers
    nums <- unlist(str_match_all(x, "[0-9]+"))
    # calculate sum
    AwardsNum <- sum(as.numeric(nums))
    # assign a category based on the sum
    if (is.na(AwardsNum)){
        Cat <- NA
    }else if(AwardsNum == 0){
        Cat <- "A"
    }else if(AwardsNum < 5){
        Cat <- "B"
    }else{
        Cat <- "C"
    }
    return(c(AwardsNum, Cat))
})

# add the results to DF as new columns
DF$AwardsNum <- as.numeric(df_new[1, ])
DF$Cat <- df_new[2, ]
Istrel

I just realized that @Istrel already posted an answer while I was working on this question. I'll post mine anyways since it's slightly different.

df <- data.frame(
    Name = c("A1", "A2", "A3", "A4", "A5"),
    Value = c(NA, 1000, NA, 1999, 8178569),
    Awards = c("3 wins", NA, "2 wins", "1 win", "5 wins & 4 nominations")
)

library(magrittr)
n.awards <- sapply(df$Awards, function(x){
    ifelse(is.na(x), 0,{
        x %>% as.character %>%
            strsplit("[^0-9]+") %>%
            unlist %>%
            as.numeric %>%
            sum
    })
})
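# right-closed bins: (-0.1, 0.9] is A, (0.9, 4.9] is B, (4.9, Inf] is C
# (Inf as the upper break so totals above 100 do not become NA)
brks <- c(-0.1, 0.9, 4.9, Inf)
cc <- cut(n.awards, brks)
cat <- c("A", "B", "C")
# indexing with the factor cc uses its integer codes 1, 2, 3
df.final <- cbind(df, AwardsNum = n.awards, Cat = cat[cc])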

Using cut, you can group vectors without using multiple if statements.
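
As a small illustration of how cut bins values, here is a minimal sketch on made-up totals (the `totals` vector and the break points are assumptions for the example, not values from the question). It passes the category names through the `labels` argument instead of indexing a separate vector:

```r
# Hypothetical award totals, not taken from the question's data
totals <- c(0, 1, 4, 5, 9, 250)

# Bins are right-closed by default: (-Inf, 0], (0, 4], (4, Inf]
cats <- cut(totals,
            breaks = c(-Inf, 0, 4, Inf),
            labels = c("A", "B", "C"))

data.frame(totals, cats)

# Counts per category, which may help when checking the B:C ratio
table(cats)
```

Moving the middle break (4 here) is one way to shift rows between B and C without touching the rest of the code.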

parksw3
  • @parksw3 and @Istrel, both of your suggestions are great, and faster than doing a loop – E B Oct 13 '16 at 01:07
  • @Istrel, the only thing I am still trying to work out is how to bring the result back into the original dataframe. I was thinking I could use rbind, but I am not sure how to make sure each result is joined to its original row – E B Oct 13 '16 at 01:14
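
On the follow-up about getting the result back onto the original rows: functions such as sapply and vapply return one result per input element, in the same order as the input, so assigning the result as a new column should keep every total on the row it came from, with no rbind or merge needed. A minimal sketch, assuming the sample data from the question and using base R's regmatches/gregexpr in place of stringr (`count_awards` is a hypothetical helper name, not something from either answer):

```r
# Sample data modelled on the question
DF <- data.frame(
    Name   = c("A1", "A2", "A3", "A4", "A5"),
    Value  = c(NA, 1000, NA, 1999, 8178569),
    Awards = c("3 wins.", NA, "2 wins.", "1 win", "5 wins & 4 nominations."),
    stringsAsFactors = FALSE
)

# Hypothetical helper: sums every run of digits found in one string
count_awards <- function(x) {
    if (is.na(x)) return(NA_real_)
    nums <- regmatches(x, gregexpr("[0-9]+", x))[[1]]
    sum(as.numeric(nums))
}

# One value per row, in row order, so plain column assignment
# keeps each total aligned with its original row
DF$AwardsNum <- vapply(DF$Awards, count_awards, numeric(1), USE.NAMES = FALSE)
DF
```

Because the new column is built from `DF$Awards` itself, its length always equals `nrow(DF)`, which is what makes the direct assignment safe.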