1

I have an R conundrum and would be very grateful for any assistance. I need to write a piece of code that has to fit on one line so that it slots into a larger automated process. I have supplied some dummy data to help illustrate.

I have three ifelse statements that return 1’s or 0’s. I need to sum these 1’s and 0’s yet because of other inherited constraints in my real data I can’t refer to their output ‘and then’ sum them. I ‘need’ to sum them on the fly.

To be explicit… I cannot explicitly refer to the output 1’s and 0’s of either ‘use_sms’, ‘use_data’ or ‘use_voice’ and I cannot just pass an apply/1/sum to the dataframe.

Somehow, what I need is a fully contained sum of the three ifelse’s, something along the lines of… in crude non r language…

sum(
ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)
) 

My real data looks similar to this headache_df:

set.seed(1234)  # seed before sampling so the dummy data is reproducible
headache_df = data.frame(sms_rev0 = sample(1:0, 10, replace = T),
                        sms_cnt0 = sample(1:0, 10, replace = T),
                        sms_rev1 = sample(1:0, 10, replace = T),
                        sms_cnt1 = sample(1:0, 10, replace = T),
                        sms_rev2 = sample(1:0, 10, replace = T),
                        sms_cnt2 = sample(1:0, 10, replace = T),
                        data_rev0 = sample(1:0, 10, replace = T),
                        data_cnt0 = sample(1:0, 10, replace = T),
                        data_rev1 = sample(1:0, 10, replace = T),
                        data_cnt1 = sample(1:0, 10, replace = T),
                        data_rev2 = sample(1:0, 10, replace = T),
                        data_cnt2 = sample(1:0, 10, replace = T),
                        voice_rev0 = sample(1:0, 10, replace = T),
                        voice_cnt0 = sample(1:0, 10, replace = T),
                        voice_rev1 = sample(1:0, 10, replace = T),
                        voice_cnt1 = sample(1:0, 10, replace = T),
                        voice_rev2 = sample(1:0, 10, replace = T),
                        voice_cnt2 = sample(1:0, 10, replace = T))

row.names(headache_df) = paste0("row", 1:10)

And I am looking to capture my results in this headache-combating panado_df:

panado_df = data.frame(user = row.names(headache_df))
attach(headache_df)

I generate three ifelse statements to illustrate, but in my real data it's really the sum of these that I need to capture.

panado_df$use_sms = ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0)
panado_df$use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0)
panado_df$use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)
rownames(panado_df) = panado_df$user
panado_df$user = NULL

I present a target column to illustrate what my calculated data should look like. Any cool solutions to achieve my aim please?

panado_df$target_column = apply(panado_df, 1, sum)
CallumH

2 Answers

1

If I understand you correctly, you might be looking for something like this

panado_df$sums_3 <- sum(ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
    ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
    ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0))
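One caveat with the snippet above: `sum()` collapses all of its arguments into a single number, so every row would receive the same grand total. A minimal sketch of a row-by-row alternative, assuming the three conditions are simply added with `+` (which works element by element). The small `toy_df` and its values are invented for illustration and are not the asker's data:

```r
# Hypothetical toy data (values invented for illustration only)
toy_df <- data.frame(
  sms_rev0   = c(1, 0, 1), sms_cnt0   = c(1, 1, 0),
  data_rev0  = c(1, 1, 0), data_cnt0  = c(1, 0, 0),
  voice_rev0 = c(0, 1, 1), voice_cnt0 = c(0, 1, 1)
)

# sum() reduces all three vectors to one scalar
grand_total <- with(toy_df, sum(
  ifelse(sms_rev0 & sms_cnt0 > 0, 1, 0),
  ifelse(data_rev0 & data_cnt0 > 0, 1, 0),
  ifelse(voice_rev0 & voice_cnt0 > 0, 1, 0)
))

# `+` adds the vectors element by element, giving one total per row
row_totals <- with(toy_df,
  ifelse(sms_rev0 & sms_cnt0 > 0, 1, 0) +
  ifelse(data_rev0 & data_cnt0 > 0, 1, 0) +
  ifelse(voice_rev0 & voice_cnt0 > 0, 1, 0)
)

print(grand_total)  # 4
print(row_totals)   # 2 1 1
```

Note that `>` binds more tightly than `&` in R, so `sms_rev0 & sms_cnt0 > 0` is read as `sms_rev0 & (sms_cnt0 > 0)`, with a non-zero `sms_rev0` treated as TRUE.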

Your code could also be made more descriptive (much as you already did) using dplyr, as follows:

library(dplyr)

panado_df <- headache_df %>%
    mutate(use_sms=ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
        use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
        use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)) %>%
    rowwise() %>%
    mutate(target_column=sum(use_sms, use_data, use_voice))

If you'd like to return the target_column vector directly, load the magrittr library as well and use its %$% operator:

library(dplyr)
library(magrittr)  # provides the %$% operator

target_column <- headache_df %>%
    mutate(use_sms=ifelse(sms_rev0 & sms_cnt0 > 0 | sms_rev1 & sms_cnt1 > 0 | sms_rev2 & sms_cnt2 > 0, 1, 0),
        use_data = ifelse(data_rev0 & data_cnt0 > 0 | data_rev1 & data_cnt1 > 0 | data_rev2 & data_cnt2 > 0, 1, 0),
        use_voice = ifelse(voice_rev0 & voice_cnt0 > 0 | voice_rev1 & voice_cnt1 > 0 | voice_rev2 & voice_cnt2 > 0, 1, 0)) %>%
    rowwise() %>%
    mutate(target_column=sum(use_sms, use_data, use_voice)) %$%
    target_column
mabdrabo
  • Hi @mabdrabo. Unfortunately, that returns the overall sum for the entire target_column. I require a row by row sum. – CallumH Dec 19 '16 at 07:53
  • Hi @mabdrabo. I have an issue in that, in my real data, panado_df is part of a bigger process and I can't coerce it to headache_df. I am trying to use your suggested code to directly add a column to panado_df. No success yet but I'm going to play around with it. :) – CallumH Dec 19 '16 at 08:20
  • @CallumH - if you could share more, we could be of more help. good luck :) – mabdrabo Dec 19 '16 at 08:23
  • All sorted now thnx @mabdrabo. Sharing is usually caring... but on this occasion it would just have complicated matters - was trying to keep it simple. – CallumH Dec 19 '16 at 09:10
  • @CallumH - glad I could help :) – mabdrabo Dec 19 '16 at 09:17
0
headache_df <- within(headache_df, {
    use_sms   <- as.integer(sms_rev0 & sms_cnt0 | sms_rev1 & sms_cnt1 | sms_rev2 & sms_cnt2)
    use_data  <- as.integer(data_rev0 & data_cnt0 | data_rev1 & data_cnt1 | data_rev2 & data_cnt2)
    use_voice <- as.integer(voice_rev0 & voice_cnt0 | voice_rev1 & voice_cnt1 | voice_rev2 & voice_cnt2)
    target    <- use_sms + use_data + use_voice
})
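The `within()` call above writes its new columns back into `headache_df`. If the total has to land in a separate data frame, and without relying on `attach()`, one option might be `with()`, which evaluates an expression against a data frame's columns and returns the result. A sketch under those assumptions; `toy_df` and `result_df` are hypothetical stand-ins for the real data, and only two of the conditions are shown:

```r
# Hypothetical stand-in for headache_df (values invented for illustration)
toy_df <- data.frame(
  sms_rev0  = c(1, 0), sms_cnt0  = c(2, 3),
  data_rev0 = c(1, 0), data_cnt0 = c(4, 0),
  row.names = c("row1", "row2")
)

# Hypothetical stand-in for panado_df: same row names, no columns yet
result_df <- data.frame(row.names = row.names(toy_df))

# Logical vectors are treated as 0/1 in arithmetic, so the conditions
# can be added directly in a single expression, without ifelse()
result_df$target_column <- with(toy_df,
  (sms_rev0 & sms_cnt0 > 0) + (data_rev0 & data_cnt0 > 0)
)

print(result_df)
```

Because the whole calculation is one expression, it never refers to intermediate `use_*` columns, which seems to be the constraint described in the question.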
Jean