
I'm using dplyr to get the means of 6 variables across 3 groups, and I also want the count for each cell (i.e., I want to add a column of counts for each group-variable pair).

My code is something like this:

bitul_reason_tbl <- bitul_reason_calc %>% group_by(segment_name) %>% summarize(Total_Count=n(),
                                                       better_insurance = mean(better_insurance),count1=sum(bitul_reason_calc$better_insurance),
                                                       blank = mean(blank), count2=sum(bitul_reason_calc$blank),
                                                       kefel = mean(kefel), count3=sum(bitul_reason_calc$kefel),
                                                       no_need = mean(no_need), count4=sum(bitul_reason_calc$no_need),
                                                       other = mean(other), count5=sum(bitul_reason_calc$other),
                                                       price = mean(price), count6=sum(bitul_reason_calc$price),
                                                       sherut = mean(sherut),count7=sum(bitul_reason_calc$sherut))

The variables are all 0s or 1s, so summing is like counting. But what I get instead is the total sum of each variable repeated 3 times and not the sum as it is supposed to be per group. What's wrong?

# A tibble: 3 x 14
        segment_name Total_Count      price count1      kefel count2     sherut count3   nothing count4      other count5     blank count6
              <fctr>       <int>      <dbl>  <dbl>      <dbl>  <dbl>      <dbl>  <dbl>     <dbl>  <dbl>      <dbl>  <dbl>     <dbl>  <dbl>
1         briut_siud         277 0.11552347     69 0.02527076     22 0.04693141     27 0.1227437    101 0.05776173     81 0.6498195    465
2 vetek_up_half_year         225 0.09333333     69 0.02666667     22 0.03111111     27 0.1288889    101 0.14222222     81 0.5866667    465
3             teunot         247 0.06477733     69 0.03643725     22 0.02834008     27 0.1538462    101 0.13360324     81 0.6194332    465
Aurèle
Corel
  • Why prefix with `bitul_reason_calc$`? Also consider `mutate_all`, `mutate_at`, `mutate_if` to avoid repetition – Aurèle Jun 28 '17 at 14:56
  • Using `bitul_reason_calc$` refers to the original data.frame, not the grouped one. Just refer to the variables as is, like you do for the means. – Axeman Jun 28 '17 at 15:05
  • @Axeman - When I do that I get summation that results in values between 0 and 1 as if it was calculating some sort of means and not counting... – Corel Jun 29 '17 at 08:03
  • Well we don't have your data, so who knows? A simplified reproducible example goes a long way. – Axeman Jun 29 '17 at 08:06
  • `data.frame(g = c('a', 'a', 'a', 'b', 'b', 'b'), x = c(0, 1, 1, 0, 0, 1)) %>% group_by(g) %>% summarise(count = sum(x))` works just fine. Using `summarise(count = sum(.$x))` instead shows your problem, all counts become 3. – Axeman Jun 29 '17 at 08:09
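The difference described in the comments can be seen side by side. This is a minimal sketch on hypothetical toy data (the names `toy`, `g`, `x`, `per_group`, and `whole_col` are made up for illustration, not taken from the asker's data):

```r
library(dplyr)

# Hypothetical toy data: two groups and one 0/1 indicator column.
toy <- data.frame(g = c("a", "a", "a", "b", "b", "b"),
                  x = c(0, 1, 1, 0, 0, 1))

res <- toy %>%
  group_by(g) %>%
  summarise(per_group = sum(x),      # bare name: evaluated within each group
            whole_col = sum(toy$x))  # `$` on the data frame: the full column every time

print(res)
```

Here `per_group` differs between the two groups, while `whole_col` repeats the same overall total on every row, which matches the symptom in the question.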

2 Answers

bitul_reason_tbl <- bitul_reason_calc %>% 
  group_by(segment_name) %>% 
  # Each sum comes before the mean of the same column: summarize() evaluates
  # its arguments in order, so a later sum(x) would see the new mean instead.
  summarize(Total_Count = n(),
            count1 = sum(better_insurance), better_insurance = mean(better_insurance),
            count2 = sum(blank), blank = mean(blank),
            count3 = sum(kefel), kefel = mean(kefel),
            count4 = sum(no_need), no_need = mean(no_need),
            count5 = sum(other), other = mean(other),
            count6 = sum(price), price = mean(price),
            count7 = sum(sherut), sherut = mean(sherut))

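Inside dplyr verbs you only need to reference the bare column name. Prefixing a column with `bitul_reason_calc$` pulls the whole ungrouped column from the original data frame, which is why every group showed the same total.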
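To cut down on the repetition mentioned in the comments, one option is `across()`, which is an assumption here since it needs dplyr 1.0.0 or later and is not what the original code uses. A minimal sketch on hypothetical toy data (`toy` and `segment` are made-up names; `price` and `kefel` stand in for the real 0/1 columns):

```r
library(dplyr)

# Hypothetical toy data with two 0/1 indicator columns.
toy <- data.frame(segment = c("a", "a", "a", "b", "b", "b"),
                  price   = c(0, 1, 1, 0, 0, 1),
                  kefel   = c(1, 0, 0, 0, 1, 1))

res <- toy %>%
  group_by(segment) %>%
  summarise(Total_Count = n(),
            # Apply both functions to each listed column; the results are
            # named <column>_<function name>, e.g. price_mean, price_count.
            across(c(price, kefel), list(mean = mean, count = sum)))

print(res)
```

Because both functions are applied to the original column inside a single `across()` call, the mean does not overwrite the column before the sum is taken.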

troh

OK, so the solution that worked for me (strangely) was to switch the order in which I call sum() and mean() inside summarize(). This is weird, but it worked.
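A likely explanation for why the order matters: `summarize()` evaluates its arguments in sequence, and a summary that reuses an existing column name replaces that column for the expressions that follow. A minimal sketch on hypothetical toy data (`toy`, `g`, `x`, `mean_first`, and `sum_first` are made-up names):

```r
library(dplyr)

# Hypothetical toy data: two groups and one 0/1 indicator column.
toy <- data.frame(g = c("a", "a", "a", "b", "b", "b"),
                  x = c(0, 1, 1, 0, 0, 1))

# Mean first: `x` is overwritten by the group mean, so sum(x) returns that mean.
mean_first <- toy %>%
  group_by(g) %>%
  summarise(x = mean(x), count = sum(x))

# Sum first: sum(x) still sees the original 0/1 column.
sum_first <- toy %>%
  group_by(g) %>%
  summarise(count = sum(x), x = mean(x))

print(mean_first)
print(sum_first)
```

In `mean_first` the `count` column holds values between 0 and 1, as described in the comments, while in `sum_first` it holds the per-group counts. Giving the means different names from the source columns (for example `x_mean`) would avoid the problem regardless of order.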

Corel