
I have a data frame with 3 columns, each containing a small number of values:

> df
# A tibble: 364 x 3
   A     B     C
 <dbl> <dbl> <dbl>
  0.    1. 0.100
  0.    1. 0.200
  0.    1. 0.300
  0.    1. 0.500
  0.    2. 0.100
  0.    2. 0.200
  0.    2. 0.300
  0.    2. 0.600
  0.    3. 0.100
  0.    3. 0.200
# ... with 354 more rows



> apply(df, 2, table)
$`A`

 0  1  2  3  4  5  6  7  8  9 10 
34 37 31 32 27 39 29 28 37 39 31 

$B

 1  2  3  4  5  6  7  8  9 10 11 
38 28 38 37 32 34 29 33 30 35 30 

$C

0.1 0.2 0.3 0.4 0.5 0.6 
 62  65  65  56  60  56 

I would like to create a fourth column, which will contain for each row the product of the frequencies of each value within its column. So, for example, the first value of the column "Freq" would be the product of the frequency of 0 within column A, the frequency of 1 within column B and the frequency of 0.1 within column C.

How can I do this efficiently with dplyr/base R?

To emphasize: this is not the frequency of each complete row combination, but the product of the per-column frequencies.
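For illustration, using the frequency tables above, the intended value for the first row (A = 0, B = 1, C = 0.1) would be (a sketch of the calculation):

# 34 zeros in column A, 38 ones in column B, 62 values of 0.1 in column C
34 * 38 * 62
#> [1] 80104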

    what about the last row? Column A has 11 elements, column B also has 11, while column C has 6. So how do you do that? – Onyambu Apr 30 '18 at 05:53

2 Answers


An efficient approach using a combination of lapply, Map & Reduce from base R:

l <- lapply(df, table)

m <- Map(function(x,y) unname(y[match(x, names(y))]), df, l)

df$D <- Reduce(`*`, m)

which gives:

> head(df, 15)
    A  B   C     D
1   3  5 0.4 57344
2   5  6 0.5 79560
3   0  4 0.1 77996
4   2  6 0.1 65348
5   5 11 0.6 65520
6   3  8 0.5 63360
7   6  6 0.2 64090
8   1  9 0.4 62160
9  10  2 0.2 56420
10  5  2 0.2 70980
11  4 11 0.3 52650
12  7  6 0.5 57120
13 10  1 0.2 76570
14  7 10 0.5 58800
15  8 10 0.3 84175

What this does:

  • lapply(df, table) creates a list with a frequency table for each column.
  • With Map and match, a list m is created in which each list-item has the same length as the number of rows of df: each list-item is a vector of the frequencies corresponding to the values in that column of df (see the small illustration below).
  • With Reduce the product of the vectors in the list m is calculated element-wise: the first values of the vectors in m are multiplied with each other, then the 2nd values, etc.
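To illustrate the lookup step on its own, here is a minimal sketch with a made-up vector (tab and vals are hypothetical names, not part of the solution above):

tab  <- table(c(0, 0, 1, 2))             # frequencies: "0" -> 2, "1" -> 1, "2" -> 1
vals <- c(2, 0, 1, 0)
unname(tab[match(vals, names(tab))])     # 1 2 1 2: the frequency of each value in vals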

The same approach in tidyverse:

library(dplyr)
library(purrr)

df %>% 
  mutate(D = map(df, table) %>% 
           map2(df, ., function(x,y) unname(y[match(x, names(y))])) %>% 
           reduce(`*`))

Used data:

set.seed(2018)
df <- data.frame(A = sample(rep(0:10, c(34,37,31,32,27,39,29,28,37,39,31)), 364),
                 B = sample(rep(1:11, c(38,28,38,37,32,34,29,33,30,35,30)), 364),
                 C = sample(rep(seq(0.1,0.6,0.1), c(62,65,65,56,60,56)), 364))

I will use the following small example:

df
    A  B   C
1   3  5 0.4
2   5  6 0.5
3   0  4 0.1
4   2  6 0.1
5   5 11 0.6
6   3  8 0.5
7   6  6 0.2
8   1  9 0.4
9  10  2 0.2
10  5  2 0.2
sapply(df, table)
$A

 0  1  2  3  5  6 10 
 1  1  1  2  3  1  1 

$B

 2  4  5  6  8  9 11 
 2  1  1  3  1  1  1 

$C

0.1 0.2 0.4 0.5 0.6 
  2   3   2   2   1 


library(tidyverse)
df%>%
  group_by(A)%>%
  mutate(An=n())%>%
  group_by(B)%>%
  mutate(Bn=n())%>%
  group_by(C)%>%
  mutate(Cn=n(),prod=An*Bn*Cn)



       A     B     C    An    Bn    Cn  prod
   <int> <int> <dbl> <int> <int> <int> <int>
 1     3     5 0.400     2     1     2     4
 2     5     6 0.500     3     3     2    18
 3     0     4 0.100     1     1     2     2
 4     2     6 0.100     1     3     2     6
 5     5    11 0.600     3     1     1     3
 6     3     8 0.500     2     1     2     4
 7     6     6 0.200     1     3     3     9
 8     1     9 0.400     1     1     2     2
 9    10     2 0.200     1     2     3     6
10     5     2 0.200     3     2     3    18
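
Note that each group_by() call replaces the previous grouping, so An, Bn and Cn are plain per-column counts rather than counts within nested groups. A roughly equivalent sketch using add_count() (assumes dplyr >= 0.8.0, where add_count() gained a name argument; the column names are just illustrative):

library(dplyr)

df %>%
  add_count(A, name = "An") %>%
  add_count(B, name = "Bn") %>%
  add_count(C, name = "Cn") %>%
  mutate(prod = An * Bn * Cn)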