
I have a data frame with 3 columns, each containing a small number of values:

> df
# A tibble: 364 x 3
   A     B     C
 <dbl> <dbl> <dbl>
  0.    1. 0.100
  0.    1. 0.200
  0.    1. 0.300
  0.    1. 0.500
  0.    2. 0.100
  0.    2. 0.200
  0.    2. 0.300
  0.    2. 0.600
  0.    3. 0.100
  0.    3. 0.200
# ... with 354 more rows



> apply(df, 2, table)
$`A`

 0  1  2  3  4  5  6  7  8  9 10 
34 37 31 32 27 39 29 28 37 39 31 

$B

 1  2  3  4  5  6  7  8  9 10 11 
38 28 38 37 32 34 29 33 30 35 30 

$C

0.1 0.2 0.3 0.4 0.5 0.6 
 62  65  65  56  60  56 

I would like to create a fourth column, which will contain for each row the product of the frequencies of each value within its column. So, for example, the first value of the column "Freq" would be the product of the frequency of 0 within column A, the frequency of 1 within column B and the frequency of 0.1 within column C.

How can I do this efficiently with dplyr/base R?

To emphasize: this is not the frequency of each complete row combination, but the product of the per-column frequencies.
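For illustration, using the frequency tables above, the intended value for the first row (A = 0, B = 1, C = 0.1) would be (a sketch of the calculation):

# 34 zeros in column A, 38 ones in column B, 62 values of 0.1 in column C
34 * 38 * 62
#> [1] 80104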

    what about the last row? Column A has 11 elements, column B also has 11, while column C has 6. So how do you do that? – Onyambu Apr 30 '18 at 05:53

2 Answers


An efficient approach using a combination of lapply, Map & Reduce from base R:

l <- lapply(df, table)

m <- Map(function(x,y) unname(y[match(x, names(y))]), df, l)

df$D <- Reduce(`*`, m)

which gives:

> head(df, 15)
    A  B   C     D
1   3  5 0.4 57344
2   5  6 0.5 79560
3   0  4 0.1 77996
4   2  6 0.1 65348
5   5 11 0.6 65520
6   3  8 0.5 63360
7   6  6 0.2 64090
8   1  9 0.4 62160
9  10  2 0.2 56420
10  5  2 0.2 70980
11  4 11 0.3 52650
12  7  6 0.5 57120
13 10  1 0.2 76570
14  7 10 0.5 58800
15  8 10 0.3 84175

What this does:

  • lapply(df, table) creates a list with a frequency table for each column.
  • With Map and match, a list m is created in which each list-item has the same length as the number of rows of df: each list-item is a vector of the frequencies corresponding to the values in that column of df (see the small illustration below).
  • With Reduce the product of the vectors in the list m is calculated element-wise: the first values of the vectors in m are multiplied with each other, then the 2nd values, etc.
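To illustrate the lookup step on its own, here is a minimal sketch with a made-up vector (tab and vals are hypothetical names, not part of the solution above):

tab  <- table(c(0, 0, 1, 2))             # frequencies: "0" -> 2, "1" -> 1, "2" -> 1
vals <- c(2, 0, 1, 0)
unname(tab[match(vals, names(tab))])     # 1 2 1 2: the frequency of each value in vals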

The same approach in tidyverse:

library(dplyr)
library(purrr)

df %>% 
  mutate(D = map(df, table) %>% 
           map2(df, ., function(x,y) unname(y[match(x, names(y))])) %>% 
           reduce(`*`))

Used data:

set.seed(2018)
df <- data.frame(A = sample(rep(0:10, c(34,37,31,32,27,39,29,28,37,39,31)), 364),
                 B = sample(rep(1:11, c(38,28,38,37,32,34,29,33,30,35,30)), 364),
                 C = sample(rep(seq(0.1,0.6,0.1), c(62,65,65,56,60,56)), 364))

I will use the following small example:

df
    A  B   C
1   3  5 0.4
2   5  6 0.5
3   0  4 0.1
4   2  6 0.1
5   5 11 0.6
6   3  8 0.5
7   6  6 0.2
8   1  9 0.4
9  10  2 0.2
10  5  2 0.2
sapply(df, table)
$A

 0  1  2  3  5  6 10 
 1  1  1  2  3  1  1 

$B

 2  4  5  6  8  9 11 
 2  1  1  3  1  1  1 

$C

0.1 0.2 0.4 0.5 0.6 
  2   3   2   2   1 


library(tidyverse)
df%>%
  group_by(A)%>%
  mutate(An=n())%>%
  group_by(B)%>%
  mutate(Bn=n())%>%
  group_by(C)%>%
  mutate(Cn=n(),prod=An*Bn*Cn)



       A     B     C    An    Bn    Cn  prod
   <int> <int> <dbl> <int> <int> <int> <int>
 1     3     5 0.400     2     1     2     4
 2     5     6 0.500     3     3     2    18
 3     0     4 0.100     1     1     2     2
 4     2     6 0.100     1     3     2     6
 5     5    11 0.600     3     1     1     3
 6     3     8 0.500     2     1     2     4
 7     6     6 0.200     1     3     3     9
 8     1     9 0.400     1     1     2     2
 9    10     2 0.200     1     2     3     6
10     5     2 0.200     3     2     3    18
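
Note that each group_by() call replaces the previous grouping, so An, Bn and Cn are plain per-column counts rather than counts within nested groups. A roughly equivalent sketch using add_count() (assumes dplyr >= 0.8.0, where add_count() gained a name argument; the column names are just illustrative):

library(dplyr)

df %>%
  add_count(A, name = "An") %>%
  add_count(B, name = "Bn") %>%
  add_count(C, name = "Cn") %>%
  mutate(prod = An * Bn * Cn)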