
I have a dataframe of presence records from 1763 samples for 59 taxa. Here is an example dataframe with fewer taxa and samples:

            sample site family n_days
1    A_1_17/06/12   A1      X      3
2    A_1_17/06/12   A1      Y      3
3    A_1_17/06/12   A1      Z      3
4  A_3_02/11/2011   A3      X      5
5  A_3_02/11/2011   A3      V      5
6  A_3_02/11/2011   A3      W      5
7  A_1_22/02/2011   A1      X      3
8  A_1_22/02/2011   A1      V      3
9  A_1_22/02/2011   A1      Z      3
10 A_3_19/11/2011   A3      U      3
11 A_3_19/11/2011   A3      Y      3
12 A_3_19/11/2011   A3      Z      3

I want to add an occupancy column that is 1 if a taxon was present in a sample and 0 if it was absent, with one row for every combination of sample and taxon. Here is the expected output:

           sample site n_days family occupancy
1    A_1_17/06/12   A1      3      X         1
2    A_1_17/06/12   A1      3      Y         1
3    A_1_17/06/12   A1      3      Z         1
4    A_1_17/06/12   A1      3      V         0
5    A_1_17/06/12   A1      3      W         0
6    A_1_17/06/12   A1      3      U         0
7  A_3_02/11/2011   A3      5      X         1
8  A_3_02/11/2011   A3      5      V         1
9  A_3_02/11/2011   A3      5      W         1
10 A_3_02/11/2011   A3      5      Y         0
11 A_3_02/11/2011   A3      5      Z         0
12 A_3_02/11/2011   A3      5      U         0
13 A_1_22/02/2011   A1      3      X         1
14 A_1_22/02/2011   A1      3      V         1
15 A_1_22/02/2011   A1      3      Z         1
16 A_1_22/02/2011   A1      3      Y         0
17 A_1_22/02/2011   A1      3      W         0
18 A_1_22/02/2011   A1      3      U         0
19 A_3_19/11/2011   A3      3      U         1
20 A_3_19/11/2011   A3      3      Y         1
21 A_3_19/11/2011   A3      3      Z         1
22 A_3_19/11/2011   A3      3      X         0
23 A_3_19/11/2011   A3      3      V         0
24 A_3_19/11/2011   A3      3      W         0
    

Any suggestions would be appreciated.

JoshuaAJones

1 Answer


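Create an occupancy column set to 1, use complete() to generate every combination of sample and family (setting occupancy to 0 for the newly created rows), and then use fill() within each sample to fill in the missing site and n_days values.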

library(dplyr)
library(tidyr)

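df %>%
  # every observed record is a presence
  mutate(occupancy = 1) %>%
  # add the missing sample/family pairs as absences
  complete(sample, family, fill = list(occupancy = 0)) %>%
  group_by(sample) %>%
  # copy site and n_days from the observed rows of the same sample
  fill(site, n_days, .direction = 'updown') %>%
  ungroup()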

#   sample         family site  n_days occupancy
#   <chr>          <chr>  <chr>  <int>     <dbl>
# 1 A_1_17/06/12   U      A1         3         0
# 2 A_1_17/06/12   V      A1         3         0
# 3 A_1_17/06/12   W      A1         3         0
# 4 A_1_17/06/12   X      A1         3         1
# 5 A_1_17/06/12   Y      A1         3         1
# 6 A_1_17/06/12   Z      A1         3         1
# 7 A_1_22/02/2011 U      A1         3         0
# 8 A_1_22/02/2011 V      A1         3         1
# 9 A_1_22/02/2011 W      A1         3         0
#10 A_1_22/02/2011 X      A1         3         1
# … with 14 more rows
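
If site and n_days are constant within each sample, as they are in the example data, the group_by/fill step can probably be skipped by wrapping those columns in nesting() inside complete(). This is only a sketch under that assumption: the df below is rebuilt by hand from the question's table, and result is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the table in the question (an assumption about
# the real data: each sample has exactly one site and one n_days value).
df <- data.frame(
  sample = rep(c("A_1_17/06/12", "A_3_02/11/2011",
                 "A_1_22/02/2011", "A_3_19/11/2011"), each = 3),
  site   = rep(c("A1", "A3", "A1", "A3"), each = 3),
  family = c("X", "Y", "Z", "X", "V", "W", "X", "V", "Z", "U", "Y", "Z"),
  n_days = rep(c(3L, 5L, 3L, 3L), each = 3)
)

result <- df %>%
  # every observed record is a presence
  mutate(occupancy = 1) %>%
  # nesting() keeps only the sample/site/n_days combinations that occur in
  # the data, then crosses them with every family; new rows get occupancy 0
  complete(nesting(sample, site, n_days), family,
           fill = list(occupancy = 0))

result
```

Because site and n_days travel with sample inside nesting(), the rows added for absent families already carry those values, so nothing is left to fill afterwards. If a sample could map to more than one site or n_days value, this would create a separate block of rows for each distinct combination, so the fill() approach above may be the safer choice in that case.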
Ronak Shah