1

I've been trying to loop over left joins (using R). I need to create a table with columns representing samples from a larger table. Each column of the new table should represent each of these samples.

library(tidyr)

largetable <- data.frame(PlotCode=c(rep("Plot1",20),rep("Plot2",20)),
                         Category=c(rep("A",8),rep("B",8),rep("C",4),rep("A",12),rep("B",4),rep("C",4)))
                         
a <- data.frame(PlotCode=c("Plot1","Plot1","Plot2","Plot2"),
                Category=c("A","B","A","B"))

##example of code to loop over 100 left joins derived from samples of two elements from a large table. It fails to create the columns.
for (i in 1:100){
  count <- largetable %>% group_by(PlotCode) %>% sample_n(2, replace = TRUE)%>%
    count(PlotCode,Category)
  colnames(count)[3] <- paste0("n",i)
  b <- left_join(a, count, by = c("PlotCode","Category"))
}

##example of desired output table. Columns n1 to n100 should change depending of samples.
b <- data.frame(PlotCode=c("Plot1","Plot1","Plot2","Plot2"),
                Category=c("A","B","A","B"),
                n1=c(2,1,0,1),
                n2=c(1,1,1,1),
                n3=c(2,0,1,2))

How can I loop over left joins so each column corresponds to a different sample?

D C
  • 13
  • 3
  • Can you share a small example of `largetable` (similar to `a`) and show how you want expected output to look? – Ronak Shah Oct 06 '20 at 03:35
  • Thanks Ronak. I have modified the script to show an example of "largetable" and an expected output table "b". Columns n1 to n100 in the output table should change according to each sampling from "largetable" – D C Oct 06 '20 at 05:55
  • Clarify via edits, not comments. What does that code have to do with your question? "Any idea" is not a valid question here. [ask] – philipxy Oct 06 '20 at 07:49
  • Please in code questions give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. Give the least code you can that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL (including constraints & indexes) & input as code in table format. [ask] Pause work on the overall goal, chop code to the 1st expression not giving what you expect & say what you expect & why. – philipxy Oct 06 '20 at 07:50

1 Answers1

0

Instead of for loop we can use rerun/replicate to repeat a process n times.

In each iteration we randomly select 2 rows from each PlotCode and count their Category so you will have n lists which can be joined together using reduce and rename the column as per your choice and replace NA with 0.

library(dplyr)
library(purrr)

n <- 10

rerun(n, largetable %>% 
  group_by(PlotCode) %>% 
  slice_sample(n = 2, replace = TRUE) %>%
  count(PlotCode,Category)) %>%
  reduce(full_join, by = c('PlotCode', 'Category')) %>%
  rename_with(~paste0('n', seq_along(.)), starts_with('n')) %>%
  mutate(across(starts_with('n'), tidyr::replace_na, 0))

#  PlotCode Category    n1    n2    n3    n4    n5    n6    n7    n8    n9   n10
#  <chr>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Plot1    A            1     0     2     2     0     1     0     1     2     2
#2 Plot1    B            1     0     0     0     1     1     2     1     0     0
#3 Plot2    B            1     0     0     0     1     0     0     0     0     0
#4 Plot2    C            1     2     0     0     0     0     1     1     0     0
#5 Plot1    C            0     2     0     0     1     0     0     0     0     0
#6 Plot2    A            0     0     2     2     1     2     1     1     2     2
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks a lot Ronak! this code solved my question. To use the slice_sample function, I had to install dplyr (1.0.0) on a recent R version (4.0.2). Then, the code worked without any problems. – D C Oct 06 '20 at 20:24