0

I have a data frame, and I would like to repeat each row in the data frame, changing the region variable each time for every variable in my list.

For example, I have this data frame.

          id  region
1   MG562518   <NA>
2   MG562519   <NA>
3   MG562520   <NA>
4   MG562521   <NA>
5   MG562522   <NA>

I also have this list of states

[1] "AK" "AL" "CT" "MA"

For every id, I need a new row for every state.

       id  region
MG562518   AK
MG562518   AL
MG562518   CT
MG562518   MA
MG562519   AK
MG562519   AL
MG562519   CT
MG562519   MA
MG562520   AK
MG562520   AL
MG562520   CT
MG562520   MA
MG562521   AK
MG562521   CT
MG562521   MA
etc...

I have tried to use joins and I have tried a nested complete function, but it did not work for me.

   arrange(id)

I think the answer is a cartesian join, but I couldn't find how to do that in R. Any advice would be appreciated. Thank you

aholtz
  • 175
  • 6
  • `expand.grid(id = your_data$id, region = your_vector_of_states)` – Gregor Thomas Nov 24 '20 at 16:39
  • You're right - you could also do this with a Cartesian join - more commonly called a cross join. Several methods are mentioned [at the FAQ about different types of joins](https://stackoverflow.com/q/1299871/903061), or here's a [dplyr-specific question about cross joins](https://stackoverflow.com/a/62277055/903061) - as of right now, see my answer there for the most up-to-date method. – Gregor Thomas Nov 24 '20 at 16:42

1 Answers1

2

Does this work:

library(dplyr)
library(tidyr)
uncount(df, weights = length(states)) %>% mutate(region = rep(states, nrow(df)))
# A tibble: 20 x 2
   id       region
   <chr>    <chr> 
 1 MG562518 AK    
 2 MG562518 AL    
 3 MG562518 CT    
 4 MG562518 MA    
 5 MG562519 AK    
 6 MG562519 AL    
 7 MG562519 CT    
 8 MG562519 MA    
 9 MG562520 AK    
10 MG562520 AL    
11 MG562520 CT    
12 MG562520 MA    
13 MG562521 AK    
14 MG562521 AL    
15 MG562521 CT    
16 MG562521 MA    
17 MG562522 AK    
18 MG562522 AL    
19 MG562522 CT    
20 MG562522 MA    

Data used:

df
# A tibble: 5 x 2
  id       region
  <chr>    <chr> 
1 MG562518 NA    
2 MG562519 NA    
3 MG562520 NA    
4 MG562521 NA    
5 MG562522 NA    
Karthik S
  • 11,348
  • 2
  • 11
  • 25