-1

I have data of a lot of villages with unique identifiers (village), I am trying to shape the data into a long format by first expanding by year and then by month.

df_input <- data.frame( village=c("A","B","C"), population = c(1000,1500,2000))    


df_output <- data.frame(year= c(2001,2001,2001,2002,2002,2002,2003,2003,2003),
                        month = c("1","2","3","1","2","3","1","2","3"),
                        village = c("A","A","A","B","B","B","C","C","C"), 
                        population = c (1000, 1000,1000,1500,1500,1500,2000,2000,2000))

I am using the following code to first expan it by the year:

df_panel <- df_input %>%
  merge(expand.grid(year=2001:2003, key=.$key), by="code_2011")

But I am getting the following error:

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

Any idea what I am doing wrong?

Fuser
  • 47
  • 1
  • 9

1 Answers1

1

If I understand correctly, the OP wants to create a cross join or cartesian merge of

  • the years 2001:2003,
  • the months 1:3, and
  • each row of df_input.

There are several methods to achieve this, e.g.,

merge(list(year = 2001:2003), merge(list(month = 1:3), df_input))
   year month village population
1  2001     1       A       1000
2  2002     1       A       1000
3  2003     1       A       1000
4  2001     2       A       1000
5  2002     2       A       1000
6  2003     2       A       1000
7  2001     3       A       1000
8  2002     3       A       1000
9  2003     3       A       1000
10 2001     1       B       1500
11 2002     1       B       1500
12 2003     1       B       1500
13 2001     2       B       1500
14 2002     2       B       1500
15 2003     2       B       1500
16 2001     3       B       1500
17 2002     3       B       1500
18 2003     3       B       1500
19 2001     1       C       2000
20 2002     1       C       2000
21 2003     1       C       2000
22 2001     2       C       2000
23 2002     2       C       2000
24 2003     2       C       2000
25 2001     3       C       2000
26 2002     3       C       2000
27 2003     3       C       2000

or using expand.grid() and merge():

merge(expand.grid(year = 2001:2003, month = 1:3), df_input)

which returns the same result as above.

Uwe
  • 41,420
  • 11
  • 90
  • 134