0

I have two data frames. Data frame "weather" looks like this:

weather<-data.frame(Date=c("2012-04-01","2012-04-02","2012-04-03","2012-04-04"),Day=c("Sunday","Monday","Tuesday","Wednesday"), Temp=c(86,89,81,80))
Date       Day       Temperature    
2012-04-01 Sunday     86
2012-04-02 Monday     89
2012-04-03 Tuesday    81
2012-04-04 Wednesday  80

And, data frame "Regularity", looks like this:

Regularity<-data.frame(Date=c("2012-04-02","2012-04-04","2012-04-03","2012-04-04"),EmployeeID=c(1,1,2,2),Attendance=c(1,1,1,1))

Date        EmployeeID Attendance
2012-04-02           1          1
2012-04-04           1          1
2012-04-03           2          1
2012-04-04           2          1

I want to create a panel dataframe in R of the form:

Date       Day       Temperature EmployeeID Attendence  
2012-04-01 Sunday     86              1         0
2012-04-02 Monday     89              1         1
2012-04-03 Tuesday    81              1         0
2012-04-04 Wednesday  80              1         1
2012-04-01 Sunday     86              2         0
2012-04-02 Monday     89              2         0
2012-04-03 Tuesday    81              2         1
2012-04-04 Wednesday  80              2         1

I have tried the merge and reshape2, but in vain. I will be very grateful for any help. Thank you.

Animesh
  • 43
  • 1
  • 6
  • This question seems to be mainly about code and this would be more suitable for SO. –  Jan 28 '14 at 09:12
  • There is also `join` in plyr package - `df <- join(weather, Regularity, by=c("Date"))` - although merge should do the job as well. What is the error message? – martin Jan 28 '14 at 10:11

1 Answers1

1

Here is how. Suppose tb1 is the first table and tb2 is the second. Then the desired result will be achieved by following:

tb2_tf<-dcast(tb2,Date~EmployeeID,value.var="Attendance")
tb<-melt(merge(tb1,tb2_tf,all=TRUE),id=1:3,variable.name="EmployeeID",value.name="Attendance")
tb$Attendance[is.na(tb$Attendance)] <- 0
tb
       Date       Day Temperature EmployeeID Attendance
1 2012-04-01    Sunday          86          1          0
2 2012-04-02    Monday          89          1          1
3 2012-04-03   Tuesday          81          1          0
4 2012-04-04 Wednesday          80          1          1
5 2012-04-01    Sunday          86          2          0
6 2012-04-02    Monday          89          2          0
7 2012-04-03   Tuesday          81          2          1
8 2012-04-04 Wednesday          80          2          1

I would like to see the solution without the reshape part. I suspect there is one using some form of theta join.

mpiktas
  • 11,258
  • 7
  • 44
  • 57
  • Thanks Mpiktas. While this works like a charm, I don't understand the id=1:3 part.What id does this refer to? When I try this in the original large dataset, I get the Error: "id variables not found in data: NA", and If i don't pick an id, a default id is chosen that gives a weird result. Many thanks for your help. – Animesh Jan 28 '14 at 11:12
  • `id=1:3` is simply first three columns. – mpiktas Jan 28 '14 at 11:14
  • Many thanks again. In the main large data set I get the dcast warning for "Aggregation function missing: defaulting to length" and all the rows are repeated 3 times for each each, how can this be prevented? – Animesh Jan 29 '14 at 05:31
  • you should add to the `id` all the columns of `tb1` which are not in the table tb2_tf. – mpiktas Jan 29 '14 at 08:00