I have a data set with the following variables:
- steps: Number of steps taken in a 5-minute interval
- date: The date on which the measurement was taken in YYYY-MM-DD format
- interval: Identifier for the 5-minute interval in which measurement was taken (288 intervals per day)
The main data set:
> head(activityData, 3)
steps date interval
1 1.7169811 2012-10-01 0
2 0.3396226 2012-10-01 5
3 0.1320755 2012-10-01 10
> str(activityData)
'data.frame': 17568 obs. of 3 variables:
$ steps : num 1.717 0.3396 0.1321 0.1509 0.0755 ...
$ date : chr "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
$ interval: num 0 5 10 15 20 25 30 35 40 45 ...
The data set spans two months (17568 rows / 288 intervals per day = 61 days).
I had to divide it into weekdays and weekend days. I did this with the following xts code:
> dataAs.xtsWeekday <- dataAs.xts[.indexwday(dataAs.xts) %in% 1:5]
> dataAs.xtsWeekend <- dataAs.xts[.indexwday(dataAs.xts) %in% c(0, 6)]
After that I needed to do some calculations, which I could not get to work, so I decided to export both subsets to files and read them back in.
After re-importing the data, I made the calculations I wanted and tried to merge the two data sets, without success.
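The extra X column in the re-imported data frames below is what you typically get when a data frame is written with write.csv() and its default row.names = TRUE: the row names are saved as an unnamed first column, and read.csv() brings them back under the name "X". A minimal sketch of the round trip, assuming the files were written with write.csv(); the toy data frame weekdayDF is a hypothetical stand-in for the real subset:

```r
# Hypothetical toy data frame standing in for one exported subset
weekdayDF <- data.frame(steps = c(37.3826, 37.3826),
                        date = c("2012-10-01", "2012-10-01"),
                        interval = c(0L, 5L),
                        daytype = "weekday",
                        stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv")

# Default row.names = TRUE: the row names come back as a column named "X"
write.csv(weekdayDF, tmp)
withX <- read.csv(tmp, stringsAsFactors = FALSE)
print(names(withX))

# row.names = FALSE: only the real columns are written and read back
write.csv(weekdayDF, tmp, row.names = FALSE)
reread <- read.csv(tmp, stringsAsFactors = FALSE)
print(names(reread))
```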
First data set:
> head(weekdays, 3)
X steps date interval daytype
1 1 37.3826 2012-10-01 0 weekday
2 2 37.3826 2012-10-01 5 weekday
3 3 37.3826 2012-10-01 10 weekday
> str(weekdays)
'data.frame': 12960 obs. of 5 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ steps : num 37.4 37.4 37.4 37.4 37.4 ...
$ date : chr "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
$ interval: int 0 5 10 15 20 25 30 35 40 45 ...
$ daytype : chr "weekday" "weekday" "weekday" "weekday" ...
Second data set:
> head(weekend, 3)
X steps date interval daytype
1 1 0 2012-10-06 0 weekend
2 2 0 2012-10-06 5 weekend
3 3 0 2012-10-06 10 weekend
> str(weekend)
'data.frame': 4608 obs. of 5 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ steps : num 0 0 0 0 0 0 0 0 0 0 ...
$ date : chr "2012-10-06" "2012-10-06" "2012-10-06" "2012-10-06" ...
$ interval: int 0 5 10 15 20 25 30 35 40 45 ...
$ daytype : chr "weekend" "weekend" "weekend" "weekend" ...
Now I would like to merge the two data sets (weekdays, weekend) by date, but the problem is that they share no dates or any other key column.
The final data set should have 4 columns and 17568 observations.
The columns should be:
- steps: Number of steps taken in a 5-minute interval
- date: The date on which the measurement was taken in YYYY-MM-DD format
- interval: Identifier for the 5-minute interval in which measurement was taken
- daytype: whether the day is a weekday or a weekend day
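Since the 12960 weekday rows plus the 4608 weekend rows add up to exactly the 17568 rows expected, this looks like stacking one data frame on top of the other rather than joining them on a key. A minimal sketch with base R's rbind(), assuming both re-imported data frames have the columns shown above; weekdayDF and weekendDF are hypothetical toy stand-ins for the real weekdays and weekend objects:

```r
# Hypothetical toy versions of the two re-imported data frames
weekdayDF <- data.frame(X = 1:2, steps = c(37.3826, 37.3826),
                        date = "2012-10-01", interval = c(0L, 5L),
                        daytype = "weekday", stringsAsFactors = FALSE)
weekendDF <- data.frame(X = 1:2, steps = c(0, 0),
                        date = "2012-10-06", interval = c(0L, 5L),
                        daytype = "weekend", stringsAsFactors = FALSE)

# Keep only the four wanted columns (drops the X row-number column)
keep <- c("steps", "date", "interval", "daytype")

# rbind() on data frames appends rows and matches columns by name,
# so no shared ID or key column is needed
combined <- rbind(weekdayDF[, keep], weekendDF[, keep])

# Restore chronological order; ISO "YYYY-MM-DD" strings sort correctly
combined <- combined[order(combined$date, combined$interval), ]
rownames(combined) <- NULL

str(combined)
```

With the real objects the same call would be rbind(weekdays[, keep], weekend[, keep]), and nrow() of the result should then be 17568.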
I tried:
- merge()
- join() (plyr)
- union()
Every example I found had an ID or some other key column shared by both data sets, which mine do not have.
I also looked here, but did not understand much of it, and at many other questions, but none of them resembled my data set.
The other option I considered was to add an "ID" column to the original data set and redo everything I have done so far, which I will have to do if I cannot find a way around this problem.
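A variant of that idea, sketched here under the assumption that only base R is available: rather than an "ID" column, add the daytype column to the original data set up front, so nothing needs to be split and merged back together afterwards. The toy activityData below is a hypothetical stand-in for the real one. POSIXlt's wday field uses the same numbering as .indexwday() in the xts code above (0 = Sunday through 6 = Saturday) and, unlike weekdays(), does not depend on the locale:

```r
# Hypothetical toy stand-in for activityData
activityData <- data.frame(steps = c(1.7169811, 0.3396226, 0),
                           date = c("2012-10-01", "2012-10-01", "2012-10-06"),
                           interval = c(0, 5, 0),
                           stringsAsFactors = FALSE)

# Day of week as an integer: 0 = Sunday, ..., 6 = Saturday
wday <- as.POSIXlt(as.Date(activityData$date))$wday

# Tag each row instead of splitting the data set in two
activityData$daytype <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

print(activityData)
```

Per-group calculations can then run on the single data frame, for example with aggregate() or tapply() grouped by daytype.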
I would like some advice on how to proceed or what to try next.