
I have a data set with the following variables:

  • steps: Number of steps taken in a 5-minute interval
  • date: The date on which the measurement was taken in YYYY-MM-DD format
  • interval: Identifier for the 5-minute interval in which measurement was taken (288 intervals per day)

The main data set:

> head(activityData, 3)
      steps       date interval
1 1.7169811 2012-10-01        0
2 0.3396226 2012-10-01        5
3 0.1320755 2012-10-01       10
> str(activityData)
'data.frame':   17568 obs. of  3 variables:
 $ steps   : num  1.717 0.3396 0.1321 0.1509 0.0755 ...
 $ date    : chr  "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
 $ interval: num  0 5 10 15 20 25 30 35 40 45 ...

The data set spans two months.
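A quick arithmetic check on the sizes that come up in this question: 17568 rows / 288 intervals per day = 61 days. The sketch below assumes those 61 days run from 2012-10-01 (the first date in the data) through 2012-11-30, and counts weekday and weekend days with base R's POSIXlt `wday` field (0 = Sunday, 6 = Saturday).

```r
# Assumption: 17568 rows / 288 intervals per day = 61 days,
# taken here to run from 2012-10-01 through 2012-11-30.
days <- seq(as.Date("2012-10-01"), as.Date("2012-11-30"), by = "day")

# POSIXlt$wday numbers the days 0 (Sunday) to 6 (Saturday)
wday <- as.POSIXlt(days)$wday
nWeekend <- sum(wday %in% c(0, 6))
nWeekday <- length(days) - nWeekend

# Expected row counts for each subset (288 five-minute intervals per day)
c(days = length(days), weekdayRows = nWeekday * 288, weekendRows = nWeekend * 288)
```

Under that assumption this gives 45 weekdays and 16 weekend days, i.e. 12960 and 4608 rows, which matches the two subsets shown further down and adds up to 17568.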

I had to split it into weekdays and weekend days, which I did as follows:

> dataAs.xtsWeekday <- dataAs.xts[.indexwday(dataAs.xts) %in% 1:5]

> dataAs.xtsWeekend <- dataAs.xts[.indexwday(dataAs.xts) %in% c(0, 6)]
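For reference, the same split can be done on a plain data.frame without converting to xts. This is a minimal base R sketch on a small, made-up stand-in for activityData; it relies on the `wday` field of `as.POSIXlt()`, which uses the same 0 (Sunday) to 6 (Saturday) numbering that the `1:5` and `c(0, 6)` tests above assume.

```r
# Hypothetical stand-in for activityData: one Friday and one Saturday, three intervals each
activityData <- data.frame(
  steps    = c(1.7, 0.3, 0.1, 0, 0, 0),
  date     = rep(c("2012-10-05", "2012-10-06"), each = 3),
  interval = rep(c(0, 5, 10), times = 2)
)

# Day of week per row: 0 = Sunday ... 6 = Saturday
wday <- as.POSIXlt(as.Date(activityData$date))$wday

weekdayData <- activityData[wday %in% 1:5, ]
weekendData <- activityData[wday %in% c(0, 6), ]
```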

After that I had to do some calculations, which failed, so I decided to export the two subsets and read them back in.

After re-importing the data I made the calculations I wanted and tried to merge the two data sets, but did not succeed.

First data set:

> head(weekdays, 3)
  X   steps       date interval daytype
1 1 37.3826 2012-10-01        0 weekday
2 2 37.3826 2012-10-01        5 weekday
3 3 37.3826 2012-10-01       10 weekday
> str(weekdays)
'data.frame':   12960 obs. of  5 variables:
 $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ steps   : num  37.4 37.4 37.4 37.4 37.4 ...
 $ date    : chr  "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
 $ interval: int  0 5 10 15 20 25 30 35 40 45 ...
 $ daytype : chr  "weekday" "weekday" "weekday" "weekday" ...

Second data set:

> head(weekend, 3)
  X steps       date interval daytype
1 1     0 2012-10-06        0 weekend
2 2     0 2012-10-06        5 weekend
3 3     0 2012-10-06       10 weekend
> str(weekend)
'data.frame':   4608 obs. of  5 variables:
 $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ steps   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ date    : chr  "2012-10-06" "2012-10-06" "2012-10-06" "2012-10-06" ...
 $ interval: int  0 5 10 15 20 25 30 35 40 45 ...
 $ daytype : chr  "weekend" "weekend" "weekend" "weekend" ...

Now I would like to merge the two data sets (weekdays and weekend) by date, but they have no dates, or any other key, in common.

The final data set should have 4 columns and 17568 observations.

The columns should be:

  • steps: Number of steps taken in a 5-minute interval
  • date: The date on which the measurement was taken in YYYY-MM-DD format
  • interval: Identifier for the 5-minute interval in which measurement was taken
  • daytype: whether the day is a weekday or a weekend day

I tried:

  • merge()
  • join() (plyr)
  • union()
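Why `merge()` comes back empty here can be shown with two tiny made-up frames in the shape of weekdays and weekend. By default `merge()` joins on `by = intersect(names(x), names(y))`, i.e. on every column the two frames share, so a row only survives if steps, date, interval and daytype all match in both frames, which never happens when the dates differ. (The real frames also share X, which would become a key as well.) A full outer join with `all = TRUE` keeps every row, which in this case amounts to stacking the frames.

```r
# Toy versions of the two subsets (hypothetical values, same column names)
wk <- data.frame(steps = c(37.4, 37.4), date = "2012-10-01",
                 interval = c(0L, 5L), daytype = "weekday")
we <- data.frame(steps = c(0, 0), date = "2012-10-06",
                 interval = c(0L, 5L), daytype = "weekend")

# Default: inner join on all shared columns -> no row matches on all of them
inner <- merge(wk, we)
nrow(inner)   # 0

# Full outer join: unmatched rows from both sides are kept
outer <- merge(wk, we, all = TRUE)
nrow(outer)   # 4
```

Since nothing is actually being matched, `rbind()` expresses the intent more directly than a join.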

Everywhere I looked, the data sets being merged had a common ID or another column in common, which mine do not.

I also looked here, but did not understand much, and at many other questions, but none of them resembled my data set.

The other option I thought of was to add an "ID" column to the original data set and redo everything I have done so far, which I will have to do if I can't find a way around this problem.
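The ID idea can be sketched in a few lines: number the rows before splitting, then after stacking the parts back together with `rbind()`, sort by that ID to restore the original row order. The data below is made up, and the names `weekdayPart`, `weekendPart` and `recombined` are hypothetical.

```r
# Hypothetical stand-in for activityData (Fri 2012-10-05, Sat 2012-10-06, Mon 2012-10-08)
activityData <- data.frame(
  steps    = c(1.7, 0.3, 0, 0, 5, 6),
  date     = rep(c("2012-10-05", "2012-10-06", "2012-10-08"), each = 2),
  interval = rep(c(0, 5), times = 3)
)
# Row ID added before splitting
activityData$ID <- seq_len(nrow(activityData))

# Split by day of week (0 = Sunday, 6 = Saturday) and label each part
wday <- as.POSIXlt(as.Date(activityData$date))$wday
weekdayPart <- activityData[wday %in% 1:5, ]
weekdayPart$daytype <- "weekday"
weekendPart <- activityData[wday %in% c(0, 6), ]
weekendPart$daytype <- "weekend"

# Stack the parts, then use the ID to put rows back in their original order
recombined <- rbind(weekdayPart, weekendPart)
recombined <- recombined[order(recombined$ID), ]
```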

I would like some advice on how to proceed or what to try next.

alecsx
  • Why did you divide your original data.frame? You mentioned 'some calculation'. What is your primary question? What do you want to achieve first? – Paulo E. Cardoso May 16 '14 at 23:35
  • I had to make some plots of the average number of steps per interval. For example, for every 5-minute interval (e.g. 0 - 5, i.e. 00:00 to 00:05 AM) a number of steps was walked. I needed the mean of those steps over all 00:00 to 00:05 intervals in the two months, with separate means for weekdays and weekend days. – alecsx May 17 '14 at 00:02
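Given that goal, the per-interval means for weekdays and weekend days can be computed in one call, without splitting the data at all, once a daytype column exists. This is a sketch on made-up values; it uses base R's formula interface to `aggregate()` as one possible approach, not the method from the answer.

```r
# Hypothetical stand-in for activityData: Thu 2012-10-04, Fri 2012-10-05, Sat 2012-10-06
activityData <- data.frame(
  steps    = c(10, 20, 30, 40, 0, 4),
  date     = rep(c("2012-10-04", "2012-10-05", "2012-10-06"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# Label each row as weekday or weekend (wday: 0 = Sunday, 6 = Saturday)
wday <- as.POSIXlt(as.Date(activityData$date))$wday
activityData$daytype <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

# Mean steps for each 5-minute interval, computed separately per daytype
intervalMeans <- aggregate(steps ~ interval + daytype, data = activityData, FUN = mean)
intervalMeans
```

Here the weekday means are 20 (interval 0) and 30 (interval 5), averaged over Thursday and Friday, and the weekend means are just Saturday's values.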

1 Answer


Since you mentioned that your final data set should have 17568 (=4608+12960) observations/rows, I assume you want to stack the two data.frames on top of each other (and possibly order them by date afterwards). This is done with rbind().

finaldata <- rbind(weekdays, weekend)
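The data frame method of `rbind()` matches columns by name rather than by position, so the two subsets only need identical column names. A small sketch with made-up rows in the shape of weekdays and weekend; the weekend columns are deliberately in a different order.

```r
# Made-up rows with the same columns as the imported weekdays/weekend frames
weekdays <- data.frame(X = 1:2, steps = c(37.4, 37.4), date = "2012-10-01",
                       interval = c(0L, 5L), daytype = "weekday")
weekend  <- data.frame(X = 1:2, daytype = "weekend", steps = c(0, 0),
                       date = "2012-10-06", interval = c(0L, 5L))

# Columns are matched by name; the result follows the first frame's column order
finaldata <- rbind(weekdays, weekend)
nrow(finaldata)   # 4, i.e. nrow(weekdays) + nrow(weekend)
```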

If you want to remove column X:

finaldata$X <- NULL

To convert your date column to actual dates:

finaldata$date <- as.Date(finaldata$date, format="%Y-%m-%d")

To order the whole data by date:

finaldata <- finaldata[order(finaldata$date),]
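`order()` accepts several sort keys, so adding interval as a second key makes the within-day order explicit instead of relying on each day's rows already being sorted by interval. A sketch on made-up, deliberately shuffled rows; the second key and the row renumbering are optional additions to the steps above.

```r
# Hypothetical rows in the shape of rbind(weekdays, weekend), out of order
finaldata <- data.frame(
  X        = c(1L, 2L, 1L, 2L),
  steps    = c(37.4, 37.4, 0, 0),
  date     = c("2012-10-08", "2012-10-08", "2012-10-06", "2012-10-06"),
  interval = c(5L, 0L, 5L, 0L),
  daytype  = c("weekday", "weekday", "weekend", "weekend")
)
finaldata$X <- NULL
finaldata$date <- as.Date(finaldata$date, format = "%Y-%m-%d")

# Sort by date first, then by interval within each date
finaldata <- finaldata[order(finaldata$date, finaldata$interval), ]
rownames(finaldata) <- NULL   # optional: renumber rows 1..n after sorting
```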
talat