
I am using the following code to give me the day of the week from a date (in the form dd/mm/yyyy).

Edit: I have uploaded a more relevant dataset.

df <- structure(list(Date = c("18/01/2013", "18/01/2013", "18/01/2013",
                              "18/01/2013", "18/01/2013"),
                     Time = c("07:25:30", "07:25:40", "07:25:50",
                              "07:26:00", "07:26:10"),
                     Axis1 = c(217L, 320L, 821L, 18L, 40L),
                     Steps = c(6L, 7L, 5L, 1L, 1L),
                     wday = c(7, 7, 7, 7, 7)),
                .Names = c("Date", "Time", "Axis1", "Steps", "wday"),
                row.names = 18154:18158, class = "data.frame")


library(lubridate)
df$wday = wday(df$Date)
df$wday.name = wday(df$Date, label = TRUE, abbr = TRUE)

However, 18/01/2013 was a Friday, not a Saturday as R reports.

Does anyone have any suggestions of how to rectify this?

EDIT: I tried to follow the suggestions given by Dirk...

as.POSIXlt(df[,1])$wday

... but this still implies that the 18/1 is a Saturday.

My timezone is GMT/UTC (+1 for British Summer Time). However, because I just want R to read from the date column (which is just d/m/y), I presume I don't need to specify this...

How can I add a correct wday column to my existing R data frame (as in my original script above)? I am struggling to get the suggested code working because I originally gave the data frame in the wrong format - apologies.

Matt Parker
KT_1
  • your timezone is BST (+0100), not GMT/UTC (+0000). An inconsistent use of timezones can sometimes cause an off-by-one error in dates or weekdays – Walter Tross Jun 18 '13 at 11:44

3 Answers


You can use base R functions for this. Using your df object:

 R> as.POSIXlt(df[,1])$wday  
 [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 
 R> weekdays(as.Date(df[,1])) 
  [1] "Friday"   "Friday"   "Friday"   "Friday"   "Friday"
  [6] "Friday"   "Friday"   "Friday"   "Friday"   "Friday" 
 [11] "Friday"   "Friday"   "Friday"   "Friday"   "Saturday"  
 [16] "Saturday" "Saturday" "Saturday" "Saturday" 
 R>     

The last few values spill over into Saturday because the time zone (TZ) was not specified.

If you do

 R> df <- data.frame(Date=seq(as.POSIXct("05:00", format="%H:%M", tz="UTC"),
 +                  as.POSIXct("23:00", format="%H:%M", tz="UTC"), by="hours"))

then

 R> table(weekdays(as.Date(df[,1], tz="UTC")))

 Friday
    19
 R> 
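
The spillover described above can be reproduced with a single fixed instant. This is only a sketch in base R: the zone name "America/Chicago" and the variable names are assumptions chosen for illustration and do not come from the original data.

```r
# One instant late on Friday evening, Chicago time
# (the zone name is an assumption chosen for illustration).
x <- as.POSIXct("2013-01-18 23:30:00", tz = "America/Chicago")

# as.Date() on a POSIXct value converts in UTC unless tz is given,
# and in UTC this instant is already Saturday morning.
date_utc   <- as.Date(x)
date_local <- as.Date(x, tz = "America/Chicago")

format(date_utc)    # "2013-01-19"
format(date_local)  # "2013-01-18"

# Weekday as a number, 0 = Sunday ... 6 = Saturday
as.POSIXlt(date_utc)$wday    # 6 (Saturday)
as.POSIXlt(date_local)$wday  # 5 (Friday)
```

Passing the same `tz` when creating the values and when converting them to dates keeps the weekday stable.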

I presume the Fri/Sat error may go away under lubridate too, but I tend to use base R functions for this.

Edit: Confirmed.

R> lubridate::wday(as.Date(df[,1]), label=TRUE) 
 [1] Fri Fri Fri Fri Fri Fri Fri Fri Fri Fri Fri Fri Fri Fri 
[15] Fri Fri Fri Fri Fri          
Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat  
R>
Dirk Eddelbuettel
  • Many thanks @Dirk Eddelbuettel . Apologies but my original dataframe was in the wrong format and I am still having problems. Any help would be gratefully received... – KT_1 Jun 03 '13 at 09:13
  • I have added a bounty because with my new dataframe, I am still struggling to work out a solution. However I wanted to thank you @Dirk Eddelbuettel for your help so far with my problem. – KT_1 Jun 11 '13 at 14:59

I think the issue here is a simple one. The 'lubridate' package is made for exactly this type of work, but the issue in the question seems to be just about understanding the 'lubridate' functions.

The reason the OP is seeing strange results is that the dates in 'df' are plain strings that are not in an unambiguous format (units in decreasing order: year, month, day). This means that when the 'wday' function is called, it applies an incorrect conversion and misreads the dates.

To counteract this problem, the OP has already tried converting the strings into dates, which is exactly the right idea. However, the 'as.POSIXlt' function is a cumbersome tool, and the 'lubridate' package already has an answer: the 'dmy' function. This is how it works:

df$wday <- wday(dmy(df$Date))
df$wday.name <- wday(dmy(df$Date), label=TRUE, abbr=TRUE)

We are doing something quite simple here. We are first converting 'df$Date' from a set of strings into a set of dates. The 'dmy' function automatically parses the strings looking for day, then month, then year (hence d-m-y). Once the strings have been converted into proper dates, we can use the 'wday' function properly.
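
To make the misreading concrete, here is a small base-R sketch of what appears to happen when a dd/mm/yyyy string is converted without a format. It uses as.Date() rather than lubridate so that it runs without extra packages; the variable names are hypothetical.

```r
raw <- "18/01/2013"

# Without a format, as.Date() tries "%Y-%m-%d" and then "%Y/%m/%d".
# The second pattern matches, so "18" is read as the year and "20"
# (the first two digits of "2013") as the day.
misread <- as.Date(raw)
lt <- as.POSIXlt(misread)
c(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday)  # 18, 1, 20

# With the day/month/year order stated explicitly,
# the date is read as intended.
parsed <- as.Date(raw, format = "%d/%m/%Y")
format(parsed)           # "2013-01-18"
as.POSIXlt(parsed)$wday  # 5, i.e. Friday (0 = Sunday)
```

The weekday of the misread date belongs to a day in the year 18, which is why it does not match the calendar for 2013.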

Dinre

I think Dinre's answer is the easiest - I find working with Dates less error-prone than POSIX - but here's a straightforward way to get the correct result while using both your Date and Time columns.

# Convert your Date variable into a proper Date class
# This is the base-R equivalent of Dinre's dmy()
df$Date2 <- as.Date(df$Date, format = "%d/%m/%Y")

# Paste it together with your Time into a POSIX variable with timezone
# I think "GB" is the correct timezone code for you, but not certain
df$datetime <- as.POSIXct(paste(df$Date2, df$Time), tz = "GB")

# Calculate weekday (wday() comes from lubridate, loaded in the question)
wday(df$datetime, label = TRUE)

The nice thing about this is that you can use df$datetime for pretty much anything else (e.g., plots) and get consistent results. If you're really only going to use the date, then Dinre's answer is all you need.
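
As a variation on the steps above, the date and time strings can also be parsed in one call by giving as.POSIXct() the full layout. This is a sketch under assumptions: the two-row data frame is a cut-down stand-in for the question's data, and "Europe/London" is an assumed zone name used in place of "GB".

```r
# Cut-down stand-in for the question's data frame
df <- data.frame(Date = c("18/01/2013", "18/01/2013"),
                 Time = c("07:25:30", "07:25:40"),
                 stringsAsFactors = FALSE)

# Parse date and time together, stating the layout and the zone explicitly.
# "Europe/London" is an assumed zone name; substitute your own.
df$datetime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%d/%m/%Y %H:%M:%S",
                          tz = "Europe/London")

# Weekday as a number, 0 = Sunday ... 6 = Saturday
df$wday_num <- as.POSIXlt(df$datetime)$wday
df$wday_num  # 5 5 (Friday)
```

Stating the format string means a mistyped or reordered date produces NA instead of a silently wrong weekday.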

Matt Parker
  • This is the list of timezone abbreviations I used: http://en.wikipedia.org/wiki/List_of_zoneinfo_time_zones – Matt Parker Jun 11 '13 at 18:40
  • 1
    I would recommend you take a look at the 'lubridate' package, if you're not already familiar with it, Matt. I used to use the base R functions all the time, until I discovered 'lubridate'. It has exactly the same functionality but with wonderfully concise wrappers that save me a lot of time. I'm a full convert now, and I heartily recommend it. The base functions do the same thing, of course, so it's not like your output will be any different, just your code. – Dinre Jun 12 '13 at 02:01
  • @Dinre Thanks for the suggestion - I use lubridate all the time for higher-level date operations, but I still like using the base functions for converting types - in part because I want to hang on to my knowledge of the `%d/%m/%Y`-style format codes in case I run into something really tangled. – Matt Parker Jun 12 '13 at 23:02