
I'm having trouble converting a .csv column of weekday names to numbers (so that 1 = Monday, 2 = Tuesday, 3 = Wednesday, etc.). I'm trying to use the strptime function as documented here: http://www.inside-r.org/r-doc/base/strftime

Since I want to convert the weekday to a number, I used the "%u" formatting option. Here's my code below:

> newweekdaynum <- strptime(SFCrimeData$DayOfWeek, "%u")

where SFCrimeData is a data set containing a bunch of crime information. No errors come up when I run the statement, but when I print "newweekdaynum", all I get is a huge table of values that all say "NA".

What am I doing wrong?

Roland
Raleigh L.
  • If you have `v1 <- c('Monday', 'Tuesday', ....'Sunday'); factor(v1, levels=c('Monday',,.., 'Sunday'), labels=1:7)` or use `?match` – akrun Oct 08 '15 at 06:55
  • Please `dput(SFCrimeData$DayOfWeek)` and add the output to your post. –  Oct 08 '15 at 06:55
  • @Pascal, the SFCrimeData file has about 800k rows, so I can't quite paste the full output here, but I ran the command and basically all it is is a number from 1 to 10 with an "L" right after it. – Raleigh L. Oct 08 '15 at 07:06
  • @RaleighL. Have you tried the `factor` method I suggested? – akrun Oct 08 '15 at 07:11
  • @erasmortg So I ran that and again got the large set of data with the numbers and the L after it, and then at the bottom I had this `.Label = c("Friday", "Monday", "Saturday", "Sunday", ` `"Thursday", "Tuesday", "Wednesday"), class = "factor")` `[1] Wednesday Wednesday Wednesday Wednesday Wednesday` `Wednesday` `Levels: Friday Monday Saturday Sunday Thursday Tuesday Wednesday`. How do I strip the L from each of the values so that it's just the integer itself? – Raleigh L. Oct 08 '15 at 07:13
  • You **must** show what your input is. Otherwise nobody can help you. Edit the output of `dput(head(SFCrimeData$DayOfWeek))` into your question. – Roland Oct 08 '15 at 07:17
  • @akrun I typed it in and got the following: `[1] 1 2 3 4 5 6 7` `Levels: 1 2 3 4 5 6 7` but when I print v1 all I get is: `[1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday" "Sunday" `, and not the full data column that was supposed to be changed. – Raleigh L. Oct 08 '15 at 07:20
  • I reiterate: Provide your input, so we can stop this guessing game. – Roland Oct 08 '15 at 07:22
  • @Roland Like I said, the data set is too large to copy the full output of that into here. It won't even show fully in my console window in R Studio, that's how large it is. But I can describe it: all that shows is a bunch of numbers between 1 and 7, with an L after each of those numbers. And then at the bottom, after all the numbers, is: `.Label = c("Friday", "Monday", "Saturday", "Sunday",` `"Thursday", "Tuesday", "Wednesday"), class = "factor")` `[1] Wednesday Wednesday Wednesday Wednesday Wednesday` `Wednesday` `Levels: Friday Monday Saturday Sunday Thursday Tuesday Wednesday` – Raleigh L. Oct 08 '15 at 07:23
  • **Use the command I provided!** The output can't be too large. – Roland Oct 08 '15 at 07:24
  • @Roland Ahh I apologize, I was running head(dput( and not dput(head( as you asked. – Raleigh L. Oct 08 '15 at 07:24
  • @Roland `> dput(head(SFCrimeData$DayOfWeek))` `structure(c(7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Friday", "Monday",` `"Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday"), class =` `"factor")` – Raleigh L. Oct 08 '15 at 07:25
  • Is your "weekdays" in a date/time format or just a raw string? Parsing could be a solution that you are looking for `aRandomDate = "01/30/1995" aRandomDate.parsed = strptime(aRandomDate, "%m/%d/%Y") print(aRandomDate.parsed) # [1] "1995-01-30 CET" format(aRandomDate.parsed, "%u") # [1] "1"` –  Oct 08 '15 at 07:29
  • Probably, you have not assigned it to a new object? – akrun Oct 08 '15 at 08:58

2 Answers


strptime can be used if you have something that can be resolved into a full date/datetime. It will return a datetime object. That's not what you want.
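To make the difference concrete, here is a minimal sketch. The date string is the one used in the comments above; `tz = "UTC"` is an assumption added only to keep the result independent of the local time zone.

```r
# A bare weekday name is not a complete date, so strptime() cannot
# build a datetime from it; the result is expected to be NA, as in the question
parsed_name <- strptime("Monday", "%u")
is.na(parsed_name)

# A full date parses fine; format() then extracts its weekday number
parsed_date <- strptime("01/30/1995", "%m/%d/%Y", tz = "UTC")
weekday_num <- as.integer(format(parsed_date, "%u"))
weekday_num  # 1, since 30 January 1995 was a Monday
```

In other words, "%u" is useful for formatting an existing date, not for turning a weekday name into a number.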

Instead you can make use of ordered factors:

#some example data
set.seed(42)
x <- factor(sample(c("Monday", "Tuesday", "Wednesday", 
                     "Thursday", "Friday", "Saturday", "Sunday"),
            20, TRUE))
# [1] Sunday    Sunday    Wednesday Saturday  Friday    Thursday  Saturday  Monday    Friday    Friday    Thursday  Saturday  Sunday   
#[14] Tuesday   Thursday  Sunday    Sunday    Monday    Thursday  Thursday 
#Levels: Friday Monday Saturday Sunday Thursday Tuesday Wednesday

#turn into ordered factor
x <- factor(x, levels = c("Monday", "Tuesday", "Wednesday", 
                          "Thursday", "Friday", "Saturday", "Sunday"),
            ordered = TRUE)
#[1] Sunday    Sunday    Wednesday Saturday  Friday    Thursday  Saturday  Monday    Friday    Friday    Thursday  Saturday  Sunday   
#[14] Tuesday   Thursday  Sunday    Sunday    Monday    Thursday  Thursday 
#Levels: Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday

#extract underlying integer values
as.integer(x)
#[1] 7 7 3 6 5 4 6 1 5 5 4 6 7 2 4 7 7 1 4 4

(You wouldn't really need to make it an ordered factor; a factor with the levels specified in the correct order would be sufficient. But weekdays are conceptually an ordered factor.)
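Applied to a data frame column, the same idea can be sketched with `match()`, which akrun mentions in the comments. The `crimes` data frame and the `DayNum` column are hypothetical stand-ins for the real `SFCrimeData`; only the `DayOfWeek` name comes from the question.

```r
weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday")

# Hypothetical stand-in for SFCrimeData (the real data has ~800k rows)
crimes <- data.frame(
  DayOfWeek = factor(c("Wednesday", "Friday", "Monday", "Sunday"))
)

# Default factor levels are sorted alphabetically, so the raw codes
# do not correspond to the position of the day in the week
as.integer(crimes$DayOfWeek)  # 4 1 2 3

# match() returns each name's position in the Monday-first vector
crimes$DayNum <- match(as.character(crimes$DayOfWeek), weekday_levels)
crimes$DayNum  # 3 5 1 7
```

This also explains the `7L` values in the `dput()` output above: with alphabetical levels, "Wednesday" happens to be the seventh level. The result has to be assigned back to a column (or a new object) for the change to persist.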

Roland
df$Date <- as.Date(df$Date)  
df$wkdaynum <- format(df$Date,"%u")  
df$wkdaynum <- as.numeric(df$wkdaynum)

So, your mistake was to use strptime() instead of format().
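Note that this approach assumes the data frame has a column of actual dates, which the question does not show. A small self-contained sketch, with a hypothetical `df` and made-up ISO-formatted date strings:

```r
# Hypothetical data frame; the Date values are illustrative only
df <- data.frame(Date = c("2015-10-05", "2015-10-08", "2015-10-11"),
                 stringsAsFactors = FALSE)

df$Date <- as.Date(df$Date)                        # character -> Date
df$wkdaynum <- as.integer(format(df$Date, "%u"))   # 1 = Monday ... 7 = Sunday
df$wkdaynum  # 1 4 7 (a Monday, a Thursday, and a Sunday)
```

If only weekday names are available, as in `SFCrimeData$DayOfWeek`, there is no date to format, and the factor-based approach from the other answer is likely the better fit.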