
I am loading a CSV file into a data frame using:

str <- readLines("Messages.csv", n=-1, skipNul=TRUE)
matches <- str_match(str, pattern = "\\s*([0-9]{2}/[0-9]{2}/[0-9]{4}),\\s*([0-9]{2}:[0-9]{2}:[0-9]{2}),\\s*(Me|Them),\\s*(\\+[0-9]{11,12}),\\s*((?s).*)")
df <- data.frame(matches[, -1], stringsAsFactors=F)
colnames(df) <- c("date","time","sender","phone number","msg")


# Format the date and create a row with the number of characters of the messages
df <- df %>%
  mutate(posix.date = parse_date_time(paste0(date, time), "%d%m%y%H%M%S"), tz = "Europe/London") %>%
  mutate(nb.char = nchar(msg)) %>%
  select(posix.date, sender, msg, nb.char) %>%
  arrange(as.numeric(posix.date))

I can change sender names using

# Change the senders' names
df <- df %>%
  mutate(sender = replace(sender, sender == "Me", "Mr. Awesome")) 

But I want to change the time zone of the data from tz="Europe/London" to tz="America/Los_Angeles".

I have tried both of the following without success:

attributes(df)$tz<-"America/Los_Angeles"

This runs without error, but nothing seems to change.

and also this:

df <- df %>%
mutate(date = replace(date, format(date, tz="America/Los_Angeles",usetz=TRUE)))

which gives the error: "Error in eval(expr, envir, enclos) : argument "values" is missing, with no default"

Perhaps I am not specifying the original time zone correctly, but I don't know how to check whether it was applied.

Thanks!

Jaap
Frank
  • Is there a reason why you aren't using `read.csv()`? – Rich Scriven Sep 22 '15 at 04:43
  • I should add that I am new to R, and the base of this code is from this blog post, so most function decisions were not my own. http://iwoaf.com/data-of-long-distance-lovers/ – Frank Sep 22 '15 at 04:50
  • I think the reason is that one of the columns in the data frame holds the messages, and these can contain commas, so read.csv would split messages and parse them incorrectly. Using readLines was not ideal either, because some messages contain \n newline characters, but there were fewer of these. – Frank Sep 22 '15 at 04:52

1 Answer


First, you can change the time zone of a POSIXct variable. It is not meaningful to "change the time zone in a data.frame", so setting a "tz" attribute of a data.frame does nothing.

[ Note: it is meaningful, however, to change the time zone of an xts object. See this post. ]
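To check which time zone a column actually carries, you can inspect the "tzone" attribute of the POSIXct vector itself. A minimal sketch, using a made-up one-row data frame rather than the real message data:

```r
# Hypothetical one-row data frame standing in for the real data
df <- data.frame(
  posix.date = as.POSIXct("2015-01-01 12:00:00", tz = "Europe/London")
)

# What the question tried: this only attaches an unused attribute to the data frame
attr(df, "tz") <- "America/Los_Angeles"

# The time zone lives on the column, and it has not changed
column_tz <- attr(df$posix.date, "tzone")
print(column_tz)  # "Europe/London"
```

If the column reports a zone other than the one you expected, the problem is in how the timestamps were parsed, not in the conversion step.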

I gather that your timestamps are in GMT and you want to convert that to the equivalent in PST. If this is what you are intending, then this should work:

df$posix.date <- as.POSIXct(as.integer(df$posix.date),
                            origin="1970-01-01", 
                            tz="America/Los_Angeles")

For example:

x <- as.POSIXct("2015-01-01 12:00:00", tz="Europe/London")
x
# [1] "2015-01-01 12:00:00 GMT"
as.POSIXct(as.integer(x),origin="1970-01-01",tz="America/Los_Angeles")
# [1] "2015-01-01 04:00:00 PST"

The issue here is that as.POSIXct(...) works differently depending on the class of the object passed to it. If you pass a character or integer, the time zone is set according to tz=.... If you pass an object that is already POSIXct, the tz=... argument is ignored. So here we convert x to integer so the tz=... argument is respected.

Really convoluted. If there's an easier way I'd love to hear about it.
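One possibly simpler route in base R is to set the "tzone" attribute on the POSIXct vector directly. This is a sketch under the assumption that you only want to change how the same instants are displayed, not shift the underlying times:

```r
x <- as.POSIXct("2015-01-01 12:00:00", tz = "Europe/London")

# Copy the vector and change only the display time zone
y <- x
attr(y, "tzone") <- "America/Los_Angeles"

format(x, usetz = TRUE)  # "2015-01-01 12:00:00 GMT"
format(y, usetz = TRUE)  # "2015-01-01 04:00:00 PST"

# The underlying instant (seconds since the epoch) is unchanged
same_instant <- as.numeric(x) == as.numeric(y)
```

If you only need a character representation, `format(x, tz = "America/Los_Angeles", usetz = TRUE)` gives the converted string without modifying `x`.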

Community
jlhoward
  • Thank you! df$posix.date <- as.POSIXct(as.integer(df$posix.date), origin="1970-01-01", tz="America/Los_Angeles") – Frank Sep 22 '15 at 06:23
  • I am not sure what's happening. This generally seems to work, but it doesn't seem very accurate. I converted my data from London time to LA time and then back to London time, and my histograms looked different from those made with the original data. Any ideas? – Frank Sep 28 '15 at 06:31
  • Actually, using Lubridate I was able to do what I wanted by using the force_tz and with_tz functions – Frank Sep 28 '15 at 07:22
  • Yeah, the problem was that parse_date_time was ignoring my time zone specification and using UTC rather than British time, so I had to force the data to British time as a first step (using force_tz) and then switch to LA time using with_tz – Frank Sep 28 '15 at 07:24
  • When you have a data table and setkey on a date, it actually changes time zone. It's super frustrating and buggy! – wolfsatthedoor Aug 05 '19 at 22:37
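The lubridate approach described in the comments can be sketched as follows. This assumes the lubridate package is installed; the timestamp is made up, and "dmY HMS" is just one plausible order string for day/month/year data. Note also that in the question's code the tz= argument appears to sit outside the parse_date_time() call, so it is passed to mutate() instead, which would be consistent with the timestamps being parsed as UTC.

```r
library(lubridate)

# parse_date_time() defaults to UTC when no tz is passed to it
parsed <- parse_date_time("22/09/2015 04:43:00", orders = "dmY HMS")

# force_tz(): keep the clock time, relabel it as London time (changes the instant)
london <- force_tz(parsed, tzone = "Europe/London")

# with_tz(): keep the instant, display it in Los Angeles time
la <- with_tz(london, tzone = "America/Los_Angeles")

format(london, usetz = TRUE)  # "2015-09-22 04:43:00 BST"
format(la, usetz = TRUE)      # "2015-09-21 20:43:00 PDT"
```

Passing tz = "Europe/London" directly inside parse_date_time() should make the force_tz() step unnecessary.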