3

Good morning,

I'm trying to convert from character to POSIXct, but when I do, I lose the hours and minutes from the data.

hourlyData (dataframe)

    Login   Expo    EquityUSD   Period                    UnrealizedProfitUSD
1   252957  0.00    7.187185    2014-02-03 00:00:00.000     0.00000
2   252957  0.00    7.187772    2014-02-03 01:00:00.000     0.00000
3   252957  0.00    7.188198    2014-02-03 02:00:00.000     0.00000
4   252957  0.00    7.187825    2014-02-03 03:00:00.000     0.00000
5   252957  0.00    7.187079    2014-02-03 04:00:00.000     0.00000
6   252957  0.00    7.187079    2014-02-03 05:00:00.000     0.00000
7   252957  0.00    7.188731    2014-02-03 06:00:00.000     0.00000
8   252957  0.00    7.186279    2014-02-03 07:00:00.000     0.00000
9   252957  0.00    7.187185    2014-02-03 08:00:00.000     0.00000

when I type class(hourlyData$Period) I get Character. When I try and convert this column as such hourlyData$Period = as.POSIXct(hourlyData$Period), however, I get the following output:
hourlyData

    Login   Expo    EquityUSD   Period       UnrealizedProfitUSD
1   252957  0.00    7.187185    2014-02-03   0.00000
2   252957  0.00    7.187772    2014-02-03   0.00000
3   252957  0.00    7.188198    2014-02-03   0.00000
4   252957  0.00    7.187825    2014-02-03   0.00000
5   252957  0.00    7.187079    2014-02-03   0.00000
6   252957  0.00    7.187079    2014-02-03   0.00000
7   252957  0.00    7.188731    2014-02-03   0.00000
8   252957  0.00    7.186279    2014-02-03   0.00000
9   252957  0.00    7.187185    2014-02-03   0.00000

Where the hours and minutes have been removed from the Period column. Does anyone know why this happens, or how to prevent it?

Thanks

Mike

Mike
  • 1,049
  • 5
  • 20
  • 46
  • This is probably OS specific in some way, the following command yields the correct result for me (R 3.1.1 on Mac OS X): `as.POSIXct(c('2014-02-03 00:00:00.000', '2014-02-03 01:00:00.000', '2014-02-03 02:00:00.000'))`. – Paul Hiemstra Nov 26 '14 at 08:01
  • I also get the correct result, R 3.1.1, windows 7. – Cath Nov 26 '14 at 08:17
  • I've actually found that there is exactly one date making trouble. 2014-03-31 03:00:00. For some unknown reason, when I remove that particular date, the conversion works. So I think you guys are correct, it should work, but perhaps there is something wrong with my data. Thanks for the help! – Mike Nov 26 '14 at 08:29
  • Does anyone else get the following error: `a = strptime("2014-03-31 03:00:00", '%Y-%m-%d %H:%M:%S')`. `as.POSIXct(a)`. With the result being NA? – Mike Nov 26 '14 at 08:36
  • 3
    Are you by chance in Turkey? http://www.timeanddate.com/time/dst/2014a.html Please state your time zone. – Roland Nov 26 '14 at 08:39
  • I'm in Cyprus. Why would that affect that one specific data point? – Mike Nov 26 '14 at 08:40
  • Follow the link in my comment. – Roland Nov 26 '14 at 08:41
  • Oh right, sorry, bit slow on the uptake. Ah! Well, time to rerun all that code then I guess haha! Thanks Roland! – Mike Nov 26 '14 at 08:42
  • 2
    General advice: Always specify time zones explicitly when creating POSIXt objects (by using the `tz` parameter). – Roland Nov 26 '14 at 08:42

3 Answers3

4

The other answers hint at the problem but don't really address it. as.POSIXct(...) has bizarre behaviour when a character vector is passed with an invalid time: rather than returning NA for those elements with invalid time, as.POSIXct(...) removes the time part for all elements.

You can "fix" this by explicitly providing the format specification, even though you are using the default spec (see last line below).

x <- sprintf('%02d:00:00', 20:25)          # 25:00:00 is not a valid time spec.
y <- sprintf('%s %s', '2018-01-01',x)      # last element has invalid time
as.POSIXct(head(y,-1))                     # works fine
## [1] "2018-01-01 20:00:00 HST" "2018-01-01 21:00:00 HST" "2018-01-01 22:00:00 HST" "2018-01-01 23:00:00 HST" "2018-01-02 00:00:00 HST"
as.POSIXct(y)                              # fails miserably
## [1] "2018-01-01 HST" "2018-01-01 HST" "2018-01-01 HST" "2018-01-01 HST" "2018-01-01 HST" "2018-01-01 HST"
as.POSIXct(y, tz='UTC')                    # tz does not fix this...
## [1] "2018-01-01 UTC" "2018-01-01 UTC" "2018-01-01 UTC" "2018-01-01 UTC" "2018-01-01 UTC" "2018-01-01 UTC"
as.POSIXct(y, format='%Y-%m-%d %H:%M:%S')  # but this does...
## [1] "2018-01-01 20:00:00 HST" "2018-01-01 21:00:00 HST" "2018-01-01 22:00:00 HST" "2018-01-01 23:00:00 HST" "2018-01-02 00:00:00 HST" NA 

Running R 3.4.0 on Win 7 x64.

jlhoward
  • 58,004
  • 7
  • 97
  • 140
1

I'd use strptime:

char_data = c('2014-02-03 00:00:00.000', '2014-02-03 01:00:00.000', '2014-02-03 02:00:00.000')
strptime(char_data, '%Y-%m-%d %H:%M:%S')
[1] "2014-02-03 00:00:00 CET" "2014-02-03 01:00:00 CET"
[3] "2014-02-03 02:00:00 CET"
Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
0

"2014-03-31 03:00:00" is not a valid datetime, where you live, as "2014-03-31 02:59:59" was immediately followed by "2014-03-31 04:00:00" due to switching to daylight saving time.

Ra.
  • 2,499
  • 3
  • 28
  • 41