
Following on from the question here:

I'm trying to create the series by hand using rpy2:

import rpy2.robjects as ro
from rpy2.robjects.packages import importr
import pandas.rpy.common as com

pa = importr("pa")

ro.r("data(jan)")
jan = com.load_data('jan')

jan_r  = com.convert_to_r_dataframe(jan)

name = ro.StrVector([str(i) for i in jan['name']])
sector = ro.StrVector([str(i) for i in jan['sector']])
date = ro.StrVector([str(i) for i in jan['date']])

and I get a date number of 14610 in the date field, representing 2010-01-01, which I suspect uses a 1970-01-01 origin. However, I can't find anything in the datetime module that lets me change the origin for a date, so I don't know how to reset it.

My questions:

  1. Is the origin for the R sourced date 1970-01-01?
  2. Is there a way to set an origin and convert to a datetime.datetime object in Python?
  3. Am I missing something more obvious here?

Thanks

Tahnoon Pasha

2 Answers


Is the origin for the R sourced date 1970-01-01?

From ?Date:

Dates are represented as the number of days since 1970-01-01, with negative values for earlier dates.


I get a date number of 14610 in the date field representing 2010-01-01 which I suspect is a 1970-01-01 origin.

Well suspected.

as.Date(14610, origin = "1970-01-01")
## [1] "2010-01-01"

Is there a way to set an origin and convert to a datetime.datetime object in Python?

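The Python datetime docs show several ways of constructing a date.

You can use the datetime.date(year, month, day) syntax, where those values can be retrieved from the R dates using year(x), month(x) and mday(x) (available in, for example, the lubridate package), where x represents your date vector.

You can use the date.fromtimestamp(timestamp) syntax, where the timestamp is the number of seconds since 1970-01-01; for an R Date x, that is as.numeric(x) * 86400. Note that format(x) returns a character string such as "2010-01-01" rather than a timestamp, and that fromtimestamp interprets the value in local time.

The documentation for date.fromordinal(ordinal) says it returns: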

the date corresponding to the Gregorian ordinal, where January 1 of year 1 has ordinal 1

So presumably your problem is that you are passing dates as numbers, which R counts as days from 1 January 1970 and Python's fromordinal counts from 1 January 0001.
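The origin mismatch can be handled on the Python side by adding the day count to a fixed 1970-01-01 date. The following is a minimal sketch, not part of the original answer; the helper names r_days_to_date and r_days_to_date_via_ordinal are hypothetical, and it assumes the values are whole-day counts, possibly stored as strings as in the question's StrVector.

```python
import datetime

# R stores Date values as days since this origin (see ?Date).
R_EPOCH = datetime.date(1970, 1, 1)


def r_days_to_date(days):
    """Convert an R Date day count to a datetime.date (hypothetical helper)."""
    # int(float(...)) also accepts strings such as "14610" or "14610.0".
    return R_EPOCH + datetime.timedelta(days=int(float(days)))


def r_days_to_date_via_ordinal(days):
    """Same conversion, shifting the count onto Python's ordinal scale."""
    return datetime.date.fromordinal(R_EPOCH.toordinal() + int(float(days)))


print(r_days_to_date(14610))              # 2010-01-01
print(r_days_to_date_via_ordinal("14610"))  # 2010-01-01
```

Negative counts work as well, since R uses them for dates before 1970. If a datetime.datetime is needed rather than a date, datetime.datetime.combine(d, datetime.time()) gives midnight on that day.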

Richie Cotton
  • thanks @RichieCotton. In this case I don't get to choose the input format of the data. I think I'll just add the timedelta to the numbers in this instance. Appreciate your help. – Tahnoon Pasha Nov 12 '14 at 07:42

OK, but how do you express this number correctly in Python?

import pandas as pd
pd.to_datetime(18402, unit='D', origin='1970-01-01')

18402 corresponds to 2020-05-20. The origin shown here is the default, so you can omit it.

Camilo Abboud