I am working on a project with embedded Python in C++ and have run into an issue with pandas DataFrames with datetimes/Timestamps.

When adding datetime objects to pandas, if they are within the range of Timestamp they seem to get auto-converted into a Timestamp object. For example:

[Screenshot: pandas converts datetime to Timestamp automatically]

This is problematic on the C++ side because I am using the PyDateTime API to extract the time information, and the pandas Timestamp object does not seem to be compatible.

For context, I am using Boost.Python to extract the object, then I get the PyObject pointer out of the Boost object and try to read the date data from it.

Here is what I am using to extract the date and time information:

if(PyDateTime_Check(pyObj)) {
    SQLSMALLINT year = PyDateTime_GET_YEAR(pyObj);
    SQLUSMALLINT month = PyDateTime_GET_MONTH(pyObj);
    SQLUSMALLINT day = PyDateTime_GET_DAY(pyObj);
    SQLUSMALLINT hour = PyDateTime_DATE_GET_HOUR(pyObj);
    SQLUSMALLINT minute = PyDateTime_DATE_GET_MINUTE(pyObj);
    SQLUSMALLINT second = PyDateTime_DATE_GET_SECOND(pyObj);
    SQLUINTEGER usec = PyDateTime_DATE_GET_MICROSECOND(pyObj);
}

None of the PyDateTime_Check/PyDate_Check/PyTime_Check functions return true for pandas Timestamp objects, and the extraction macros return seemingly random numbers if I bypass that check.

How do I actually get the relevant date/time information out of the Timestamp objects? Is there a way to get them directly (some API I missed), and if not is there a way to convert the pandas Timestamp object into a PyDateTime object? I can use the boost API or the boost::numpy API as well if needed, as I have them in my project already.

I hope not to have to modify the objects in the actual Python namespace, only on the C++ side after extraction, but if there's no other way it's not out of the question.

1 Answer

I have figured out the way to do this, in case anyone looks for this later.

Pandas Timestamp objects actually do extract as PyDateTime objects, so they aren't the problem. My problem was that I was first extracting the column from the pandas DataFrame as a numpy ndarray, and numpy stores the timestamps with dtype datetime64[ns]. Each value is a 64-bit count of nanoseconds since the Unix epoch (1970-01-01), not a datetime object, which is why the PyDateTime checks failed.

To make this numpy value usable, I converted the column to a double dtype with astype, then built a datetime object from each value with PyDateTime_FromTimestamp:

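PyDateTime_IMPORT; // initialize the datetime C API once, before any PyDateTime_* call
// boostObj holds one element of the column after astype(double): nanoseconds since the epoch
double ns = boost::python::extract<double>(boostObj);
// PyDateTime_FromTimestamp takes the argument tuple for datetime.fromtimestamp(), in seconds
PyObject *timeTuple = Py_BuildValue("(d)", ns / 1000000000.0);
PyObject *timeObj = PyDateTime_FromTimestamp(timeTuple); // new reference, NULL on failure
Py_DECREF(timeTuple);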
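
Since the datetime64[ns] value is just a nanosecond count, the fields could also be computed directly in C++ without building a datetime object at all. The following is only a sketch of that alternative, not what I actually used. It assumes the column is converted with astype to int64 rather than double (a double cannot represent current nanosecond counts exactly) and extracted as a 64-bit integer. The names DateTimeFields, floor_div, civil_from_days and fields_from_epoch_ns are made up for illustration. It treats the count as UTC-based, whereas datetime.fromtimestamp() converts to local time, so the two approaches can differ by the local UTC offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain struct for the broken-down fields;
// not part of pandas, numpy, or the Python C API.
struct DateTimeFields {
    int year;
    unsigned month, day, hour, minute, second;
    std::uint32_t microsecond;
};

// Floor division, so that negative counts (before 1970) land in the right day.
static std::int64_t floor_div(std::int64_t a, std::int64_t b) {
    const std::int64_t q = a / b;
    return (a % b != 0 && ((a < 0) != (b < 0))) ? q - 1 : q;
}

// Days since 1970-01-01 to a proleptic Gregorian date
// (the widely used "civil from days" arithmetic).
static void civil_from_days(std::int64_t z, int &y, unsigned &m, unsigned &d) {
    z += 719468;                                         // shift the epoch to 0000-03-01
    const std::int64_t era = floor_div(z, 146097);       // 400-year cycles
    const unsigned doe = static_cast<unsigned>(z - era * 146097);  // day of era
    const unsigned yoe =
        (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;     // year of era
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // day of year, March-based
    const unsigned mp = (5 * doy + 2) / 153;                       // month index, March = 0
    d = doy - (153 * mp + 2) / 5 + 1;
    m = mp < 10 ? mp + 3 : mp - 9;
    y = static_cast<int>(static_cast<std::int64_t>(yoe) + era * 400) + (m <= 2 ? 1 : 0);
}

// Split a nanosecond count since the Unix epoch into calendar and clock fields.
DateTimeFields fields_from_epoch_ns(std::int64_t ns) {
    const std::int64_t NS_PER_SEC = 1000000000LL;
    const std::int64_t NS_PER_DAY = 86400LL * NS_PER_SEC;
    const std::int64_t days = floor_div(ns, NS_PER_DAY);
    const std::int64_t rem = ns - days * NS_PER_DAY;     // nanoseconds into the day
    assert(rem >= 0 && rem < NS_PER_DAY);

    DateTimeFields f;
    civil_from_days(days, f.year, f.month, f.day);
    const std::int64_t secs = rem / NS_PER_SEC;
    f.hour = static_cast<unsigned>(secs / 3600);
    f.minute = static_cast<unsigned>((secs % 3600) / 60);
    f.second = static_cast<unsigned>(secs % 60);
    // Sub-microsecond digits are truncated, as a datetime object cannot hold them either.
    f.microsecond = static_cast<std::uint32_t>((rem % NS_PER_SEC) / 1000);
    return f;
}
```

The resulting fields could then be assigned to the same SQLSMALLINT/SQLUSMALLINT/SQLUINTEGER variables as in the question. Note that pandas represents missing timestamps (NaT) with a special sentinel value, which this sketch does not handle.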