I'm trying to return a datetime object from my Python UDF for use in a Pig script. (Note that I'm simplifying the problem here: my actual UDF does something a lot more complex than returning the current time, but the object returned is the same.)
Pig version 0.12.1, Hortonworks distribution.
My UDF is as follows:
@outputSchema("timeNowSchema")
def time_now(dt):
    return datetime.datetime.now()

@outputSchema("timeNowSchema")
def timeNowSchema(dt):
    dt = [DataType.DATETIME]
    return SchemaUtil.newTupleSchema(dt)
However, when using the UDF I get the following:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Non supported pig datatype found, cast failed: org.python.core.PyObjectDerived
Looking at the pigToPython function in org.apache.pig.scripting.jython.JythonUtils, which appears to be responsible, I see no apparent means of actually carrying out the conversion, even though DataType.DATETIME is an allowable return type.
Is there any way to return a datetime/timestamp object that Pig will process as a datetime?
Update: I've tried returning a time.struct_time object instead. This still doesn't work, though at least the function completes. However, Pig returns a tuple rather than the datetime object I really want:
[python]
time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)
[pig]
((2000,11,30,0,0,0,0,0,-1))
Update 2: I'm now outputting an ISO-formatted datetime string from the UDF, as per fred's suggestion. After poking around in the Pig source, it doesn't look like returning a datetime object directly is possible yet.
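For reference, here is a minimal sketch of the string-based workaround, not the exact code from my real UDF. The names to_iso and time_now_iso and the field name ts are made up for illustration, and the fallback decorator is only there so the file also runs outside Pig; I'm assuming Pig supplies outputSchema itself when the script is registered with Jython.

```python
import datetime

try:
    outputSchema  # assumed to be provided by Pig for Jython UDFs
except NameError:
    # Fallback so the module can be run and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


def to_iso(value):
    """Format a datetime as an ISO 8601 string with second precision."""
    return value.replace(microsecond=0).isoformat()


@outputSchema("ts:chararray")
def time_now_iso(dt):
    # Return a chararray instead of a datetime object, since the
    # datetime object itself is what triggers the cast failure.
    return to_iso(datetime.datetime.now())
```

On the Pig side the string should then be convertible with the built-in ToDate, e.g. ToDate(ts), assuming the value reaches Pig as a plain chararray.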