
I'm trying to return a datetime object from my Python UDF for use in a Pig script. (I'm simplifying the problem here: my actual UDF does something a lot more complex than returning the current time, but the returned object is the same.)

Pig version 0.12.1, Hortonworks distribution.

My UDF is as follows:

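import datetime

# Pig's Java classes, importable from Jython inside Pig
from org.apache.pig.data import DataType
from org.apache.pig.impl.logicalLayer.schema import SchemaUtil

@outputSchema("timeNowSchema")
def time_now(dt):
    return datetime.datetime.now()

@outputSchema("timeNowSchema")
def timeNowSchema(dt):
    dt = [DataType.DATETIME]
    return SchemaUtil.newTupleSchema(dt)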

However, when using the UDF I get the following:

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Non supported pig datatype found, cast failed: org.python.core.PyObjectDerived

Looking at the function responsible, pigToPython in org.apache.pig.scripting.jython.JythonUtils, I see no apparent way for this conversion to be carried out, even though DataType.DATETIME is an allowable return type.

Is there any way to return a datetime/timestamp object that pig will process as a datetime?

Update: I've tried returning a time.struct_time object instead. This still doesn't work, although at least the function now completes. However, Pig returns a tuple instead of the datetime object I really want:

[python]
time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)
[pig]
((2000,11,30,0,0,0,0,0,-1))
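A likely explanation for the tuple: in CPython, `time.struct_time` is a subclass of `tuple`, so a converter that checks for tuples would treat it as one. That Jython's equivalent behaves the same way is an assumption, not something verified here. The sketch below reproduces the value shown above and renders it as an ISO 8601 string instead. `struct_time_to_iso` is a hypothetical helper, not part of Pig.

```python
import time


def struct_time_to_iso(st):
    """Hypothetical helper: render a time.struct_time as an ISO 8601
    string (second precision) instead of returning the struct itself."""
    return time.strftime("%Y-%m-%dT%H:%M:%S", st)


# Rebuild the same value shown in the question (tm_wday=3, tm_yday=335).
st = time.strptime("2000-11-30", "%Y-%m-%d")

# struct_time is a tuple subclass in CPython (assumed to hold in Jython too).
print(isinstance(st, tuple))   # True
print(struct_time_to_iso(st))  # 2000-11-30T00:00:00
```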

Update 2: I'm now outputting an ISO-formatted datetime string from the UDF, as per Fred's suggestion. After poking around in the Pig source, it doesn't look like returning a native datetime is possible yet.
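A minimal sketch of that string-based workaround, assuming the UDF only needs to hand Pig a value it can convert (a string) rather than a `datetime`. `to_iso_millis` is a hypothetical helper and the `time_now:chararray` schema string is illustrative. The `outputSchema` fallback exists only so the file can run outside Pig; inside Pig, the Jython engine provides the decorator.

```python
import datetime

# Inside Pig the Jython engine provides outputSchema. This fallback is an
# assumption for running the file standalone: it just records the schema
# string on the decorated function.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


def to_iso_millis(dt):
    """Hypothetical helper: format a datetime as ISO 8601 with millisecond
    precision, e.g. 2000-11-30T00:00:00.123."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%03d" % (dt.microsecond // 1000)


@outputSchema("time_now:chararray")
def time_now(dt):
    # Return a plain string, which Pig can convert, instead of a datetime.
    return to_iso_millis(datetime.datetime.now())
```

On the Pig side, the built-in `ToDate` function accepts a single ISO 8601 string, so wrapping the call as `ToDate(myudfs.time_now(x))` (with `myudfs` and `x` as hypothetical alias and field names) should yield a datetime column. The string is truncated to milliseconds because Pig's datetime type is backed by Joda-Time, which does not store finer precision.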

undershock
  • I don't know. You could return a string with an ISO-formatted date instead and parse it in Pig. – Frederic Jul 17 '15 at 09:30
  • @Fred - thanks for the suggestion. I've tried it and it works as a workaround, but it isn't all that clean. I'd rather output natively, but I'll wait to see if I get any other suggestions. – undershock Jul 17 '15 at 10:15
  • Have you tried explicitly stating the return type? Like `@outputSchema("timeNowSchema:datetime")` – LiMuBei Jul 17 '15 at 11:34
  • @LiMuBei doesn't work either I'm afraid. Thanks for the suggestion. – undershock Jul 17 '15 at 13:47
