I am trying to write a UDF in Python that will be called from a Pig script. The UDF needs to accept a date as a string in DD-MMM-YYYY format and return it in DD-MM-YYYY format. Here MMM is a month abbreviation (JAN, FEB, ..., DEC) and the returned MM is the matching two-digit number (01, 02, ..., 12).
Below is my Python UDF:
#!/usr/bin/python
@outputSchema("newdate:chararray")
def GetMonthMM(inputString):
    print inputString
    #monthstring = inputString[3:6]
    sl = slice(3,6)
    monthstring = inputString[sl]
    monthdigit = ""
    if monthstring == "JAN":
        monthdigit = "01"
    elif monthstring == "FEB":
        monthdigit = "02"
    elif monthstring == "MAR":
        monthdigit = "03"
    elif monthstring == "APR":
        monthdigit = "04"
    elif monthstring == "MAY":
        monthdigit = "05"
    elif monthstring == "JUN":
        monthdigit = "06"
    elif monthstring == "JUL":
        monthdigit = "07"
    elif monthstring == "AUG":
        monthdigit = "08"
    elif monthstring == "SEP":
        monthdigit = "09"
    elif monthstring == "OCT":
        monthdigit = "10"
    elif monthstring == "NOV":
        monthdigit = "11"
    elif monthstring == "DEC":
        monthdigit = "12"
    sl1 = slice(0,3)
    sl2 = slice(6,11)
    str1 = inputString[sl1]
    str2 = inputString[sl2]
    newdate = str1 + monthdigit + str2
    return monthstring
I did some debugging, and the issue seems to be that after slicing, the strings are treated as arrays rather than strings. I get the following error message:
TypeError: unsupported operand type(s) for +: 'array.array' and 'str'
The same thing happens when the sliced value is compared to another string, as in if (monthstring == "DEC"):. Even when monthstring holds DEC, the condition is never satisfied.
Has anybody faced this issue before? Any ideas on how to fix it?
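For illustration, here is a minimal sketch of the conversion described above. It rests on an assumption the post does not confirm: that the field reaches the UDF as an array.array (for example, because it was loaded without a chararray type) and can be coerced to a plain string before slicing or comparing. The names MONTHS, to_text and get_month_mm are hypothetical, and the @outputSchema decorator is left out so the snippet runs outside Pig.

```python
import array

# Lookup table standing in for the if/elif chain.
MONTHS = {
    "JAN": "01", "FEB": "02", "MAR": "03", "APR": "04",
    "MAY": "05", "JUN": "06", "JUL": "07", "AUG": "08",
    "SEP": "09", "OCT": "10", "NOV": "11", "DEC": "12",
}


def to_text(value):
    """Coerce the input to a plain string.

    Assumption: the value may arrive as array.array instead of str.
    """
    if isinstance(value, array.array):
        # tobytes() exists on Python 3; tostring() on Python 2 / Jython.
        value = value.tobytes() if hasattr(value, "tobytes") else value.tostring()
    if isinstance(value, (bytes, bytearray)) and not isinstance(value, str):
        value = bytes(value).decode("utf-8")
    return str(value)


def get_month_mm(input_string):
    """Convert DD-MMM-YYYY to DD-MM-YYYY; return None if the input does not fit."""
    parts = to_text(input_string).strip().split("-")
    if len(parts) != 3:
        return None
    day, month, year = parts
    month_digits = MONTHS.get(month.upper())
    if month_digits is None:
        return None
    return "-".join([day, month_digits, year])
```

Inside Pig, the same function would carry the @outputSchema("newdate:chararray") decorator from the original script. Whether the coercion is needed at all depends on how the field is typed in the LOAD statement, which the post does not show.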