I am using Spark 1.2.0 with Python.
My problem is that in a SQL query, if the value of a field is zero, I need to replace it with some other value.
I have tried CASE/COALESCE, which works in 1.4.0 but not in 1.2.0:
case when COALESCE("+fld+",0)=0 then "+str(numavgnumlst[0][lock])+" else "+fld+" end
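For context, the expression above is assembled by string concatenation in Python; a minimal sketch with illustrative placeholders ("sales" and 42.5 are made-up stand-ins, not my real column name or average value):

```python
# Sketch of how the CASE/COALESCE fragment is built; "sales" and 42.5
# are illustrative placeholders for the real column name and average.
fld = "sales"
avg_val = 42.5
expr = ("case when COALESCE(" + fld + ",0)=0 then "
        + str(avg_val) + " else " + fld + " end")
# expr: case when COALESCE(sales,0)=0 then 42.5 else sales end
```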
However, for 1.2.0 I tried to do the same with map:
sc = SparkContext(appName="RunModelCCATTR")
sqlContext=SQLContext(sc)
sqlstr="select ..."
nonzerodf=sqlContext.sql(sqlstr)
.....
iifdatadf=nonzerodf.map(lambda candrow:replacezeroforrow(candrow,numavgnumlst))
....
def replacezeroforrow(rowfields, avgvalfields):
    ind = 0
    lent = len(rowfields)
    for rowfield in rowfields[4:lent]:
        if rowfield == 0:
            rowfields[ind] = avgvalfields[ind]
        ind = ind + 1
    return rowfields
This throws an error:
TypeError: 'Row' object does not support item assignment
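If it helps to see why the assignment fails: as far as I understand, a pyspark Row behaves like a tuple, so in-place updates are not allowed. A stdlib namedtuple (MockRow below is just an illustrative stand-in, not a Spark class) reproduces the same behaviour:

```python
from collections import namedtuple

# MockRow stands in for pyspark.sql.Row, which is also a tuple subclass.
MockRow = namedtuple("MockRow", ["a", "b"])
row = MockRow(a=1, b=0)

try:
    row[1] = 99          # the in-place update replacezeroforrow attempts
    assigned = True
except TypeError:
    assigned = False     # tuples (and Rows) do not support item assignment

# Workaround: copy the fields into a mutable dict first;
# pyspark Rows expose the analogous Row.asDict().
mutable = row._asdict()
mutable["b"] = 99
```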
Not sure what I can do to achieve the objective in Spark 1.2.0.
Thanks for the help, I think it is working now, except that the order of the columns seems to have changed, but that may not be an issue. Thanks again.
Edit:
The idea helped me a lot; it needed a little modification to solve the immediate problem:
def replacezeroforrow(rowfields, avgvalfields, dont_replace=[]):
    rdict = rowfields.asDict()
    return Row(dict([(k, avgvalfields[k] if v == 0 and k not in dont_replace else v) for (k, v) in rdict.items()]))
I modified the original solution to avoid a syntax error at 'for'.
The call to the method is as follows:
restrictdict=[FieldSet1,FieldSet2,FieldSet3,FieldSet4,modeldepvarcat[0]]
iifdatadf=nonzerodf.map(lambda candrow: replacezeroforrow(candrow,numavgnumlst[0].asDict(),restrictdict))
However, now when I try to access iifdatadf:
frstln = iifdatadf.first()
print frstln
I get the following error:
return "<Row(%s)>" % ", ".join(self)
TypeError: sequence item 0: expected string, dict found
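For reference, the failure can be reproduced without Spark. I suspect Row(dict([...])) stores the whole dict as a single positional field, so the join inside __repr__ receives a dict where it expects strings:

```python
# What Row(dict([...])) effectively stores: a one-element tuple whose
# only field is the dict itself, rather than one field per key.
fields = ({"x": 1, "y": 2},)

try:
    ", ".join(fields)    # the same call Row.__repr__ makes
    joined = True
except TypeError:
    joined = False       # sequence item 0 is a dict, not a string
```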
I would hugely appreciate any help.