
I'm working on some Spark code, and I keep getting this error:

TypeError: 'float' object is not iterable

on the line with the reduceByKey() call. Can someone help me? This is the stack trace of the error:

d[k] = comb(d[k], v) if k in d else creator(v)
  File "/home/hw/SC/SC_spark.py", line 535, in <lambda>
TypeError: 'float' object is not iterable

Here is the code:

def field_valid(m):
    dis=m[1]
    TxP=m[2]
    ef=m[3]
    pl=m[4]
    if TxP != 'NaN' and dis != 'NaN' and ef != 'NaN' and pl != 'NaN':
        return True
    else:
        return False

def parse_input(d):
    #d=data.split(',')

    s_name='S'+d[6] # serving cell name

    if d[2] =='NaN' or d[2] == '':
        ef='NaN'
    else:
        ef=float(d[2].strip().rstrip())

    if d[7] =='NaN' or d[7] == '' or d[7] == '0':
        TxP='NaN'
    else:
        TxP=float(d[7].strip().rstrip())

    if d[9] =='NaN' or d[9] == '':
        dis='NaN'
    else:
        dis=float(d[9].strip().rstrip())

    if d[10] =='NaN' or d[10] == '':
        pl='NaN'
    else:
        pl=float(d[10].strip().rstrip())

    return s_name, dis, TxP, ef, pl


sc=SparkContext(appName="SC_spark")
lines=sc.textFile(ip_file)
lines=lines.map(lambda m: (m.split(",")))
lines=lines.filter(lambda m: (m[6] != 'cell_name'))
my_rdd=lines.map(parse_input).filter(lambda m: (field_valid(m)==True))
my_rdd=my_rdd.map(lambda x: (x[0],(x[1],x[2])))
my_rdd=my_rdd.reduceByKey(lambda x,y:(max(x[0],y[0]),sum(x[1],y[1])))  # this line raises the error

Here is some sample data:


Class,PB,EF,RP,RQ,ID,cell_name,TxP,BW,DIS,PL,geom
NaN,10,5110,-78.0,-7.0,134381669,S417|134381669|5110,62.78151250383644,10,2578.5795095469166,113.0,NaN
NaN,10,5110,-71.0,-6.599999904632568,134381669,S417|134381669|5110,62.78151250383644,10,2689.630258510342,106.0,NaN
NaN,10,5110,-77.0,-7.300000190734863,134381669,S417|134381669|5110,62.78151250383644,10,2907.8184899249713,112.0,19.299999999999983
NaN,10,5110,-91.0,-11.0,134381669,S417|134381669|5110,62.78151250383644,10,2779.96762695867,126.0,5.799999999999997
NaN,10,5110,-90.0,-12.69999980926514,134381669,S417|134381669|5110,62.78151250383644,10,2749.8351648579583,125.0,9.599999999999994
NaN,10,5110,-95.0,-13.80000019073486,134381669,S417|134381669|5110,62.78151250383644,10,2942.7938902934643,130.0,-2.4000000000000057
NaN,10,5110,-70.0,-7.099999904632568,134381669,S417|134381669|5110,62.78151250383644,10,3151.930706017461,105.0,22.69999999999999
Helen Z
    I am not familiar with `pyspark`, but in the line where the error occurs you call `sum` with two arguments. Unless the first one is an iterable and the second an int, your error is probably there. Try calling `sum(1.0, 2)` on a python console. It gives me a very similar error. – bla Apr 22 '18 at 06:07
  • Hi @bla, I just tested it out and made sure all fields are converted to float. As you noticed, I filtered out the lines with NaN in those values, so the numbers are floats only. I also checked the syntax of the lambda function; I separate it into (k, v). I didn't find anything wrong. Did you find anything wrong? – Helen Z Apr 22 '18 at 06:14
  • What exactly is `m.split(",")` doing? You have no commas in the data – OneCricketeer Apr 22 '18 at 06:15
  • @HelenZ you cannot pass a float as the first argument of `sum`. It expects an iterable. Check it out: https://docs.python.org/3.5/library/functions.html#sum. I cannot confirm that this is the case, since I am not sure `x[1]` is a float. But the stack traces are very similar. – bla Apr 22 '18 at 06:19
  • Hi, @cricket_007, .split(",") splits the lines by comma. I copied the data from csv file. Let me edit it to notepad format. – Helen Z Apr 22 '18 at 06:19
  • Your error is that `sum()` is a built-in function; you're not using a Spark function there. That would also explain why `sum(1.0, 2)` fails by itself, since the `sum` function requires an iterable, and your `x[1]` is a single value. Try `x[1] + y[1]` if you are trying to sum your RDD column... Alternatively, I suggest using SparkSQL sum functions – OneCricketeer Apr 22 '18 at 06:19
  • @cricket_007, what do you mean x[1] is a single value? – Helen Z Apr 22 '18 at 06:22
  • @HelenZ `x[1]` is a float and therefore not an iterable (like lists or sets). Since `sum` expects an iterable it raises an error when a non iterable value is passed. – bla Apr 22 '18 at 06:25
  • As part of `lambda x,y`, the `x` value is not a list, tuple, or other collection... As the error says, it is a single floating point number... Is there a specific reason you're using RDDs or Spark1 functions instead of Spark2 with its built-in CSV reader? Also, what is expected output here? – OneCricketeer Apr 22 '18 at 06:25
  • 1
  • Hi @cricket_007. BTW, I just changed it to x[1]+y[1], and it works!! I'm new to Spark and can't distinguish Spark 1 and Spark 2 yet. Can you tell me how to do it in Spark 2? The expected result is the sum and max of the value 'dis' for each key, and the key is the column 'cell_name'. – Helen Z Apr 22 '18 at 06:32
  • Hi, @bla thank you. – Helen Z Apr 22 '18 at 06:34
  • You are welcome. :) – bla Apr 22 '18 at 06:36
  • Hi @cricket_007, just realized our server uses spark-1.4. But thank you for great help. – Helen Z Apr 22 '18 at 06:46
  • You can run Spark2 code against the same YARN or Mesos cluster as Spark1. Not sure about a standalone scheduler – OneCricketeer Apr 22 '18 at 06:48

1 Answer


the expected result is sum and max of value

In that case, you are looking for x[1] + y[1], not the built-in sum() function, which expects an iterable as its first argument.
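You can reproduce the same error outside Spark, as noted in the comments, because the built-in sum(iterable, start) needs an iterable as its first argument:

>>> sum(1.0, 2.0)
TypeError: 'float' object is not iterable

The corrected reducer then becomes: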

my_rdd.reduceByKey( lambda x,y: ( max(x[0],y[0]), x[1] + y[1] ) )
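Since the comments ask about Spark 2: here is a minimal sketch of the same aggregation with the DataFrame API and its built-in CSV reader. This assumes Spark 2.x, the column names from the sample data above, and it simplifies the NaN filtering compared to the original parse_input/field_valid logic:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("SC_spark").getOrCreate()

# Read the CSV using its header row; column names come from the sample data
df = spark.read.csv(ip_file, header=True, inferSchema=True)

# Drop rows where DIS is NaN, then aggregate per cell
result = (df.filter(~F.isnan("DIS"))
            .groupBy("cell_name")
            .agg(F.max("DIS").alias("max_dis"),
                 F.sum("DIS").alias("sum_dis")))

result.show()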
OneCricketeer
  • Hi @cricket_007, can I ask another question? Now I want to save the result into a .txt file, but I want to add a header to the .txt file. How should I do it? I used this statement: my_rdd.repartition(1).saveAsTextFile("sc_result/result.txt") – Helen Z Apr 22 '18 at 06:58
  • You need to union your RDD with a header RDD. https://stackoverflow.com/questions/26157456/add-a-header-before-text-file-on-save-in-spark – OneCricketeer Apr 22 '18 at 07:01
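Based on the approach in that link, a minimal sketch, assuming the reduced RDD still holds (cell_name, (max_dis, sum_dis)) pairs; the header text here is a placeholder, and note that saveAsTextFile writes a directory of part files rather than a single .txt file:

# Hypothetical header line matching the (cell_name, (max_dis, sum_dis)) structure
header = sc.parallelize(["cell_name,max_dis,sum_dis"])

# Render each record as a single CSV line so it can be unioned with the header
rows = my_rdd.map(lambda kv: "{0},{1},{2}".format(kv[0], kv[1][0], kv[1][1]))

# The header partition comes first in the union; coalesce(1) merges partitions in order
header.union(rows).coalesce(1).saveAsTextFile("sc_result/result.txt")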