I have a dataset with consumption values and coordinates, like this:

[image of the data not available]

I need to write code that finds, location by location, the consumptions that are below the average consumption within a 100 m radius. The 100 m radius should not be evaluated once, but around each pair of coordinates.

Code:

R9 = []
R9_NaN = []

for index, row in df.iterrows():
    coord_1 = (row['X'], row['Y'])
    for index, row2 in df.iterrows():
        coord_2 = (row['X'], row['Y'])
        if coord_2 != coord_1:
            dist = geopy.distance.geodesic(coord_1, coord_2).km
            if dist <= 0.100:
                média=sum((row['Consumo2018']/12)/(len(coord_2)+1))
                if row['Consumo2018']/12 < 1.5*média:
                    R9_NaN.append(index)
                    R9.append(0)
                else:
                    R9.append(0)

print(R9)

`geopy.distance` is a module that already calculates the distance between two coordinates.

In the code above, `média` is meant to be the average consumption of the sites within the 100 m radius, so it should also vary from place to place.

It gives me this error:

TypeError: 'float' object is not iterable

  • First note that if you are going to have two `for` loops, you need to change the names of `index, row` in one of them, because the inner loop overwrites the outer loop's variables. Also, on which line do you get the error? – BenT Jun 03 '19 at 15:59
  • On the line that has `média=sum((row['Consumo2018']/12)/(len(coord_2)+1))` – user146110 Jun 03 '19 at 16:02
  • Yes, change it to `row2` or something like that. You should try printing `len(coord_2)` and `row['Consumo2018']` to see which one is giving you the error. – BenT Jun 03 '19 at 16:12
  • `print(len(coord_2))` gives 2 – user146110 Jun 03 '19 at 16:15
  • `print(row['Consumo2018'])` gives 120 – user146110 Jun 03 '19 at 16:16
  • Now I don't have that error, but the print shows `[]` – user146110 Jun 03 '19 at 16:19
  • I can't see your data, but your problem is probably related to your nested if statements. Try adding print statements to check if the code does what you expect. If you can't figure it out, try asking another question with more details on that specific problem. – BenT Jun 03 '19 at 16:23
  • Part of the issue is that you are not taking the mean of the dataset. See the edited answer below. – BenT Jun 03 '19 at 16:32
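
To illustrate the first comment above: when an inner loop reuses the outer loop's variable names, the outer values are overwritten. In the question's code the inner loop also builds `coord_2` from `row`, so both tuples appear to be identical and the `coord_2 != coord_1` branch would never run, which may explain the empty `[]`. A minimal sketch with a plain list (the names `points`, `seen_after_inner`, and `pairs` are made up for illustration and are not from the question):

```python
points = [("a", 1), ("b", 2), ("c", 3)]

# Buggy pattern: the inner loop rebinds `name` and `value`, so once it
# finishes, the outer variables hold the inner loop's last item.
seen_after_inner = []
for name, value in points:
    for name, value in points:
        pass
    seen_after_inner.append(name)

# Fixed pattern: distinct names keep the outer item intact.
pairs = []
for name, value in points:
    for other_name, other_value in points:
        if other_name != name:
            pairs.append((name, other_name))

print(seen_after_inner)
print(len(pairs))
```

With distinct names, every item is compared against every other item, which is what the pairwise distance check needs.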

1 Answer

You get the `TypeError` because `sum()` expects an iterable and you are passing it a single number. This can be fixed by removing the `sum` in your `média = ...` line. If you want this line to take the mean of the whole data frame, you need to change it to read the data from the data frame column and not just from the row.

import geopy.distance

R9 = []
R9_NaN = []

for index, row in df.iterrows():
    # geodesic() expects (latitude, longitude); this assumes X and Y are in that order
    coord_1 = (row['X'], row['Y'])
    # Use distinct names in the inner loop so the outer index/row are not overwritten
    for index2, row2 in df.iterrows():
        coord_2 = (row2['X'], row2['Y'])
        if coord_2 != coord_1:
            dist = geopy.distance.geodesic(coord_1, coord_2).km
            if dist <= 0.100:
                # média = (row['Consumo2018']/12)/(len(coord_2)+1)  # Change to this to only remove sum
                média = (df['Consumo2018']/12).mean()  # Change to this to get the actual mean of the whole column
                if row['Consumo2018']/12 < 1.5*média:
                    R9_NaN.append(index)
                    R9.append(0)
                else:
                    R9.append(0)

print(R9)
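
The `TypeError` discussed in this answer can be reproduced in isolation. The sketch below uses the value 120 reported in the comments; the other two values are made up for illustration.

```python
monthly = 120 / 12  # a single float, like row['Consumo2018'] / 12

# sum() iterates over its argument, so a bare float raises TypeError.
try:
    sum(monthly)
except TypeError as exc:
    error_message = str(exc)

# Passing an iterable of values works; dividing by the count gives the mean.
values = [120 / 12, 240 / 12, 60 / 12]
mean_value = sum(values) / len(values)

print(error_message)
print(mean_value)
```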
BenT
  • I'm using "row", in order to sum all those values ... within the radius of 100m – user146110 Jun 03 '19 at 16:32
  • So you can do that either by saving all the values first, using `append(row['Consumo2018'])`, and then taking the mean after the loop, or iteratively using `row`; in that case you need to add to a running total instead of overwriting it, and also keep track of how many elements have been included. – BenT Jun 03 '19 at 16:46
  • This way I'm using the average of the entire data and not only the average of the consumption within the 100 m radius – user146110 Jun 03 '19 at 16:46
  • You are not using the average of the entire data in your code because you are looping line by line.... `row` is a single row and NOT a column of data from the whole dataframe. – BenT Jun 03 '19 at 16:49
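
A rough sketch of the first option described in the comments above: collect the values that fall within the radius of each site, then take their mean after the inner loop. This is only one possible reading of the question, under several assumptions. It uses plain Python with a haversine formula as a stand-in for `geopy.distance.geodesic` and a list of dicts as a stand-in for `df`. The helper names `haversine_m` and `below_local_mean`, the `lat`/`lon` keys, the `factor` parameter, and the sample data are all invented for illustration. It also assumes the local mean excludes the site itself, which the question does not specify.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical approximation)."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def below_local_mean(sites, radius_m=100.0, factor=1.0):
    """Return indices of sites whose monthly consumption is below
    factor * (mean monthly consumption of the other sites within radius_m)."""
    flagged = []
    for i, site in enumerate(sites):
        neighbours = []  # monthly consumption of the sites near site i
        for j, other in enumerate(sites):
            if i == j:
                continue  # assumption: a site is not its own neighbour
            d = haversine_m(site["lat"], site["lon"], other["lat"], other["lon"])
            if d <= radius_m:
                neighbours.append(other["Consumo2018"] / 12)
        if not neighbours:
            continue  # no site within the radius, so there is no local mean
        local_mean = sum(neighbours) / len(neighbours)
        if site["Consumo2018"] / 12 < factor * local_mean:
            flagged.append(i)
    return flagged

# Made-up data: three sites within about 50 m of each other, one far away.
sites = [
    {"lat": 38.7000, "lon": -9.1000, "Consumo2018": 120},
    {"lat": 38.7003, "lon": -9.1000, "Consumo2018": 600},
    {"lat": 38.7000, "lon": -9.1004, "Consumo2018": 480},
    {"lat": 40.0000, "lon": -8.0000, "Consumo2018": 10},
]
print(below_local_mean(sites))
```

The local mean is computed once per site, after the inner loop has gathered all neighbours, so it varies from place to place as the question asks. Passing `factor=1.5` would mimic the `1.5*média` threshold used in the posted code.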