
I am attempting to run a nearest-neighbor sort in Python. I have a DataFrame full of points, for example:

       x      y
1     10   10.0
2     26   11.0
3     27   20.0
4     36   19.0
...

and so on, up to 1000 points. I am trying to sort these points so that each one is followed by the closest point not yet used in the DataFrame. The code I am currently using for this sort is shown below.

for j in range(0, len(data)-2):
    minDist = 1000000
    k = j+1
    for i in range(k, len(data)-1):
        #dist1 = distance.euclidean(j, i+1)
        dist2 = distance.euclidean(j, i)

        if(dist2<minDist):
            minDist = dist2
            print(minDist)
            minI = data.iloc[i]

    b, c = data.iloc[j+1].copy(), data.iloc[i].copy()
    data.iloc[j+1],data.iloc[i] = c, b

However, when I run this code, my output data file only moves one data point, and it's not the correct data point, as shown here:

         x      y
1     10.0   10.0
2    624.0  436.0
3     26.0   11.0
4     27.0   20.0

I believe the problem is with the nested for loops, but I am not sure. Are there any errors in my for loops? Or is it a problem with how I'm approaching this in Python?

martineau
Aaron Sexton
  • I suggest you learn how to debug your code. Read https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ for some great tips to get started. – Code-Apprentice Feb 13 '18 at 20:00
  • Somewhat related: https://stackoverflow.com/questions/48174398/new-dataframe-column-as-a-generic-function-of-other-rows-pandas – pault Feb 13 '18 at 20:05
  • That code should raise `IndentationError`. Please fix the indentation so we know what your code really looks like. – PM 2Ring Feb 13 '18 at 20:11
  • Indentation is wrong! – smci Feb 13 '18 at 20:13
  • Indentation was wrong after being copied over, apologies. It's been fixed. – Aaron Sexton Feb 13 '18 at 20:31
  • The arguments to `distance.euclidean` need to be arrays. `i`, and `j` are just numeric indexes. – Barmar Feb 13 '18 at 20:44
  • I guess it should be `distance.euclidean(data[j], data[i])` – Barmar Feb 13 '18 at 20:45
  • @Barmar are you sure? It returns a numerical answer that I've checked for correctness. EDIT: I see what you mean. Let me try this. You're right. EDIT2: It returns the same output as shown in the original question. – Aaron Sexton Feb 13 '18 at 20:55
  • [documentation](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.euclidean.html): **Computes the Euclidean distance between two 1-D arrays.** – Barmar Feb 13 '18 at 20:59
  • @Barmar it says to pass a 1-D array, a vector, which is what each row of my matrices, represented by data.iloc[j] and data.iloc[i], are. And it does return the correct distances. It's the loop that it seems to have trouble running. – Aaron Sexton Feb 13 '18 at 21:14
  • You write `distance.euclidean(j, i)`. `j` and `i` are not arrays, they come from the `range()` iteration. I don't see how that could possibly work. – Barmar Feb 13 '18 at 21:17
  • It should be `distance.euclidean(data.iloc[i], data.iloc[j])` – Barmar Feb 13 '18 at 21:18
  • Because i and j are referencing data in an array, it works. The problem actually arose from the swap, not the loops, so my question was misguided. I can post the full code if you'd like to run it yourself @Barmar – Aaron Sexton Feb 15 '18 at 15:41
  • You should post the solution as an Answer below. – Barmar Feb 15 '18 at 21:28
  • solution posted @Barmar – Aaron Sexton Feb 19 '18 at 22:42
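The disagreement in the comments above is about the difference between measuring the distance between two row positions and measuring it between the points stored at those positions. A minimal sketch of that difference follows. It assumes `math.dist` from the standard library (Python 3.8+) as a dependency-free stand-in for `distance.euclidean`, and a hypothetical `points` list holding the first four rows from the question.

```python
import math

# Hypothetical sample: the first four rows shown in the question.
points = [(10, 10.0), (26, 11.0), (27, 20.0), (36, 19.0)]

j, i = 0, 3

# Distance between the row positions themselves: this is just |j - i|.
index_gap = math.dist([j], [i])

# Distance between the points stored at those positions.
point_gap = math.dist(points[j], points[i])

print(index_gap)  # 3.0
print(point_gap)  # sqrt(26**2 + 9**2), roughly 27.5
```

With bare indexes, the "nearest" candidate is always the adjacent row, regardless of where the points actually lie.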

2 Answers


If you are trying to nest the for loops, you are doing it wrong, as the indentation used with the first for loop is incorrect. To nest them you would have to do something like this:

for j in range(0, len(data)-2):
    minDist = 1000000
    k = j+1
    for i in range(k, len(data)-1):
        #dist1 = distance.euclidean(j, i+1)
        dist2 = distance.euclidean(j, i)

        if(dist2<minDist):
            minDist = dist2
            print(minDist)
            minI = data.iloc[i]

    b, c = data.iloc[j+1].copy(), data.iloc[i].copy()
    data.iloc[j+1],data.iloc[i] = c, b
J Lem

Solution to the issue:

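The second loop was not iterating with respect to the first loop, so the "k=j+1" line was added.

Also added minDist = 1000000 at the start of each outer iteration, to ensure that the first comparison was correct and didn't skip an initial point. The swap now uses the stored index minI rather than the loop variable i, which after the inner loop always holds the last value the loop visited.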

from scipy.spatial import distance

for j in range(0, len(data)-1):
    minDist = 1000000
    k = j+1
    for i in range(k, len(data)):
        # distance from the current point j to each not-yet-placed point i
        dist2 = distance.euclidean(data.iloc[j], data.iloc[i])

        if(dist2<minDist):
            minDist = dist2
            minI = i

    # move the nearest unused point into position j+1
    b, c = data.iloc[j+1].copy(), data.iloc[minI].copy()
    data.iloc[j+1],data.iloc[minI] = c, b
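
The same greedy ordering can also be written without in-place swaps, which may be easier to check by hand. The sketch below is not the code from this answer: it assumes the points have been pulled out of the DataFrame as plain `(x, y)` tuples (for example with `list(zip(data["x"], data["y"]))`), uses `math.dist` (Python 3.8+) in place of `distance.euclidean`, and the function name `nearest_neighbor_order` is made up for illustration. Barring ties, it should give the same order as the loop above.

```python
import math

def nearest_neighbor_order(points):
    """Return points reordered so each is followed by its nearest unused point."""
    if not points:
        return []
    remaining = list(points)
    ordered = [remaining.pop(0)]  # start from the first point, as the loop above does
    while remaining:
        last = ordered[-1]
        # position (within remaining) of the closest point not yet placed
        nearest = min(range(len(remaining)),
                      key=lambda k: math.dist(last, remaining[k]))
        ordered.append(remaining.pop(nearest))
    return ordered

# Hypothetical sample built from values that appear in the question.
sample = [(10, 10.0), (624, 436.0), (26, 11.0), (27, 20.0), (36, 19.0)]
result = nearest_neighbor_order(sample)
print(result)
# [(10, 10.0), (26, 11.0), (27, 20.0), (36, 19.0), (624, 436.0)]
```

Like the nested loops, this does a full scan of the unused points for every position, so it is quadratic in the number of points, which is fine for around 1000 rows.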
Aaron Sexton