5

Problem

I recently found a need to determine if my points are inside of a polygon. So I learned about this approach in C++ and adapted it to python. However, the C++ code I was studying isn't quite right I think? I believe I have fixed it, but I am not quite sure so I was hoping folks brighter than me might help me caste some light on this?

The theorem is super simple and the idea is like this, given an nth closed polygon you draw an arbitrary line, if your point is inside, you line will intersect with the edges an odd number of times. Otherwise, you will be even and it is outside the polygon. Pretty freaking cool.

I had the following test cases:

    polygon_x = [5, 5, 11, 10]
    polygon_y = [5, 10, 5, 10]
    test1_x = 6
    test1_y = 6

    result1 = point_in_polygon(test1_x, test1_y, polygon_x, polygon_y)
    print(result1)

    test2_x = 13
    test2_y = 5
    result2 = point_in_polygon(test2_x, test2_y, polygon_x, polygon_y)
    print(result2)

The above would give me both false if I defined it as follows:

            if polygon_x[i] < polygon_x[(i+1) % length]:
                temp_x = polygon_x[i]
                temp_y = polygon_x[(i+1) % length]
            else:
                temp_x = polygon_x[(i+1) % length]
                temp_y = polygon_x[i]

This is wrong! I should be getting true for result1 and then false for result2. So clearly, something is funky.

The code I was reading in C++ makes sense except for the above. In addition, it failed my test case which made me think that temp_y should be defined with polygon_y and not polygon_x. Sure enough, when I did this, my test case for (6,6) passes. It still fails when my points are on the line, but as long as I am inside the polygon, it will pass. Expected behavior.

Polygon code adopted to python

    def point_in_polygon(self, target_x, target_y, polygon_x, polygon_y):
        print(polygon_x)
        print(polygon_y)
        #Variable to track how many times ray crosses a line segment
        crossings = 0
        temp_x = 0
        temp_y = 0
        length = len(polygon_x)

        for i in range(0,length):

            if polygon_x[i] < polygon_x[(i+1) % length]:
                temp_x = polygon_x[i]
                temp_y = polygon_y[(i+1) % length]
            else:
                temp_x = polygon_x[(i+1) % length]
                temp_y = polygon_y[i]

            print(str(temp_x) + ", " + str(temp_y))

            #check
            if target_x > temp_x and target_x <= temp_y and (target_y < polygon_y[i] or target_y <= polygon_y[(i+1)%length]):
                eps = 0.000001

                dx = polygon_x[(i+1) % length] - polygon_x[i]
                dy = polygon_y[(i+1) % length] - polygon_y[i]

                k = 0
                if abs(dx) < eps:
                    k = 999999999999999999999999999
                else:
                    k = dy/dx

                m = polygon_y[i] - k * polygon_x[i]
                y2 = k*target_x + m

                if target_y <= y2:
                    crossings += 1
        print(crossings)
        if crossings % 2 == 1:
            return True
        else:
            return False

Summary

Can someone please explain to me what the temp_x and temp_y approaches are doing? Also, if my fix for redefining the temp_x for polygon_x and temp_y for polygon_y is the correct approach? I doubt it. Here is why.

What is going on for temp_x and temp_y doesn't quite make sense to me. For i = 0, clearly polygon_x[0] < polygon_x[1] is false, so we get temp_x[1] = 5 and temp_y[0] = 5. That is (5,5). This just happens to be one of my pairs. However, suppose I feed the algorithm my points out of order (by axis, pairwise integrity is always a must), something like:

x = [5, 10, 10, 5]
y = [10,10, 5, 5]

In this case, when i = 0, we get temp_x[1] = 10 and temp_y[0] = 10. Okay, by coincidence (10,10). I also tested points against the "corrected" algorithm (9,9) and it is still inside. In short, I am trying to find a counterexample, for why my fix won't work, but I can't. If this is working, I need to understand what the method is doing and hope someone could help explain it to me?

Regardless, if I am right or wrong, I would appreciate it if someone could help shed some better light on this problem. I'm even open to solving the problem in a more efficient way for n-polygons, but I want to make sure I am understanding code correctly. As a coder, I am uncomfortable with a method that doesn't quite make sense.

Thank you so much for listening to my thoughts above. Any suggestions greatly welcomed.

Julien
  • 13,986
  • 5
  • 29
  • 53
hlyates
  • 1,279
  • 3
  • 22
  • 44
  • 1
    Doesn't address the core of your question, but `matplotlib` has a method [`contains_points`](http://matplotlib.org/1.2.1/api/path_api.html#matplotlib.path.Path.contains_points) for this purpose. It works with arbitrary path objects (polygons included). http://stackoverflow.com/questions/8833950/how-to-determine-which-points-are-inside-of-a-polygon-and-which-are-not-large-n – lanery Sep 01 '16 at 04:56

1 Answers1

3

You've misunderstood what the x1 and x2 values in the linked C++ code are for, and that misunderstanding has caused you to pick inappropriate variable names in Python. Both of the variables contain x values, so temp_y is a very misleading name. Better names might be min_x and max_x, since they are the minimum and maximum of the x values of the vertices that make up the polygon edge. A clearer version of the code might be written as:

for i in range(length):
    min_x = min(polygon_x[i], polygon_x[(i+1)%length])
    max_x = max(polygon_x[i], polygon_x[(i+1)%length])

    if x_min < target_x <= x_max:
        # ...

This is perhaps a little less efficient than the C++ style of code, since calling both min and max will compare the values twice.

Your comment about the order of the points suggests that there's a further misunderstanding going on, which might explain the unexpected results you were seeing when setting temp_y to a value from polygon_x. The order of the coordinates in the polygon lists is important, as the edges go from one coordinate pair to the next around the list (with the last pair of coordinates connecting to the first pair to close the polygon). If you reorder them, the edges of the polygon will get switched around.

The example coordinates you give in your code (polygon_x = [5, 5, 11, 10] and polygon_y = [5, 10, 5, 10]) don't describe a normal kind of polygon. Rather, you get a (slightly lopsided) bow-tie shape, with two diagonal edges crossing each other like an X in the middle. That's not actually a problem for this algorithm though.

However, the first point you're testing lies exactly on one of those diagonal edges (the one that wraps around the list, from the last vertex, (10, 10) to the first, (5, 5)). Whether the code will decide it's inside or outside of the polygon likely comes down to floating point rounding and the choice of operator between < or <=. Either answer could be considered "correct" in this situation.

When you reordered the coordinates later in the question (and incidentally change an 11 to a 10), you turned the bow-tie into a square. Now the (6, 6) test is fully inside the shape, and so the code will work if you don't assign a y coordinate to the second temp variable.

Blckknght
  • 100,903
  • 11
  • 120
  • 169
  • Wow, this an amazing response! :) I will test things out tonight to verify, but I would say this is a huge candidate for the official answer. Thanks. I do have one important follow up question. – hlyates Sep 01 '16 at 12:55
  • I thought the order of polygon_x and polygon_y did not matter? Eventually, I want to feed my algorithm small square or polygon rectangular shapes that I parsed from a KML file. How can I ensure I don't screw up my shapes? Perhaps with the idea of creating edges? So polygon_x = [5, 10, 10, 5] and polygon_y = [5, 5, 10, 10] also works? I just have to be sure my coordinates in KML ordering make sense. Right? – hlyates Sep 01 '16 at 13:02
  • 1
    I don't know anything about KML files, but I can't imagine they store polygon vertices in an arbitrary order, since that would be ambiguous. It's more likely they consistently order the vertices in either a clockwise or counterclockwise order. For this algorithm it doesn't matter which! – Blckknght Sep 01 '16 at 20:02
  • I tested out the changes and this seems to be working. I may use an off the shelf solution in a python library if I can ever find one, but it's nice to understand better how this works. Do you have any reading or online references you are aware of that can help me understand the line generation code better? I kind of want to tear my teeth into this Jordan Curve approach. Super clever it is. – hlyates Sep 02 '16 at 01:35