I have a set of points and I want to create a line/road network from them. First, I need to find the closest point to each of the points. For that, I used a KD-tree and wrote the following code:

import pandas as pd
from scipy import spatial

def closestPoint(source, X = None, Y = None):
    # Ensure source is a dataframe; work on a copy to keep the data source intact
    df = pd.DataFrame(source).copy(deep = True)

    if X is None or Y is None:
        raise ValueError("Please specify coordinate")
    elif X not in df.keys() or Y not in df.keys():
        raise ValueError("X and/or Y is/are not in column names")
    else:
        df["coord"] = list(zip(df[X], df[Y])) # create a coordinate

    if df["coord"].duplicated().any(): # True only if some coordinate is repeated
        uniq = df.drop_duplicates("coord")["coord"]
        uniqval = list(uniq)
        dupl = df[df["coord"].duplicated()]["coord"]
        duplval = list(dupl)
        uniqtree = spatial.KDTree(uniqval) # build each tree once, outside the loops
        dupltree = spatial.KDTree(duplval)

        for kq, vq in uniq.items():
            clstu = uniqtree.query(vq, k = 3)[1] # indices of the 3 nearest points (the first is vq itself)
            df.at[kq, "coord"] = [vq, uniqval[clstu[1]]]
            if [uniqval[clstu[1]], vq] in list(df["coord"]): # the reverse line already exists
                df.at[kq, "coord"] = [vq, uniqval[clstu[2]]]

        for kd, vd in dupl.items():
            clstd = dupltree.query(vd, k = 1)[1]
            df.at[kd, "coord"] = [vd, duplval[clstd]]
    else:
        val = df["coord"].tolist()
        valtree = spatial.KDTree(val)
        for k, v in df["coord"].items():
            clst = valtree.query(v, k = 3)[1]
            df.at[k, "coord"] = [v, val[clst[1]]]
            if [val[clst[1]], v] in list(df["coord"]):
                df.at[k, "coord"] = [v, val[clst[2]]]

    return df["coord"]

The code returns the closest points. However, I need to ensure that no double lines are created (e.g. (x,y) to (x1,y1) and (x1,y1) to (x,y)), and that each point is used at most once as the starting point of a line and at most once as the end point of a line, even if it is the closest point to several other points.
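
To make the two constraints concrete, here is a minimal sketch of a checker. The helper name check_segments and the sample segments are made up for illustration; it only assumes that each line is stored as a (start, end) pair of coordinate tuples.

```python
from collections import Counter

def check_segments(segments):
    """Return a list of constraint violations found in (start, end) pairs."""
    problems = []

    # A double line is the same pair of points in either direction.
    seen = set()
    for start, end in segments:
        key = frozenset((start, end))
        if key in seen:
            problems.append(("double line", start, end))
        seen.add(key)

    # Each point may start at most one line and end at most one line.
    starts = Counter(start for start, _ in segments)
    ends = Counter(end for _, end in segments)
    for point, n in starts.items():
        if n > 1:
            problems.append(("reused as start", point, n))
    for point, n in ends.items():
        if n > 1:
            problems.append(("reused as end", point, n))
    return problems

# Hypothetical sample: the second line reverses the first,
# and (1, 0) is the end point of two lines.
sample = [((0, 0), (1, 0)), ((1, 0), (0, 0)), ((2, 0), (1, 0))]
for problem in check_segments(sample):
    print(problem)
```

An empty list from the checker means both constraints hold for the given lines.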

Below is the visualization of the result: [image: Result of the code]

What I want: [image: What I want]

I've also tried separating the origin and target coordinates, like this:

df["coord"] = tuple(zip(df[X],df[Y])) #create a coordinate
df["target"] = "" #create a column for target points

count = 2 # create a count iteration
if (df["coord"].duplicated):
  uniq = df.drop_duplicates("coord")["coord"]
  uniqval = list(uniq.get_values())
  for kq,vq in uniq.items():
    clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
    while not vq in (list(df["target"]) and list(df["coord"])):
        clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
        df.set_value(kq, "target", uniqval[clstu[count-1]])
    else:
        count += 1
        clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
        df.set_value(kq, "target", uniqval[clstu[count-1]])

but this returns an error:

IndexError: list index out of range

Can anyone help me with this? Many thanks!

botibo

2 Answers

It is hard to comment on your global strategy without further detail about the kind of road network you want to obtain. So let me just comment on your specific code and explain why the "out of range" error happens. I hope this helps.

First, are you aware that (list_a and list_b) returns list_a if it is empty, and list_b otherwise? Second, isn't the condition (vq in list(df["coord"])) always True? If so, the body of your while loop never runs and its else clause is executed on every pass of the for loop, so count keeps growing. Before the for loop finishes, count exceeds the total number of (unique) points. Your KDTree query then cannot return enough points, and uniqval[clstu[count-1]] is out of range.
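
Both effects can be reproduced in isolation. The snippet below is a small sketch with made-up data (targets and coords stand in for df["target"] and df["coord"]). It relies on SciPy's documented behaviour of padding missing neighbours with an infinite distance and an index equal to the number of points in the tree.

```python
import numpy as np
from scipy import spatial

targets = ["", "", ""]                        # stand-in for the new df["target"] column
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

# `a and b` evaluates to `a` when `a` is falsy (an empty list), otherwise to `b`.
combined = targets and coords
print(combined is coords)                     # the membership test only ever sees coords
print(([] and coords) == [])                  # an empty first list is returned as-is

# Asking for more neighbours than there are points pads the result.
tree = spatial.KDTree(coords)
dist, idx = tree.query(coords[0], k=4)        # only 3 points exist
print(dist)                                   # the last distance is inf
print(idx)                                    # the last index equals len(coords)

try:
    coords[idx[3]]
except IndexError as err:
    print("IndexError:", err)

# One possible guard: never ask for more neighbours than points.
count = 4
safe_k = min(count, len(coords))
```

Capping k this way avoids the exception, but it does not fix the loop logic itself.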

Rolvernew
  • Thanks for the comment! I was actually able to visualize it, and this is the result of the code: https://i.stack.imgur.com/CWltU.png, whereas the result that I want looks like this: i.stack.imgur.com/AxxKN.png. Yes, I think I am aware of that, because I want to fill list_a first, and if the point already exists in list_a, it needs to go to list_b (CMIIW). I think you are right. What I meant was that if the point has already been used twice (as a starting point and as a target point), then another point close to the iterated point needs to be put in the target column. – botibo Nov 20 '18 at 13:28
  • To be more specific: I think the condition `vq in (list(df["target"]) and list(df["coord"]))` in your code is always True, so your code is probably not doing what you think it is. – Rolvernew Nov 20 '18 at 15:07
  • Thanks for the input! I am trying to work through it but I am still stuck, and it still raises the out-of-range error. Do you have any ideas on how to do it? – botibo Nov 20 '18 at 15:54

Answering now about the global strategy, here is what I would do (rough pseudo-algorithm):

current_point = one starting point in uniqval
while (uniqval not empty)
  construct KDTree from uniqval and use it for next line
  next_point = point in uniqval closest to current_point (other than current_point itself)
  record next_point as target for current_point
  remove current_point from uniqval
  current_point = next_point

What you will obtain is a linear graph joining all your points, using closest neighbors "in some way". I don't know if it will fit your needs. You would also obtain a linear graph by taking next_point at random...
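
For reference, a minimal Python sketch of the pseudo-algorithm above. The function name nearest_neighbour_chain and the sample points are made up, and the points are assumed to be unique coordinate tuples. The current point is removed from the list before each query so that it cannot be returned as its own nearest neighbour.

```python
from scipy import spatial

def nearest_neighbour_chain(points, start=0):
    """Greedily link each point to the nearest point that is still unused."""
    remaining = list(points)
    current = remaining.pop(start)       # the current point leaves the pool first
    links = []
    while remaining:
        # Rebuild the tree from the unused points, as in the pseudo-algorithm.
        tree = spatial.KDTree(remaining)
        _, i = tree.query(current, k=1)  # index of the nearest unused point
        next_point = remaining.pop(int(i))
        links.append((current, next_point))
        current = next_point
    return links

# Hypothetical sample points on a line
pts = [(0, 0), (5, 0), (1, 0), (2, 0)]
for source, target in nearest_neighbour_chain(pts):
    print(source, "->", target)
```

Because every point is removed as soon as it is linked, no point starts or ends more than one line and no line appears twice. Rebuilding the tree on every step is simple but slow for large inputs.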

Rolvernew
  • Hi! Thanks for the solution! I tried it with len(uniqval) != 1, and otherwise the last point is connected to the first point. But I ran into another problem: I want to record the target point in the main dataframe (df) at the index of the current point. As the current point changes to the closest point, how should I append it? – botibo Nov 21 '18 at 04:06
  • When I write "record next_point as target for current_point", next_point is the target and current_point is still available (with its index). Btw, if my answers help, do not hesitate to upvote or validate them... – Rolvernew Nov 21 '18 at 08:03
  • Ah I see what you mean, thanks for the answer! It is what I am looking for and will absolutely upvote it – botibo Nov 21 '18 at 10:27
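
The point discussed in these comments, recording the target against the index of the current point, could look like the sketch below. The function name chain_targets, the column names x and y, and the sample data are assumptions for illustration. The idea is to carry each point's index label alongside its coordinate, so the label is still available when the target is recorded.

```python
import pandas as pd
from scipy import spatial

def chain_targets(df, x="x", y="y"):
    """Return a copy of df with a 'target' column holding each row's next point."""
    out = df.copy()
    labels, pts, seen = [], [], set()
    # Keep the first index label of every distinct coordinate.
    for label, pt in zip(out.index, zip(out[x], out[y])):
        if pt not in seen:
            seen.add(pt)
            labels.append(label)
            pts.append(pt)

    targets = {}
    if pts:
        cur_label, cur_pt = labels.pop(0), pts.pop(0)
        while pts:
            _, i = spatial.KDTree(pts).query(cur_pt, k=1)
            next_label, next_pt = labels.pop(int(i)), pts.pop(int(i))
            targets[cur_label] = next_pt   # stored under the current point's label
            cur_label, cur_pt = next_label, next_pt

    # Rows without a target (the last point in the chain, duplicates) get None.
    out["target"] = pd.Series(
        [targets.get(label) for label in out.index], index=out.index, dtype=object
    )
    return out

# Hypothetical sample data with string index labels
df = pd.DataFrame({"x": [0, 5, 1, 2], "y": [0, 0, 0, 0]}, index=list("abcd"))
print(chain_targets(df))
```

Connecting the last point back to the first, as mentioned in the comment, would be one extra assignment after the loop.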