I have a geopandas dataframe created from a shapefile.

I would like to sort my dataframe by the "name" column, and within each name the line chunks should also be sorted by geographic location, so that nearby chunks with the same name end up next to each other.

How can I do this kind of sorting?

What I have tried:

  1. I calculate the mean coordinate for each linestring:

    df['mean_coord'] = df.geometry.apply(lambda g: [np.mean(g.xy[0]), np.mean(g.xy[1])])

  2. I group the dataframe according to the "name" column and I sort the resulting dataframe according to the mean coordinate:

    grouped = df.sort_values(['mean_coord'], ascending=False).groupby('name')

But I am not sure whether this is the best, most elegant, or even a correct way to do it. I also don't know how to get back to a pandas dataframe from the grouped object.

  • I can't emphasize this enough: you need to hard-code some representative datasets into your questions like I did with my answer here: https://stackoverflow.com/a/47972529/1552748 – Paul H Dec 28 '17 at 19:56
  • (and that does *not* mean linking to some random shapefile) – Paul H Dec 28 '17 at 19:59

1 Answer

First, I'm going to show you what I've hard-coded and assumed to be a representative dataset. This is really something you should have provided in the question, but I'm feeling generous this holiday season:

from shapely.geometry import Point, LineString
import geopandas

line1 = LineString([
    Point(0, 0),
    Point(0, 1),
    Point(1, 1),
    Point(1, 2),
    Point(3, 3),
    Point(5, 6),
])

line2 = LineString([
    Point(5, 3),
    Point(5, 5),
    Point(9, 5),
    Point(10, 7),
    Point(11, 8),
    Point(12, 12),
])

line3 = LineString([
    Point(9, 10),
    Point(10, 14),
    Point(11, 12),
    Point(12, 15),
])

gdf = geopandas.GeoDataFrame(
    data={'name': ['A', 'B', 'A']},
    geometry=[line1, line2, line3]
)

So now I'm going to compute the x- and y-coordinates of the centroid of each line, average them, sort by the name of the line and then by that average, then remove the intermediate columns. No groupby is needed, so the result is still a regular GeoDataFrame.

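output = (
    gdf.assign(x=lambda df: df['geometry'].centroid.x)          # centroid x of each line
       .assign(y=lambda df: df['geometry'].centroid.y)          # centroid y of each line
       .assign(rep_val=lambda df: df[['x', 'y']].mean(axis=1))  # one number per line: mean of x and y
       .sort_values(by=['name', 'rep_val'])                     # name first, then location value
       .loc[:, gdf.columns]                                     # drop the helper columns
)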

print(output)

  name                                       geometry
0    A      LINESTRING (0 0, 0 1, 1 1, 1 2, 3 3, 5 6)
2    A         LINESTRING (9 10, 10 14, 11 12, 12 15)
1    B  LINESTRING (5 3, 5 5, 9 5, 10 7, 11 8, 12 12)
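
To make the sort key less of a black box, here is a plain-Python sketch of what the chain above computes, without geopandas. It assumes the centroid of a LineString is the length-weighted average of its segment midpoints, which is my understanding of what shapely returns for lines. The names `line_centroid`, `sort_key`, and `records` are made up for this illustration and are not part of any library.

```python
import math


def line_centroid(coords):
    """Length-weighted centroid of a polyline given as [(x, y), ...].

    Assumes at least two vertices and a non-zero total length.
    """
    total = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)   # segment length is the weight
        total += seg
        cx += seg * (x0 + x1) / 2            # weight * midpoint x
        cy += seg * (y0 + y1) / 2            # weight * midpoint y
    return cx / total, cy / total


def sort_key(record):
    """Mimic sort_values(by=['name', 'rep_val']): name first, then mean of centroid x and y."""
    name, coords = record
    x, y = line_centroid(coords)
    return name, (x + y) / 2


# Same three lines as in the hard-coded dataset above
records = [
    ("A", [(0, 0), (0, 1), (1, 1), (1, 2), (3, 3), (5, 6)]),
    ("B", [(5, 3), (5, 5), (9, 5), (10, 7), (11, 8), (12, 12)]),
    ("A", [(9, 10), (10, 14), (11, 12), (12, 15)]),
]

ordered = sorted(records, key=sort_key)
for name, coords in ordered:
    print(name, coords[0], "->", coords[-1])
```

One caveat with this key: averaging x and y collapses two dimensions into one number, so two lines whose centroids have the same x + y get the same value even if they are far apart. If that matters for your data, sorting by `['name', 'x', 'y']` or by distance to a reference point may be a better fit.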
Paul H