
I have this dataframe:

import pandas as pd

d = {
    'geoid': ['13085970205'],
    'FIPS': ['13085'],
    'Year': [2024],
    'parameters': [{"Year": 2024, "hpi_prediction": 304.32205}],
    'geometry':[
        {
            "coordinates": [[[[-84.126456, 34.389734], [-84.12641, 34.39026], [-84.126323, 34.39068]]]],
            "parameters": {"Year": 2024, "hpi_prediction": 304.32205},
            "type": "MultiPolygon"
        }
    ]
    
}

dd = pd.DataFrame(data=d)

To write this out, I use geopandas (import geopandas as gpd) to convert the data into a GeoDataFrame like this:

df_geopandas_hpi = gpd.GeoDataFrame(dd[['geoid', 'geometry']])

Once this happens, the parameters key from the original dataframe gets erased. Why? Note that the type of geometry in the example dataframe is geojson.geometry.MultiPolygon. How can I prevent this?

What I essentially need to do is the following

import os
import geopandas as gpd

# Note: use `not`, not `~`. `~True` is -2 and `~False` is -1, both truthy.
if not os.path.exists('../verus_data'):
    os.mkdir('../verus_data')

for county, df_county in dd.groupby('FIPS'):
    if not os.path.exists('../verus_data/'+str(county)):
        os.mkdir('../verus_data/'+str(county))

    if not os.path.exists('../verus_data/'+str(county)+'/'+'predicted'):
        os.mkdir('../verus_data/'+str(county)+'/'+'predicted')

    if not os.path.exists('../verus_data/'+str(county)+'/'+'analyzed'):
        os.mkdir('../verus_data/'+str(county)+'/'+'analyzed')

    df_hpi = df_county[df_county['key'] == 'hpi']
    df_analyzed = df_county[df_county['key'] == 'analyzed']

    for year, df_year in df_hpi.groupby('Year'):
        if not os.path.exists('../verus_data/'+str(county)+'/'+'predicted'+'/'+str(year)):
            os.mkdir('../verus_data/'+str(county)+'/'+'predicted'+'/'+str(year))

            df_geopandas_hpi = gpd.GeoDataFrame(df_year[['geoid', 'geometry', 'parameters']])
            df_geopandas_hpi.to_file('../verus_data/'+str(county)+'/'+'predicted'+'/'+str(year)+'/'+'hpi_predictions.geojson', driver="GeoJSON")

    for year, df_year in df_analyzed.groupby('Year'):
        if not os.path.exists('../verus_data/'+str(county)+'/'+'analyzed'+'/'+str(year)):
            os.mkdir('../verus_data/'+str(county)+'/'+'analyzed'+'/'+str(year))

            df_geopandas_analyzed = gpd.GeoDataFrame(df_year[['geoid', 'geometry', 'parameters']])
            df_geopandas_analyzed.to_file('../verus_data/'+str(county)+'/'+'analyzed'+'/'+str(year)+'/'+'analyzed_values.geojson', driver="GeoJSON")

I need to write out these GeoJSON files while keeping the parameters key intact.
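
Separately from the parameters issue, the nested directory checks could probably be collapsed with os.makedirs and exist_ok=True, which creates any missing parent directories and does not fail if the directory already exists. A minimal sketch, where ensure_output_dir and the temporary base directory are hypothetical names used only for illustration (the real base would be '../verus_data'):

```python
import os
import tempfile


def ensure_output_dir(base, county, kind, year):
    """Create base/county/kind/year if needed and return the path.

    `ensure_output_dir` is a hypothetical helper, not part of any library.
    """
    path = os.path.join(base, str(county), kind, str(year))
    # exist_ok=True makes the call safe to repeat for an existing directory.
    os.makedirs(path, exist_ok=True)
    return path


# A temporary directory stands in for '../verus_data' in this sketch.
base = tempfile.mkdtemp()
out_dir = ensure_output_dir(base, "13085", "predicted", 2024)
# Calling it a second time is harmless and returns the same path.
out_dir_again = ensure_output_dir(base, "13085", "predicted", 2024)
```

Unlike the loop above, this sketch does not skip work when the year directory already exists, so any "write only once" behaviour would need its own check.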

Wolfy
  • what parameters key? you don't have anything in your example code called `parameters` – Michael Delgado Nov 28 '22 at 04:53
  • The "parameters" are inside the geojson object in the geometry column – Wolfy Nov 28 '22 at 04:54
  • oh wow - it's buried way deep in the shape. edited to clarify. also, I imagine this MRE would still work with a much simpler shape, like a triangle? – Michael Delgado Nov 28 '22 at 04:56
  • What is MRE? I am not sure what you mean – Wolfy Nov 28 '22 at 04:59
  • sorry - MRE stands for minimal reproducible example - your example code is unnecessarily long if the key issue here is the parameters - you could cut down your example further by simplifying the shape, e.g. by dropping all but the first three points – Michael Delgado Nov 28 '22 at 05:03
  • Good point, I just kept it for consistency – Wolfy Nov 28 '22 at 05:04
  • consistency with your own workflow should not take precedence over reproducibility and clarity in the example. make sure your example is *minimal*, *complete*, and *reproducible* - all of these are important. It took a lot of iteration to get to the bottom of this question because you had excluded really important information, and you had also originally buried other important details in a mountain of unnecessary data. Try to focus on the points in the minimal reproducible example guide when asking again. Glad you got to the bottom of it though! – Michael Delgado Nov 28 '22 at 18:41
  • Thanks for the pointers, I am still getting the hang of things – Wolfy Nov 28 '22 at 20:29
  • no problem - there's definitely a learning curve. thanks for following through on them :) – Michael Delgado Nov 28 '22 at 21:52

2 Answers

Geopandas relies on the shapely library to handle geometry objects. Shapely has no concept of "parameters" or other additional metadata. Such fields can be included at arbitrary levels in GeoJSON, but they don't fit the shapely or geopandas data models, so they are not carried over when the geometry is parsed.

For example, when parsing with shapely.geometry.shape:

In [10]: shape = shapely.geometry.shape(
    ...:         {
    ...:             "coordinates": [[[[-84.126456, 34.389734], [-84.12641, 34.39026], [-84.126323, 34.39068]]]],
    ...:             "parameters": {"Year": 2024, "hpi_prediction": 304.32205},
    ...:             "type": "MultiPolygon"
    ...:         }
    ...:     )

In [11]: shape
Out[11]: <shapely.geometry.multipolygon.MultiPolygon at 0x11040eb60>

In [12]: shape.parameters
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 shape.parameters

AttributeError: 'MultiPolygon' object has no attribute 'parameters'

If you'd like to retain these, you'll need to parse the json separately from converting to geopandas. For example, if "parameters" is present in every element, you could simply assign it as a new column:


In [21]: gdf = gpd.GeoDataFrame(dd[["geoid", "geometry"]])
    ...: gdf["parameters"] = dd.geometry.str["parameters"]

In [22]: gdf
Out[22]:
         geoid                                           geometry                                   parameters
0  13085970205  {'coordinates': [[[[-84.126456, 34.389734], [-...  {'Year': 2024, 'hpi_prediction': 304.32205}

However, if the parameters field is not always present, you may need to do some extra cleaning. You can always access the elements of the geometry column within the pandas dataframe dd directly, e.g.

In [27]: dd.loc[0, "geometry"]["parameters"]["hpi_prediction"]
Out[27]: 304.32205
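
One way to do that cleaning without geopandas at all is to split each geometry dict into its pure geometry and its extra members, then store the extras in the Feature's "properties", which is where GeoJSON expects attribute data. This is only a sketch using the standard library: split_geometry and to_feature are hypothetical helper names, the "parameters" key is assumed to be the only extra member, and the ring is closed by repeating the first point (an addition to the sample data).

```python
import copy
import json


def split_geometry(geom, key="parameters"):
    """Return (pure_geometry, extras) without mutating the input dict.

    Geometries that lack `key` yield an empty extras dict.
    """
    pure = {k: copy.deepcopy(v) for k, v in geom.items() if k != key}
    extras = copy.deepcopy(geom.get(key)) or {}
    return pure, extras


def to_feature(geoid, geom):
    """Build a GeoJSON Feature with the extras flattened into properties."""
    pure, extras = split_geometry(geom)
    return {
        "type": "Feature",
        "geometry": pure,
        "properties": {"geoid": geoid, **extras},
    }


geom = {
    # First point repeated at the end so the ring is closed (assumption).
    "coordinates": [[[[-84.126456, 34.389734], [-84.12641, 34.39026],
                      [-84.126323, 34.39068], [-84.126456, 34.389734]]]],
    "parameters": {"Year": 2024, "hpi_prediction": 304.32205},
    "type": "MultiPolygon",
}

feature = to_feature("13085970205", geom)
collection = {"type": "FeatureCollection", "features": [feature]}
geojson_text = json.dumps(collection)  # ready to write to a .geojson file
```

The cleaned geometry dicts can still be handed to shapely.geometry.shape afterwards if a GeoDataFrame is needed.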
Michael Delgado
  • When you say parse the json separately, does that mean I should just write out json files instead of geojson? Let me add more context. – Wolfy Nov 28 '22 at 05:11
  • no I mean directly working with the list of dictionaries in the geojson or the Series of dictionaries in the pandas dataframe. I updated my answer with an example. – Michael Delgado Nov 28 '22 at 06:30

All you have to do is include the parameters column when constructing the GeoDataFrame:

df_geopandas_hpi = gpd.GeoDataFrame(df_year[['geoid', 'geometry', 'parameters']])
Wolfy
  • this isn't true. there is no parameters column in the pandas dataframe; instead, it is nested in the "geometry" object column. – Michael Delgado Nov 28 '22 at 06:21
  • I should have phrased the question to say that parameters is a column in the dataframe. My apologies. – Wolfy Nov 28 '22 at 18:21
  • oh! sorry I missed the fact that you were the original asker. got it yeah if you have already added this to the dataframe then the solution really is as simple as you suggest. with the edits to your question it's a totally different approach :) – Michael Delgado Nov 28 '22 at 18:37