I have a shapefile that I am reading in with geopandas. I do not need all the rows, only those whose ROUTE_ID starts with 01, 02, or 03. I use a for loop and an if statement to try to append the rows I want to an empty dataframe. Below is the sample data:

ROUTE_ID            FROM_MEASU  TO_MEASURE  STREET_PRE  BASE_NAME
0100006595050034-D  5.799725    9.678965                215th
0200006595050034-D  0           9.678965    ST          220th
0300006595050034-D  5.799725    9.678965                215th
0400006595050034-D  0           9.678965    ST          220th

My code is as follows:

mnlrshwy = pd.DataFrame(columns=['ROUTE_ID','FROM_MEASU','TO_MEASURE','STREET_PRE',
                                 'BASE_NAME'])
for x in mnlrs['ROUTE_ID']:
    if x.startswith(('01','02','03')) is True:
        mnlrshwy = x.append(mnlrs,ignore_index = True)

I get a concatenation error, and I don't understand why. Any suggestions would be helpful.

joshuah9
  • I think it does, but I get: ValueError: Cannot mask with non-boolean array containing NA / NaN values. So there must be blank cells in that column somewhere. – joshuah9 Jun 02 '22 at 16:55
  • I used this code to solve this issue: mnlrs=mnlrs[mnlrs['ROUTE_ID'].notna()] – joshuah9 Jun 02 '22 at 17:02

3 Answers

If you use pandas, you won't need a for loop.

df[df['ROUTE_ID'].str.contains("^0[123]", regex=True, na=False)]
Amir Py
  • Much simpler solution, +1. It's also worth noting that you could use `Series.str.startswith()` instead of `contains()` to make the code a bit more self-documenting – G. Anderson Jun 02 '22 at 18:20
  • I ended up using this: `routes = ['01','02','03']` followed by `mnlrshwy = mnlrs[mnlrs.ROUTE_ID.str.startswith(tuple(routes))]` – joshuah9 Jun 03 '22 at 13:01
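
As a minimal sketch of the `startswith` variant discussed in the comments above: the frame below is a hypothetical stand-in for `mnlrs` (it is not the real shapefile), and passing a tuple to `Series.str.startswith` is assumed to be supported by your pandas version (it is documented from pandas 1.5 onward). The `na=False` argument is one way to avoid the "Cannot mask with non-boolean array" error without dropping rows first.

```python
import pandas as pd

# Hypothetical stand-in for the shapefile attribute table (mnlrs in the question).
mnlrs = pd.DataFrame({
    "ROUTE_ID": ["0100006595050034-D", "0200006595050034-D",
                 "0300006595050034-D", "0400006595050034-D", None],
    "BASE_NAME": ["215th", "220th", "215th", "220th", "unknown"],
})

routes = ("01", "02", "03")

# na=False makes a missing ROUTE_ID count as "no match" instead of
# leaving NaN in the mask, so the mask is purely boolean.
mask = mnlrs["ROUTE_ID"].str.startswith(routes, na=False)

# Boolean indexing keeps every column of the matching rows.
mnlrshwy = mnlrs[mask].reset_index(drop=True)
print(mnlrshwy)
```

Because the mask is built on the whole column at once, no loop or pre-built empty dataframe is needed.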

Check the type of each value at every stage, or convert each value to a string before appending it to your dataframe.
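
A rough sketch of both suggestions, assuming a hypothetical `route_id` column that mixes strings, a missing value, and a number (the real data may look different):

```python
import pandas as pd

# Hypothetical column mixing strings, a missing value, and a number.
route_id = pd.Series(["0100006595050034-D", None, 400, "0300006595050034-D"])
prefixes = ("01", "02", "03")

# Type check: count which Python types actually occur in the column.
print(route_id.map(lambda v: type(v).__name__).value_counts())

# Option 1: test the prefix only on values that really are strings.
keep_checked = route_id.map(lambda v: isinstance(v, str) and v.startswith(prefixes))

# Option 2: convert everything to string first, then test the prefix.
# na=False treats any value that is still missing after conversion as "no match".
keep_converted = route_id.astype(str).str.startswith(prefixes, na=False)

print(route_id[keep_checked])
```

Option 1 never calls a string method on a non-string, while option 2 is shorter but also turns numbers into text, which may or may not be what you want.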

df Output

             ROUTE_ID  FROM_MEASU  TO_MEASURE STREET_PRE BASE_NAME
0  0100006595050034-D    5.799725    9.678965        NaN     215th
1  0200006595050034-D    0.000000    9.678965         ST     220th
2  0300006595050034-D    5.799725    9.678965        NaN     215th
3  0400006595050034-D    0.000000    9.678965         ST     220th

Above is the data from your dataframe.

# Pre-allocate the target frame with the same index as df so .loc can assign into it.
mnlrshwy = pd.DataFrame(columns=['ROUTE_ID', 'FROM_MEASU', 'TO_MEASURE', 'STREET_PRE',
                                 'BASE_NAME'], index=[0, 1, 2, 3])
for x in range(len(df)):
    if df.loc[x, 'ROUTE_ID'].startswith(('01', '02', '03')):
        # Writes the single ROUTE_ID string into every column of row x.
        mnlrshwy.loc[x, :] = df.loc[x, 'ROUTE_ID']

print(mnlrshwy)

Output

             ROUTE_ID          FROM_MEASU          TO_MEASURE  \
0  0100006595050034-D  0100006595050034-D  0100006595050034-D   
1  0200006595050034-D  0200006595050034-D  0200006595050034-D   
2  0300006595050034-D  0300006595050034-D  0300006595050034-D   
3                 NaN                 NaN                 NaN   

           STREET_PRE           BASE_NAME  
0  0100006595050034-D  0100006595050034-D  
1  0200006595050034-D  0200006595050034-D  
2  0300006595050034-D  0300006595050034-D  
3                 NaN                 NaN  

Your loop yields a single value (the ROUTE_ID string) on each iteration, not a row. And if an empty dataframe has no index, there is nothing to assign a value to, unless you wrap the value in square brackets. Here you can read more about an empty dataframe.

On each iteration I fill every column of the matching row with that value, which is why the ROUTE_ID repeats across the output. With loc, the first argument selects the row index and the second selects the column(s).
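
If the goal is to keep each matching row with its original column values rather than the repeated ROUTE_ID, one possible sketch is to collect the matching index labels in the loop and select the rows in a single step. The frame below rebuilds the sample data by hand and is only an assumption about how the real `df` looks.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "ROUTE_ID": ["0100006595050034-D", "0200006595050034-D",
                 "0300006595050034-D", "0400006595050034-D"],
    "FROM_MEASU": [5.799725, 0.0, 5.799725, 0.0],
    "TO_MEASURE": [9.678965, 9.678965, 9.678965, 9.678965],
    "STREET_PRE": [None, "ST", None, "ST"],
    "BASE_NAME": ["215th", "220th", "215th", "220th"],
})

# Collect the index labels of rows whose ROUTE_ID has one of the prefixes.
# The isinstance check skips missing or non-string values.
matching = [
    idx for idx, route in df["ROUTE_ID"].items()
    if isinstance(route, str) and route.startswith(("01", "02", "03"))
]

# Select all columns of the matching rows at once.
mnlrshwy = df.loc[matching]
print(mnlrshwy)
```

This keeps the loop from the question but avoids assigning into an empty dataframe altogether.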

inquirer