For loop with if statement to go through a dataframe and check whether column values start with a certain value

Question

I have a shapefile that I am bringing in with geopandas. I do not need all the rows, just the rows whose ROUTE_ID starts with 01, 02, or 03. I use a for loop and an if statement to try to append the rows I want to an empty dataframe. Below is the sample data:

ROUTE_ID	FROM_MEASU	TO_MEASURE	STREET_PRE	BASE_NAME
0100006595050034-D	5.799725	9.678965		215th
0200006595050034-D	0	9.678965	ST	220th
0300006595050034-D	5.799725	9.678965		215th
0400006595050034-D	0	9.678965	ST	220th

My code is as follows:

mnlrshwy = pd.DataFrame(columns=['ROUTE_ID','FROM_MEASU','TO_MEASURE','STREET_PRE',
                                 'BASE_NAME'])
for x in mnlrs['ROUTE_ID']:
    if x.startswith(('01','02','03')) is True:
        mnlrshwy = x.append(mnlrs,ignore_index = True)

I get a concatenation error, and I don't understand why. Any suggestions would be helpful.

I think it does, but i get: ValueError: Cannot mask with non-boolean array containing NA / NaN values. So, there must be blank cells in that column somewhere. — joshuah9, Jun 02 '22 at 16:55
I used this code to solve this issue: mnlrs=mnlrs[mnlrs['ROUTE_ID'].notna()] — joshuah9, Jun 02 '22 at 17:02

score 1 · Answer 1 · answered Jun 02 '22 at 16:57

If you use pandas, you won't need a for loop.

df[df['ROUTE_ID'].str.contains("^0(1|2|3)", regex=True, na=False)]
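As a minimal sketch of the one-liner above, the example below runs the same kind of filter on made-up data. The frame contents, including the row with a missing ROUTE_ID, are assumptions for illustration and are not the asker's shapefile. The pattern uses a character class, ^0[123], instead of the group (1|2|3); as far as I know, pandas emits a UserWarning about match groups when str.contains receives a pattern with a capturing group, and the character class avoids that.

```python
import pandas as pd

# Hypothetical sample mirroring the question's columns, plus one missing ROUTE_ID
mnlrs = pd.DataFrame({
    "ROUTE_ID": ["0100006595050034-D", "0200006595050034-D",
                 "0300006595050034-D", "0400006595050034-D", None],
    "BASE_NAME": ["215th", "220th", "215th", "220th", "unknown"],
})

# ^ anchors the match to the start of the string;
# na=False makes missing values count as non-matches, so the mask is purely boolean
mask = mnlrs["ROUTE_ID"].str.contains(r"^0[123]", regex=True, na=False)

# Keep only the matching rows and renumber them from 0
mnlrshwy = mnlrs[mask].reset_index(drop=True)
print(mnlrshwy)
```

Because na=False is passed, the rows with a blank ROUTE_ID are simply dropped by the filter, so a separate notna() step should not be needed for this case.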

Amir Py

Much simpler solution, +1. Also worth noting, though, could also use `Series.str.startswith()` instead of `contains()` to make the code a bit more self-documenting – G. Anderson Jun 02 '22 at 18:20
I ended up using this: routes = ['01','02','03'] mnlrshwy=mnlrs[mnlrs.ROUTE_ID.str.startswith(tuple(routes))] – joshuah9 Jun 03 '22 at 13:01
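The startswith variant from the comments can be sketched as follows. The sample frame is again an assumption made for illustration. Passing a tuple of prefixes to Series.str.startswith is, to my knowledge, documented from pandas 1.5 onward, so treat this as a sketch for a reasonably recent pandas rather than a guarantee for older versions.

```python
import pandas as pd

# Hypothetical sample; the last row has a missing ROUTE_ID
mnlrs = pd.DataFrame({
    "ROUTE_ID": ["0100006595050034-D", "0200006595050034-D",
                 "0300006595050034-D", "0400006595050034-D", None],
    "BASE_NAME": ["215th", "220th", "215th", "220th", "unknown"],
})

# Prefixes to keep; a tuple means "starts with any of these"
routes = ("01", "02", "03")

# na=False turns missing values into False instead of NaN in the mask
mask = mnlrs["ROUTE_ID"].str.startswith(routes, na=False)
mnlrshwy = mnlrs[mask]
print(mnlrshwy)
```

Compared with a regular expression, this states the intent directly (a prefix test) and the list of prefixes can be extended without editing a pattern.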

score 0 · Answer 2 · answered Jun 02 '22 at 16:44

Perform a type check at each stage, or convert the value to a string and then append it to your data frame.

Shivam Gadekar

inquirer · Answer 3 · 2022-06-02T17:09:21.050

df Output

             ROUTE_ID  FROM_MEASU  TO_MEASURE STREET_PRE BASE_NAME
0  0100006595050034-D    5.799725    9.678965        NaN     215th
1  0200006595050034-D    0.000000    9.678965         ST     220th
2  0300006595050034-D    5.799725    9.678965        NaN     215th
3  0400006595050034-D    0.000000    9.678965         ST     220th

Above is the data of your dataframe.

mnlrshwy = pd.DataFrame(columns=['ROUTE_ID', 'FROM_MEASU', 'TO_MEASURE', 'STREET_PRE',
                                 'BASE_NAME'], index=[0, 1, 2, 3])
for x in range(0, len(df['ROUTE_ID'])):
    if df.loc[x, 'ROUTE_ID'].startswith(('01', '02', '03')) is True:
        mnlrshwy.loc[x, :] = df.loc[x, 'ROUTE_ID']

print(mnlrshwy)

Output

             ROUTE_ID          FROM_MEASU          TO_MEASURE  \
0  0100006595050034-D  0100006595050034-D  0100006595050034-D   
1  0200006595050034-D  0200006595050034-D  0200006595050034-D   
2  0300006595050034-D  0300006595050034-D  0300006595050034-D   
3                 NaN                 NaN                 NaN   

           STREET_PRE           BASE_NAME  
0  0100006595050034-D  0100006595050034-D  
1  0200006595050034-D  0200006595050034-D  
2  0300006595050034-D  0300006595050034-D  
3                 NaN                 NaN

You get one value at each iteration. An empty dataframe has no index labels, so it is not possible to assign a value to it unless you wrap the value in square brackets. Here you can read more about empty dataframes.

On each iteration I fill in every column of the current row. With loc, the first argument is the index label and the second is the column name.
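If a loop is still wanted, one possible sketch is below; it is an alternative under stated assumptions, not the code from this answer. In the question's loop, x is a string taken from the ROUTE_ID column, so x.append is not a dataframe operation, and DataFrame.append itself was removed in pandas 2.0. The loop in this answer also writes the ROUTE_ID value into every column, which is why the output repeats it. The sketch collects the index labels of matching rows and selects whole rows once at the end. The names prefixes and kept_labels are hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ROUTE_ID": ["0100006595050034-D", "0200006595050034-D",
                 "0300006595050034-D", "0400006595050034-D"],
    "FROM_MEASU": [5.799725, 0.0, 5.799725, 0.0],
    "TO_MEASURE": [9.678965, 9.678965, 9.678965, 9.678965],
    "STREET_PRE": [None, "ST", None, "ST"],
    "BASE_NAME": ["215th", "220th", "215th", "220th"],
})

prefixes = ("01", "02", "03")  # hypothetical name for the wanted prefixes
kept_labels = []               # index labels of the rows to keep

for label, route_id in df["ROUTE_ID"].items():
    # The isinstance check skips blank (NaN/None) cells instead of raising
    if isinstance(route_id, str) and route_id.startswith(prefixes):
        kept_labels.append(label)

# Select the complete rows in one step, keeping every column's own value
mnlrshwy = df.loc[kept_labels]
print(mnlrshwy)
```

Selecting once after the loop avoids growing a dataframe row by row, which tends to be slow, and each kept row retains its original values in all columns.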
