pandas dataframe - does filtering / selecting cols by String preserve order?

Question

I have a use case where I have, say, 10 columns, 5 of which start with the string 'Region'. I need a resulting dataframe that contains only those columns (the ones starting with 'Region'). I also need to make sure the order is preserved: for example, if the column order in the original df is 'Region 1', 'Region 2', 'Region 3', the result should keep that order rather than come out as 'Region 3', 'Region 2', 'Region 1'.

Would following the accepted answer to the question below preserve the order, or is there some other way to achieve that?

stackoverflow - find-column-whose-name-contains-a-specific-string

What is in the remaining columns? Are they all alphabetic or alphanumeric? – Umar.H, May 28 '20 at 14:37

score 2 · Answer 1 · answered May 28 '20 at 14:40

Yes, it will. df.columns is a pandas Index, an ordered, list-like sequence of column labels, and iterating over it yields the labels in their original order. So you can use the answer from the linked question:

region_cols = [col for col in df.columns if 'Region' in col]

df[region_cols] is then the dataframe you need, with the columns in their original order.
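A minimal sketch checking the claim, assuming a small hypothetical dataframe whose column names are all strings (the names are made up for illustration). Because the question asks for columns that *start with* 'Region', a str.startswith variant is shown next to the substring test from the answer; the substring test also picks up a name like 'NotRegion'.

```python
import pandas as pd

# Hypothetical frame: the 'Region' columns are deliberately NOT in numeric order
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["Region 3", "Other", "Region 1", "NotRegion", "Region 2", "Custom"],
)

# Substring test, as in the answer: matches 'Region' anywhere in the name
region_cols = [col for col in df.columns if "Region" in col]

# Prefix test, matching the question's "starts with 'Region'"
prefix_cols = [col for col in df.columns if col.startswith("Region")]

print(region_cols)  # ['Region 3', 'Region 1', 'NotRegion', 'Region 2']
print(prefix_cols)  # ['Region 3', 'Region 1', 'Region 2']
print(df[prefix_cols])
```

Both lists follow the left-to-right order of df.columns; neither one rearranges the columns into 'Region 1', 'Region 2', 'Region 3'. If that numeric order is what you want, you have to sort explicitly.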

score 2 · Accepted Answer · BENY · 2020-05-28T14:58:24.430

Two steps: first, use filter:

s=df.filter(like='Region')
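A short sketch, using hypothetical column names, showing that DataFrame.filter returns the matching columns in the order they already appear in df; it does not sort them.

```python
import pandas as pd

# Hypothetical frame whose 'Region' columns are in descending order
df = pd.DataFrame(
    [[0, 0, 0, 0, 0]],
    columns=["Region 3", "Region 2", "Region 1", "Custom", "UnwantedCol"],
)

# filter(like=...) keeps every column label that contains the substring,
# in the same left-to-right order as df.columns
s = df.filter(like="Region")
print(list(s.columns))  # ['Region 3', 'Region 2', 'Region 1']
```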

edited May 28 '20 at 14:58

answered May 28 '20 at 14:41


I am not sure that this is the answer the author requires – Artyom Akselrod May 28 '20 at 14:43

@ArtyomAkselrod this does exactly the same thing as your answer. Maybe replacing `like` with `regex='^Region'` would better reflect *startswith*. – Quang Hoang May 28 '20 at 14:48
Thanks @Quang Hoang – Ali Khan May 28 '20 at 19:53
@QuangHoang this was a comment on the first, unedited version, and that version was incorrect. Now the answer is correct. – Artyom Akselrod May 29 '20 at 07:31
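Quang Hoang's point about like versus regex can be checked with a small sketch, assuming a hypothetical column name that contains 'Region' somewhere other than at the start. like matches the substring anywhere, while the anchored regex='^Region' only matches names that begin with it; both keep the original column order.

```python
import pandas as pd

# Hypothetical frame: 'SubRegion' contains 'Region' but does not start with it
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["Region 2", "SubRegion", "Region 1", "Custom"],
)

# Substring match: 'SubRegion' is included
by_like = df.filter(like="Region")

# Regular expression anchored at the start: only names beginning with 'Region'
by_regex = df.filter(regex="^Region")

print(list(by_like.columns))   # ['Region 2', 'SubRegion', 'Region 1']
print(list(by_regex.columns))  # ['Region 2', 'Region 1']
```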

score 1 · Answer 3 · answered May 28 '20 at 15:07

If your dataframe is similar to:

print(df)


   Region 3  Region 2  Region 1  Custom  UnwantedCol
0         0         0         0       0            0

We can use the built-in sorted function to sort the columns by their number:

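# Map each 'Region N' column to its integer N, then order the mapping by N
# (dicts preserve insertion order in Python 3.7+)
nat_cols_sort = dict(sorted(
    {col: int(col.split(" ")[1]) for col in df.filter(regex='^Region').columns}.items(),
    key=lambda x: x[1],
))

print(df[list(nat_cols_sort.keys())])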

   Region 1  Region 2  Region 3
0         0         0         0
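The integer key matters once there are ten or more regions: a plain string sort puts 'Region 10' before 'Region 2'. A sketch with hypothetical column names, assuming every 'Region' column is named 'Region <integer>'; the helper region_number is illustrative and not part of the answer.

```python
import pandas as pd

# Hypothetical frame with a two-digit region number
df = pd.DataFrame(
    [[0, 0, 0, 0, 0]],
    columns=["Region 10", "Region 2", "Custom", "Region 1", "UnwantedCol"],
)

def region_number(col: str) -> int:
    # Hypothetical helper; assumes names look like 'Region <integer>'
    return int(col.split(" ")[1])

region_cols = df.filter(regex="^Region").columns

lexical = sorted(region_cols)                     # compares strings character by character
numeric = sorted(region_cols, key=region_number)  # compares the parsed integers

print(lexical)  # ['Region 1', 'Region 10', 'Region 2']
print(numeric)  # ['Region 1', 'Region 2', 'Region 10']
print(df[numeric])
```

Passing key= straight to sorted gives the same ordering as building and sorting a dict, with one fewer intermediate object.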
