0

I have a use case where I have say 10 cols out of which 5 start with the string 'Region'. I need to get a resulting dataframe which only contains those cols (starting with string 'Region'). Not only that, I need to make sure the order is preserved (e.g. if in original df, the col order is 'Region 1', 'Region 2', 'Region 3' -- this should be preserved and not result in 'Region 3', 'Region 2', 'Region 1' instead).

Would following the 'accepted answer' for this question preserve the order or is there some other method to achieve that?

stackoverflow - find-column-whose-name-contains-a-specific-string

BENY
  • 317,841
  • 20
  • 164
  • 234
Ali Khan
  • 59
  • 1
  • 9

3 Answers3

2

Yes, it will. df.columns is a list, when you iterate over list, you preserve the order of the list. Thus, you can use the answer from the mentioned link:

region_cols = [col for col in df.columns if 'Region' in col]

df[region_cols] - will be the df you require.

Artyom Akselrod
  • 946
  • 6
  • 14
2

Two steps first use filter

s=df.filter(like='Region')
BENY
  • 317,841
  • 20
  • 164
  • 234
1

if your data frame is similar to :

print(df)


   Region 3  Region 2  Region 1  Custom  UnwantedCol
0         0         0         0       0            0

we can use the sorted method to sort your columns by the number:

nat_cols_sort = dict(sorted(
    {col: int(col.split(" ")[1]) for col in df.filter(regex='^Region').columns}.items(),
    key=lambda x: x[1],
))


print(df[nat_sort.keys()])

   Region 1  Region 2  Region 3
0         0         0         0
Umar.H
  • 22,559
  • 7
  • 39
  • 74