I have multiple dataframes and I want to filter each of them so that each df keeps only the columns whose names contain the word "Overall". I have the following for-loop, but it doesn't have the same effect as doing it manually, i.e. y15 = y15.filter(like='Overall').

pit_dfs = [y15,y16,y17]

for i in pit_dfs:
    i = i.filter(like='Overall')

Replicable example:

import pandas as pd

y15 = pd.DataFrame({'Col1-Overall': ['a','b','c','d'],
              'Col2': ['a','b','c','d'],
              'Col3': ['a','b','c','d'],
              'Col4': ['a','b','c','d']})

y16 = pd.DataFrame({'Col1-Overall': ['a','b','c','d'],
              'Col2': ['a','b','c','d'],
              'Col3': ['a','b','c','d'],
              'Col4': ['a','b','c','d']})

y17 = pd.DataFrame({'Col1-Overall': ['a','b','c','d'],
              'Col2': ['a','b','c','d'],
              'Col3': ['a','b','c','d'],
              'Col4': ['a','b','c','d']})

Expected output:

y15
+--------------+
| Col1-Overall |
+--------------+
| a            |
+--------------+
| b            |
+--------------+
| c            |
+--------------+
| d            |
+--------------+

y16
+--------------+
| Col1-Overall |
+--------------+
| a            |
+--------------+
| b            |
+--------------+
| c            |
+--------------+
| d            |
+--------------+

y17
+--------------+
| Col1-Overall |
+--------------+
| a            |
+--------------+
| b            |
+--------------+
| c            |
+--------------+
| d            |
+--------------+

I know this is a simple one, but I have been looking through Stack Overflow for the past hour and can't find a similar example. What am I missing? Thanks!

2 Answers

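See this answer and this example about Python for loops. The loop variable is just a name bound to each list element in turn, not a pointer to the list slot or to the variable y15. Writing i = i.filter(...) rebinds i to a new, filtered dataframe and leaves both the list and the original dataframes unchanged.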
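
To make the distinction concrete, here is a minimal sketch that uses plain lists in place of dataframes (the names items and after_rebinding are made up for illustration). Assigning to the loop variable only rebinds that name, while calling a mutating method changes the object the list actually holds.

```python
# Plain lists stand in for dataframes; all names here are illustrative.
items = [[1], [2], [3]]

# Rebinding: `item = ...` points the loop variable at a brand-new list.
# Neither `items` nor the lists inside it are touched.
for item in items:
    item = item + [0]

after_rebinding = [list(x) for x in items]  # snapshot: still [[1], [2], [3]]

# Mutating: `.append` changes the very object stored in `items`,
# so the change is visible through the list afterwards.
for item in items:
    item.append(0)

print(after_rebinding)  # [[1], [2], [3]]
print(items)            # [[1, 0], [2, 0], [3, 0]]
```

DataFrame.filter behaves like the first loop: it returns a new object, so the result has to be stored somewhere explicitly.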

You can do (I haven't tested this):

pit_dfs = [y15, y16, y17]

# Assign by index so the list entry itself is replaced by the filtered frame
for idx in range(len(pit_dfs)):
    pit_dfs[idx] = pit_dfs[idx].filter(like='Overall')
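
If the goal is for the names y15, y16 and y17 themselves to end up filtered, one option (a sketch I'm adding under the assumption that the frames look like the question's example, trimmed here to two columns) is to build the filtered frames in a list comprehension and unpack the result back onto the same names:

```python
import pandas as pd

# Trimmed-down stand-ins for the question's dataframes (assumed shape).
y15 = pd.DataFrame({'Col1-Overall': list('abcd'), 'Col2': list('abcd')})
y16 = pd.DataFrame({'Col1-Overall': list('abcd'), 'Col2': list('abcd')})
y17 = pd.DataFrame({'Col1-Overall': list('abcd'), 'Col2': list('abcd')})

# filter() returns new frames; tuple unpacking rebinds the original names.
y15, y16, y17 = [df.filter(like='Overall') for df in (y15, y16, y17)]

print(list(y15.columns))  # ['Col1-Overall']
```

With many dataframes, keeping them in a dict keyed by year and rebuilding the dict is probably easier to maintain than a long unpacking line.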
Joseph Hansen
  • Thanks so much for this! While this code does not change the original dataframes, it does change the dataframes within the list 'pit_dfs', which can then be accessed using pit_dfs[0], pit_dfs[1], etc. I appreciate it!! – Colin Sorensen Sep 22 '20 at 22:37

Here's an alternative that modifies each dataframe in place, so the original variables change as well:

pit_dfs = [y15, y16, y17]

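def filter_cols_like(df, like):
    # Collect the columns whose names do NOT contain the substring
    cols_not_like = [col for col in df.columns if like not in col]
    # Drop them in place: this mutates df itself instead of returning a copy
    df.drop(columns=cols_not_like, inplace=True)

for df in pit_dfs:
    filter_cols_like(df, like='Overall')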
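
As a quick check of why this version reaches the original variables, here is a small self-contained sketch. It assumes string column labels and uses two trimmed-down frames modelled on the question's example; the variable same_object is only there for illustration.

```python
import pandas as pd

def filter_cols_like(df, like):
    # Assumes column labels are strings, so `like not in col` is a substring test
    cols_not_like = [col for col in df.columns if like not in col]
    df.drop(columns=cols_not_like, inplace=True)

y15 = pd.DataFrame({'Col1-Overall': list('abcd'), 'Col2': list('abcd')})
y16 = pd.DataFrame({'Col1-Overall': list('abcd'), 'Col2': list('abcd')})

pit_dfs = [y15, y16]
for df in pit_dfs:
    filter_cols_like(df, like='Overall')

# The list entry and the name y15 refer to the same object,
# so the in-place drop is visible through both.
same_object = pit_dfs[0] is y15
print(same_object)        # True
print(list(y15.columns))  # ['Col1-Overall']
```

The trade-off is that the dropped columns are gone from the original objects, so this only fits when the unfiltered frames are no longer needed.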