Iterate through columns and rename according to a rule

Question

In the following snippet of code, I am trying to rename some columns (containing Hosted Meetings) to Hosted Meetings [date]. This is what happens when I print all the i's. However, it does not save this to df.

all_users_sheets_hosts = []

for f in glob.glob("./users-export-*.xlsx"):
    df = pd.read_excel(f)
    all_users_sheets_hosts.append(df)
    j = re.search('(\d+)', f)
    for i in df.columns.values:
        if 'Hosted Meetings' in i:
            i = ('Hosted Meetings' + ' ' + j.group(1))

maxymoo · Accepted Answer · 2016-04-27T05:15:16.320

4

The iterator i is a copy of the array value, not a pointer. One way to fix this would be to enumerate through the index rather than the values themselves:

for i, val in enumerate(df.columns.values):
    if 'Hosted Meetings' in val:
        df.columns[i] = ('Hosted Meetings' + ' ' + j.group(1))

However in any case it's a good illustration of the advantages of a more functional style. In your case, you can treat the columns as a pandas Series of type str, and hence you can use the vectorised replace on it, to rename your columns in one statement:

df.columns = df.columns.str.replace('.*Hosted Meeings.*', 
                                    'Hosted Meetings' + ' ' + j.group(1))

edited Apr 27 '16 at 05:15

answered Apr 27 '16 at 05:10

maxymoo

35,286
11
92
119

Wonderful. New to python so this helps me learn and with my problem – Frederic Bastiat Apr 27 '16 at 05:12
no worries, i've added a bit more to show you how to fix your loop, however it's almost always better (faster/safer/less buggy) to use a vectorised operation if you can work out how. – maxymoo Apr 27 '16 at 05:16
Also, how would I go about looking for all rows in multiple columns that fit some criteria? – Frederic Bastiat Apr 27 '16 at 17:15
1

there's a few different ways, http://stackoverflow.com/questions/11869910/pandas-filter-rows-of-dataframe-with-operator-chaining looks like it covers some of them (`query` is my favourite wherever possible) – maxymoo Apr 27 '16 at 23:22
Thank you @maxymoo , though I'm a bit confused by this, having never used queries or masks before. Say I want to delete rows where any cell, which is in a column whose title contains 'X', contains the word 'acme'. My current thinking is to add a for loop (for column titles containing X), then use a previously defined mask to remove 'acme' rows. Is there a better way? – Frederic Bastiat Apr 28 '16 at 04:41
yeah that's kind of a tricky query, I can't think of a better way off the top of my head; ask it as another question if you like and someone else might be able to help you – maxymoo Apr 28 '16 at 04:50

score 2 · Answer 2 · edited Sep 21 '20 at 06:22

2

Update on the previous answer:

The current answer will now raise an exception: index does not support mutable operations

Change:

df.columns[i] = ('Hosted Meetings' + ' ' + j.group(1))

to:

df.columns.values[i] = ('Hosted Meetings' + ' ' + j.group(1))

edited Sep 21 '20 at 06:22

Suraj Rao

29,388
11
94
103

answered Sep 21 '20 at 06:07

Niall Parker

177
7

Iterate through columns and rename according to a rule

2 Answers2