2

In the following snippet of code, I am trying to rename some columns (containing Hosted Meetings) to Hosted Meetings [date]. This is what happens when I print all the i's. However, it does not save this to df.

all_users_sheets_hosts = []

for f in glob.glob("./users-export-*.xlsx"):
    df = pd.read_excel(f)
    all_users_sheets_hosts.append(df)
    j = re.search('(\d+)', f)
    for i in df.columns.values:
        if 'Hosted Meetings' in i:
            i = ('Hosted Meetings' + ' ' + j.group(1))
Daniel Holmes
  • 1,952
  • 2
  • 17
  • 28
Frederic Bastiat
  • 695
  • 4
  • 12
  • 31

2 Answers2

4

The iterator i is a copy of the array value, not a pointer. One way to fix this would be to enumerate through the index rather than the values themselves:

for i, val in enumerate(df.columns.values):
    if 'Hosted Meetings' in val:
        df.columns[i] = ('Hosted Meetings' + ' ' + j.group(1))

However in any case it's a good illustration of the advantages of a more functional style. In your case, you can treat the columns as a pandas Series of type str, and hence you can use the vectorised replace on it, to rename your columns in one statement:

df.columns = df.columns.str.replace('.*Hosted Meeings.*', 
                                    'Hosted Meetings' + ' ' + j.group(1))
maxymoo
  • 35,286
  • 11
  • 92
  • 119
  • Wonderful. New to python so this helps me learn and with my problem – Frederic Bastiat Apr 27 '16 at 05:12
  • no worries, i've added a bit more to show you how to fix your loop, however it's almost always better (faster/safer/less buggy) to use a vectorised operation if you can work out how. – maxymoo Apr 27 '16 at 05:16
  • Also, how would I go about looking for all rows in multiple columns that fit some criteria? – Frederic Bastiat Apr 27 '16 at 17:15
  • 1
    there's a few different ways, http://stackoverflow.com/questions/11869910/pandas-filter-rows-of-dataframe-with-operator-chaining looks like it covers some of them (`query` is my favourite wherever possible) – maxymoo Apr 27 '16 at 23:22
  • Thank you @maxymoo , though I'm a bit confused by this, having never used queries or masks before. Say I want to delete rows where any cell, which is in a column whose title contains 'X', contains the word 'acme'. My current thinking is to add a for loop (for column titles containing X), then use a previously defined mask to remove 'acme' rows. Is there a better way? – Frederic Bastiat Apr 28 '16 at 04:41
  • yeah that's kind of a tricky query, I can't think of a better way off the top of my head; ask it as another question if you like and someone else might be able to help you – maxymoo Apr 28 '16 at 04:50
2

Update on the previous answer:

The current answer will now raise an exception: index does not support mutable operations

Change:

df.columns[i] = ('Hosted Meetings' + ' ' + j.group(1))

to:

df.columns.values[i] = ('Hosted Meetings' + ' ' + j.group(1))
Suraj Rao
  • 29,388
  • 11
  • 94
  • 103
Niall Parker
  • 177
  • 7