Pandas: Renaming "Unnamed: *" or "NaN" in data frame

Question

Here's my code so far:

import numpy as np
import pandas as pd
df = pd.read_excel(r'file.xlsx', index_col=0)

Here's what it looks like:

I want to rename the "Unnamed: *" columns to the last valid name.

Here's what I've tried and the results:

df.columns = df.columns.str.replace('Unnamed.*', method='ffill')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-253-c868b8bff7c7> in <module>()
----> 1 df.columns = df.columns.str.replace('Unnamed.*', method='ffill')

TypeError: replace() got an unexpected keyword argument 'method'

This "works" if I just do

df.columns = df.columns.str.replace('Unnamed.*', '')

But I then have either blank values or NaN (if I replace '' with 'NaN'. And then I try:

df.columns = df.columns.fillna('ffill')

Which has no effect. So I tried with inplace=True:

df.columns = df.columns.fillna('ffill', inplace=True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-279-cce486472d5b> in <module>()
----> 1 df.columns = df.columns.fillna('ffill', inplace=True)

TypeError: fillna() got an unexpected keyword argument 'inplace'

Then I tried a different way:

i = 0
while i < len(df.columns):
    if df.columns[i] == 'NaN':
        df.columns[i] = df.columns[i-1]
    print(df.columns[i])
    i += 1

Which gives me this Error:

Oil
158 RGN Mistura
Access West Winter Blend 

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-246-bc8fa6881b1a> in <module>()
      2 while i < len(df.columns):
      3     if df.columns[i] == 'NaN':
----> 4         df.columns[i] = df.columns[i-1]
      5     print(df.columns[i])
      6     i += 1

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   2048 
   2049     def __setitem__(self, key, value):
-> 2050         raise TypeError("Index does not support mutable operations")
   2051 
   2052     def __getitem__(self, key):

TypeError: Index does not support mutable operations

score 1 · Accepted Answer · answered Jun 20 '18 at 20:03

I did the following, i think it maintains the order you seek.

df = pd.read_excel('book1.xlsx')
print df


    a   b   c  Unnamed: 3  Unnamed: 4   d  Unnamed: 6   e  Unnamed: 8   f
0  34  13  73         nan         nan  87         nan  76         nan  36
1  70  48   1         nan         nan  88         nan   2         nan  77
2  37  62  28         nan         nan   2         nan  53         nan  60
3  17  97  78         nan         nan  69         nan  93         nan  48
4  65  19  96         nan         nan  72         nan   4         nan  57
5  63   6  86         nan         nan  14         nan  20         nan  51
6  10  67  54         nan         nan  52         nan  48         nan  79


df.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in df.columns.values]).ffill().values.flatten()
print df


    a   b   c   c   c   d   d   e   e   f
0  34  13  73 nan nan  87 nan  76 nan  36
1  70  48   1 nan nan  88 nan   2 nan  77
2  37  62  28 nan nan   2 nan  53 nan  60
3  17  97  78 nan nan  69 nan  93 nan  48
4  65  19  96 nan nan  72 nan   4 nan  57
5  63   6  86 nan nan  14 nan  20 nan  51
6  10  67  54 nan nan  52 nan  48 nan  79

Anton vBR · Answer 2 · 2018-06-20T20:20:11.300

Something that could work:

df.columns = df.columns.where(~df.columns.str.startswith('Unnamed')).to_series().ffill()

, full example:

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['First', 'Unnamed: 1', 'Unnamed: 2','Second', 'Unnamed: 3'])

df.columns = df.columns.where(~df.columns.str.startswith('Unnamed')).to_series().ffill()

print(df.columns)

Prints:

Index(['First', 'First', 'First', 'Second', 'Second'], dtype='object')

tobsecret · Answer 3 · 2018-06-20T20:06:49.980

The problem you are running into has to do with the fact that columns and the index are pd.Index objects. The fillna method of a pandas Index does not take the same arguments that the fillna methods for pandas Series or DataFrame take. I made a toy example below:

import pandas as pd
import numpy as np
df = pd.DataFrame(
         {'a':[1], 'Unnamed:1':[1], 'Unnamed:2':[1], 'b':[1], 'Unnamed:3':[1]}, 
         columns=['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2']))
df 
#   a  Unnamed:3  Unnamed:1  b  Unnamed:2
#0  1          1          1  1          1

Your original regex does not capture the entire column name, let's fix that.

df.columns.str.replace('Unnamed:*', '') 
#Index(['a', '3', '1', 'b', '2'], dtype='object')
df.columns.str.replace('Unnamed:\d+', '')
#Index(['a', '', '', 'b', ''], dtype='object')
df.columns.str.replace('Unnamed:.+', '')
#Index(['a', '', '', 'b', ''], dtype='object')

Now let's convert the index into a series so we can use the .replace and .fillna methods of the pd.Series and one of the working regexes to replace the column names in question with ffill. Finally we convert into a pd.Index

pd.Index(
    pd.Series(
        df.columns
    ).replace('Unnamed:\d+', np.nan, regex=True).fillna(method='ffill')
)
#Index(['a', 'a', 'a', 'b', 'b'], dtype='object')

df.columns = pd.Index(pd.Series(df.columns).replace('Unnamed:\d+', np.nan, regex=True).fillna(method='ffill'))
df.head() 
#   a  a  a  b  b
#0  1  1  1  1  1

For some reason this didn't work. I don't get any errors, it just doesn't make any changes to the data frame. Am I supposed to put a "inplace=True" somewhere? (When I try doing that I get an error) — Mitch, Jun 21 '18 at 18:31
Hmm... does my example code not work for you? You just need the following three lines: `df = pd.DataFrame( {'a':[1], 'Unnamed:1':[1], 'Unnamed:2':[1], 'b':[1], 'Unnamed:3':[1]}, columns=['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2']))` `df.columns = pd.Index(pd.Series(df.columns).replace('Unnamed:\d+', np.nan, regex=True).fillna(method='ffill'))` `df.head()` — tobsecret, Jun 21 '18 at 19:01

Pandas: Renaming "Unnamed: *" or "NaN" in data frame

3 Answers3