Convert row to column header for Pandas DataFrame,

Question

The data I have to work with is a bit messy.. It has header names inside of its data. How can I choose a row from an existing pandas dataframe and make it (rename it to) a column header?

I want to do something like:

header = df[df['old_header_name1'] == 'new_header_name1']

df.columns = header

unutbu · Accepted Answer · 2019-08-14T15:52:10.153

299

In [21]: df = pd.DataFrame([(1,2,3), ('foo','bar','baz'), (4,5,6)])

In [22]: df
Out[22]: 
     0    1    2
0    1    2    3
1  foo  bar  baz
2    4    5    6

Set the column labels to equal the values in the 2nd row (index location 1):

In [23]: df.columns = df.iloc[1]

If the index has unique labels, you can drop the 2nd row using:

In [24]: df.drop(df.index[1])
Out[24]: 
1 foo bar baz
0   1   2   3
2   4   5   6

If the index is not unique, you could use:

In [133]: df.iloc[pd.RangeIndex(len(df)).drop(1)]
Out[133]: 
1 foo bar baz
0   1   2   3
2   4   5   6

Using df.drop(df.index[1]) removes all rows with the same label as the second row. Because non-unique indexes can lead to stumbling blocks (or potential bugs) like this, it's often better to take care that the index is unique (even though Pandas does not require it).

edited Aug 14 '19 at 15:52

answered Oct 01 '14 at 17:42

unutbu

842,883
184
1,785
1,677

Thank you so much for your quick response! How can I choose a row by value in stead of index location to make it header? So for your example something like.. df.columns = df[df[0] == 'foo'] – E.K. Oct 01 '14 at 17:57
The problem with that is there could be more than one row which has the value `"foo"`. One way around that problem is to explicitly choose the first such row: `df.columns = df.iloc[np.where(df[0] == 'foo')[0][0]]`. – unutbu Oct 01 '14 at 18:02
Ah I see why you did that way. For my case, I know there is only one row that has the value "foo". So it is ok. I just did this way I guess it is the same as the one you gave me above. idx_loc = df[df[0] == 'foo'].index.tolist()[0] df.columns = df.iloc[idx_loc] – E.K. Oct 01 '14 at 18:08

score 112 · Answer 2 · edited Mar 15 '17 at 22:56

112

This works (pandas v'0.19.2'):

df.rename(columns=df.iloc[0])

edited Mar 15 '17 at 22:56

Rob

26,989
16
82
98

answered Mar 15 '17 at 22:22

Zachary Wilson

1,244
1
8
4

40

You can remove the "header" row by adding `.drop(df.index[0])` – ostrokach Nov 23 '17 at 22:01
2

I like this better than the actual accepted answer. I love the short oneline solutions. – Javier Jan 23 '19 at 16:09
1

Please keep in mind that after dropping the first row, index would start from 1, so you probably would like to add `.reset_index(drop=True)`. – onestep.ua Nov 16 '22 at 21:39

score 35 · Answer 3 · answered Dec 14 '18 at 08:04

35

It would be easier to recreate the data frame. This would also interpret the columns types from scratch.

headers = df.iloc[0]
new_df  = pd.DataFrame(df.values[1:], columns=headers)

answered Dec 14 '18 at 08:04

shahar_m

3,461
5
41
61

2

Simple and easy. Nice! – Draco D Feb 05 '21 at 06:10

score 16 · Answer 4 · answered Sep 04 '20 at 17:52

16

To rename the header without reassign df:

df.rename(columns=df.iloc[0], inplace = True)

To drop the row without reassign df:

df.drop(df.index[0], inplace = True)

answered Sep 04 '20 at 17:52

Govinda

789
7
6

ccpizza · Answer 5 · 2020-06-08T17:21:12.893

5

You can specify the row index in the read_csv or read_html constructors via the header parameter which represents Row number(s) to use as the column names, and the start of the data. This has the advantage of automatically dropping all the preceding rows which supposedly are junk.

import pandas as pd
from io import StringIO

In[1]
    csv = '''junk1, junk2, junk3, junk4, junk5
    junk1, junk2, junk3, junk4, junk5
    pears, apples, lemons, plums, other
    40, 50, 61, 72, 85
    '''

    df = pd.read_csv(StringIO(csv), header=2)
    print(df)

Out[1]
       pears   apples   lemons   plums   other
    0     40       50       61      72      85

edited Jun 08 '20 at 17:21

answered Aug 13 '18 at 12:45

ccpizza

28,968
18
162
169

This does not address the question itself, which is asking about an already existing `DataFrame`. – pablete Jul 19 '22 at 23:30
some of the users who found this question (possibly the majority) would have a more generic use case than the OP; this answer is for that group – ccpizza Jul 21 '22 at 17:17

score 0 · Answer 6 · answered Aug 26 '22 at 10:38

0

Keeping it Python simple

Padas DataFrames have columns attribute why not use it with standard Python, it is much clearer what you are doing:

table = [['name', 'Rf', 'Rg', 'Rf,skin', 'CRI'],
 ['testsala.cxf', '86', '95', '92', '87'],
 ['testsala.cxf: 727037 lm', '86', '95', '92', '87'],
 ['630.cxf', '18', '8', '11', '18'],
 ['Huawei stk-lx1.cxf', '86', '96', '88', '83'],
 ['dedo uv no filtro.cxf', '52', '93', '48', '58']]

import pandas as pd
data = pd.DataFrame(table[1:],columns=table[0])

or in the case is not the first row, but the 10th for instance:

columns = table.pop(10)
data = pd.DataFrame(table,columns=columns)

answered Aug 26 '22 at 10:38

G M

20,759
10
81
84

1

Tested for performance, although we know that the creation of a new DataFrame is "time-consuming" Anyhow this approach took 40X more time – gbox Jan 04 '23 at 21:11
@gbox thanks for you comment! If you want edit the answer – G M Jan 05 '23 at 09:51

Convert row to column header for Pandas DataFrame,

6 Answers6

Keeping it Python simple

Linked

Related