pandas concat columns ignore_index doesn't work

Question

I am trying to column-bind dataframes and am having an issue with pandas concat, as ignore_index=True doesn't seem to work:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])

df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])
df1
#     A   B   D
# 0  A0  B0  D0
# 2  A1  B1  D1
# 3  A2  B2  D2
# 4  A3  B3  D3

df2
#    A1   C  D2
# 5  A4  C4  D4
# 6  A5  C5  D5
# 7  A6  C6  D6
# 3  A7  C7  D7

dfs = [df1, df2]
df = pd.concat(dfs, axis=1, ignore_index=True)
print(df)

and the result is

     0    1    2    3    4    5    
0   A0   B0   D0  NaN  NaN  NaN  
2   A1   B1   D1  NaN  NaN  NaN    
3   A2   B2   D2   A7   C7   D7   
4   A3   B3   D3  NaN  NaN  NaN  
5  NaN  NaN  NaN   A4   C4   D4  
6  NaN  NaN  NaN   A5   C5   D5  
7  NaN  NaN  NaN   A6   C6   D6

Even if I reset index using

df1.reset_index()    
df2.reset_index()

and then try

pd.concat([df1, df2], axis=1)

it still produces the same result!

Does `pd.concat([df1, df2], axis=0, ignore_index=True)` produce what you want? If not, can you specify your expected output? — Alex Riley, Sep 26 '15 at 20:43
No, it binds the rows. I want to bind the columns (append). I tried append, but that doesn't seem to work either. — muon, Sep 26 '15 at 20:53
@ajcr, have you compared the output of `pd.concat([df1, df2], axis=1, ignore_index=True)` and `pd.concat([df1, df2], axis=1)`? Shouldn't the first intuitively emulate a `cbind`? — cel, Sep 26 '15 at 20:57
I think `ignore_index` only ignores the labels on the axis you're joining on, so it still does an outer join on the index labels. I agree the names of function arguments aren't the most intuitive here. — Alex Riley, Sep 26 '15 at 21:24
yes, i realized that from @Alex answer ... but i have the same results even with ignore_index=False — muon, Sep 26 '15 at 21:36

score 140 · Accepted Answer · edited Aug 04 '23 at 23:32

If I understood you correctly, this is what you would like to do.

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])

df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])


df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)

df = pd.concat([df1, df2], axis=1)

Which gives:

    A   B   D   A1  C   D2
0   A0  B0  D0  A4  C4  D4
1   A1  B1  D1  A5  C5  D5
2   A2  B2  D2  A6  C6  D6
3   A3  B3  D3  A7  C7  D7

Actually, I would have expected df = pd.concat(dfs, axis=1, ignore_index=True) to give the same result.

This is the excellent explanation from jreback:

ignore_index=True ‘ignores’, meaning it doesn’t align on the joining axis. It simply pastes them together in the order that they are passed, then reassigns a range for the actual index (e.g. range(len(index))). So the difference between joining on non-overlapping indexes (assume axis=1 in the example) is that with ignore_index=False (the default) you get the concat of the indexes, and with ignore_index=True you get a range.
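
To make the quoted explanation concrete, here is a minimal sketch with two small frames whose row labels do not overlap, concatenated along axis=0. The frame names `top` and `bottom` are made up for illustration; only `pd.concat` and its `ignore_index` parameter come from the discussion. The same behaviour applies to the column labels when axis=1.

```python
import pandas as pd

# Two hypothetical frames with non-overlapping row labels
top = pd.DataFrame({'x': [1, 2]}, index=[0, 1])
bottom = pd.DataFrame({'x': [3, 4]}, index=[5, 6])

# Default (ignore_index=False): labels along the concatenation axis are kept
kept = pd.concat([top, bottom], axis=0)
print(list(kept.index))        # [0, 1, 5, 6]

# ignore_index=True: labels along the concatenation axis become a fresh range
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```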

Oh that works ... Thanks! Funny thing is I was using same method to bind dataframes inside a function and that was working fine! but one outside function wasn't — muon, Sep 26 '15 at 21:07
@mau, I have updated my answer and now use `df.reset_index()`. I think this is a cleaner way. — cel, Sep 26 '15 at 21:35
I happened to try that out myself, could have saved myself few hours if i had seen this earlier :). Thanks... `df = pd.concat( [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)` — muon, Sep 27 '15 at 02:51
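
A likely reason the bare `df1.reset_index()` calls in the question changed nothing is that `reset_index` returns a new frame by default, so the result has to be assigned back (or `inplace=True` used, as in the answer). A small sketch with a made-up frame `df`:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1']}, index=[2, 3])

# Calling reset_index without assigning the result leaves df untouched
df.reset_index(drop=True)
unchanged = list(df.index)
print(unchanged)       # [2, 3]

# Assigning the result back is what actually replaces the index
df = df.reset_index(drop=True)
print(list(df.index))  # [0, 1]
```

Without `drop=True`, the old labels are kept as an extra column named `index`, which is usually not wanted when column-binding.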

score 34 · Answer 2 · edited Sep 26 '15 at 21:22

The ignore_index option is working in your example; you just need to know that it ignores the labels along the axis of concatenation, which in your case is the columns. (Perhaps a better name would be ignore_labels.) The row labels are still used for alignment, which is why you get an outer join with NaNs. If you want the concatenation to ignore the index (row) labels, then your axis variable has to be set to 0 (the default).
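
As a rough illustration using the frames from the question, the sketch below shows which labels get renumbered on each axis. The variable names `wide` and `tall` are made up here.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])

# axis=1: the column labels are the ones being ignored and renumbered
wide = pd.concat([df1, df2], axis=1, ignore_index=True)
print(list(wide.columns))  # [0, 1, 2, 3, 4, 5]
print(wide.shape)          # (7, 6): rows are still outer-joined on their labels

# axis=0: the row labels are the ones being ignored and renumbered
tall = pd.concat([df1, df2], axis=0, ignore_index=True)
print(list(tall.index))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(tall.shape)          # (8, 6): columns are outer-joined on their names
```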

edited Sep 26 '15 at 21:22

answered Sep 26 '15 at 20:53

Alex


Thanks! that was helpful (can't upvote yet, low rep) – muon Sep 26 '15 at 21:30
Indeed, this is a useful explanation that is missing in the docs. – Hugo Santos Silva Aug 19 '20 at 16:00

score 25 · Answer 3 · answered Oct 21 '21 at 14:03

In case you want to retain the index of the left data frame, set the index of df2 to that of df1 using set_index:

pd.concat([df1, df2.set_index(df1.index)], axis=1)
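
A minimal sketch of what this does, using two small made-up frames. It assumes both frames have the same number of rows, since the labels of one are copied onto the other position by position.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 2])
df2 = pd.DataFrame({'C': ['C4', 'C5']}, index=[5, 6])

# Relabel df2's rows with df1's labels so the two frames align row by row
# (assumption: len(df1) == len(df2))
result = pd.concat([df1, df2.set_index(df1.index)], axis=1)
print(list(result.index))    # [0, 2]
print(list(result.columns))  # ['A', 'C']
```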

eljusticiero67


score 8 · Answer 4 · edited Aug 04 '23 at 23:34

I agree with the comments: it's always best to post the expected output.

Is this what you are seeking?

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])

df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])

df1 = df1.transpose().reset_index(drop=True).transpose()
df2 = df2.transpose().reset_index(drop=True).transpose()

dfs = [df1, df2]
df = pd.concat(dfs, axis=0, ignore_index=True)

print(df)

    0   1   2
0  A0  B0  D0
1  A1  B1  D1
2  A2  B2  D2
3  A3  B3  D3
4  A4  C4  D4
5  A5  C5  D5
6  A6  C6  D6
7  A7  C7  D7
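
If the goal really is to stack the rows while discarding the column names, the double transpose can probably be avoided by relabelling the columns directly. This is a sketch of an alternative, not what the code above does; it uses `DataFrame.set_axis`, and the helper name `strip_column_labels` is made up for illustration.

```python
import pandas as pd

def strip_column_labels(frame):
    """Return a new frame whose columns are labelled 0, 1, 2, ..."""
    return frame.set_axis(list(range(frame.shape[1])), axis=1)

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])

# With matching positional column labels, the rows stack into three columns
stacked = pd.concat([strip_column_labels(df1), strip_column_labels(df2)],
                    axis=0, ignore_index=True)
print(stacked.shape)  # (8, 3)
```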

score 5 · Answer 5 · answered Jun 23 '21 at 18:23

You can use numpy's concatenate to achieve the result.

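import numpy as np
import pandas as pd

# Concatenate the underlying values by position; index labels are discarded
cols = df1.columns.to_list() + df2.columns.to_list()
dfs = [df1, df2]
arr = np.concatenate(dfs, axis=1)
df = pd.DataFrame(arr, columns=cols)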

Out[1]: 
    A   B   D  A1   C  D2
0  A0  B0  D0  A4  C4  D4
1  A1  B1  D1  A5  C5  D5
2  A2  B2  D2  A6  C6  D6
3  A3  B3  D3  A7  C7  D7

score 1 · Answer 6 · edited Aug 04 '23 at 23:36

For some reason ignore_index=True doesn't help in my case. I wanted to keep the index from the first dataset and ignore the index of the second. This worked for me:

X_train = pd.concat([train_sp, X_train.reset_index(drop=True, inplace=True)], axis=1)

wjandrea


answered Dec 14 '17 at 09:05

Yury Wallet


`inplace=True`? That shouldn't be there, it'll make it return `None`. – wjandrea Aug 04 '23 at 23:37
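
The point about `inplace=True` can be checked with a small sketch: with that flag, `reset_index` modifies the frame and returns `None`, so the list handed to `pd.concat` contains `None` instead of the second frame. The names `left` and `right` are stand-ins for `train_sp` and `X_train`, and `left` is assumed to already have a default 0..n-1 index.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2]})
right = pd.DataFrame({'b': [3, 4]}, index=[10, 11])

# With inplace=True the method returns None rather than a frame
returned = right.copy().reset_index(drop=True, inplace=True)
print(returned)             # None

# Dropping inplace=True passes the re-indexed frame itself to concat
fixed = pd.concat([left, right.reset_index(drop=True)], axis=1)
print(list(fixed.columns))  # ['a', 'b']
print(fixed.shape)          # (2, 2)
```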
