
I have dataframes I want to horizontally concatenate while ignoring the index.

I know that for arithmetic operations, ignoring the index can lead to a substantial speedup if you use the numpy array .values instead of the pandas Series. Is it possible to horizontally concatenate or merge pandas dataframes whilst ignoring the index? (To my dismay, ignore_index=True does something else.) And if so, does it give a speed gain?

import pandas as pd

df1 = pd.Series(range(10)).to_frame()

df2 = pd.Series(range(10), index=range(10, 20)).to_frame()

pd.concat([df1, df2], axis=1)
#      0    0
# 0   0.0  NaN
# 1   1.0  NaN
# 2   2.0  NaN
# 3   3.0  NaN
# 4   4.0  NaN
# 5   5.0  NaN
# 6   6.0  NaN
# 7   7.0  NaN
# 8   8.0  NaN
# 9   9.0  NaN
# 10  NaN  0.0
# 11  NaN  1.0
# 12  NaN  2.0
# 13  NaN  3.0
# 14  NaN  4.0
# 15  NaN  5.0
# 16  NaN  6.0
# 17  NaN  7.0
# 18  NaN  8.0
# 19  NaN  9.0

I know I can get the result I want by resetting the index of df2, but I wonder whether there is a faster (perhaps numpy method) to do this?
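For concreteness, the reset_index approach I mean is this (a minimal sketch):

```python
import pandas as pd

df1 = pd.Series(range(10)).to_frame()
df2 = pd.Series(range(10), index=range(10, 20)).to_frame()

# Drop df2's index so the rows align positionally with df1's.
aligned = pd.concat([df1, df2.reset_index(drop=True)], axis=1)
print(aligned.shape)  # (10, 2)
```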

The Unfun Cat
  • You can do `np.hstack([df1, df2])`, which would be faster, but it produces a pure np array. You can easily make a df from it, and that should also be fast: no reallocation occurs, as the np array is compatible with a df – EdChum May 09 '18 at 09:47
  • https://stackoverflow.com/questions/32801806/pandas-concat-ignore-index-doesnt-work – The Unfun Cat May 09 '18 at 10:16
  • 1
    Yeah it's a little confusing, it's almost like it should be called `ignore_axis_index` or `ignore_axis` or similar – EdChum May 09 '18 at 10:18
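To illustrate the confusion discussed in the comments: with `axis=1`, `ignore_index=True` renumbers the labels along the concatenation axis (the columns), while the rows are still union-aligned by index, so the NaNs remain. A minimal sketch:

```python
import pandas as pd

df1 = pd.Series(range(10)).to_frame()
df2 = pd.Series(range(10), index=range(10, 20)).to_frame()

# ignore_index=True only renumbers the columns here; row alignment
# still happens on the (disjoint) indexes, giving a 20-row result.
out = pd.concat([df1, df2], axis=1, ignore_index=True)
print(out.shape)  # (20, 2)
```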

2 Answers


np.column_stack

Absolutely equivalent to EdChum's answer.

pd.DataFrame(
    np.column_stack([df1,df2]),
    columns=df1.columns.append(df2.columns)
)

   0  0
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  8  8
9  9  9
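Note that the result above has two columns both labelled `0`, because both source frames use the same column name. If that matters, one way (a sketch, not part of the original answer; the `_x`/`_y` suffixes are arbitrary) is to suffix each side before stacking:

```python
import numpy as np
import pandas as pd

df1 = pd.Series(range(10)).to_frame()
df2 = pd.Series(range(10), index=range(10, 20)).to_frame()

# Suffix each side's labels so the stacked result has unique columns.
out = pd.DataFrame(
    np.column_stack([df1, df2]),
    columns=df1.add_suffix('_x').columns.append(df2.add_suffix('_y').columns),
)
print(out.columns.tolist())  # ['0_x', '0_y']
```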

Pandas Option with assign

You can do many things with the new columns.
I don't recommend this!

df1.assign(**df2.add_suffix('_').to_dict('list'))

   0  0_
0  0   0
1  1   1
2  2   2
3  3   3
4  4   4
5  5   5
6  6   6
7  7   7
8  8   8
9  9   9
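If the dict round-trip above feels fragile, a more conventional pandas route (a sketch, not part of the original answer) is to relabel `df2`'s rows with `df1`'s index before concatenating:

```python
import pandas as pd

df1 = pd.Series(range(10)).to_frame()
df2 = pd.Series(range(10), index=range(10, 20)).to_frame()

# Give df2 the same row labels as df1 so concat aligns them positionally.
out = pd.concat([df1, df2.set_axis(df1.index)], axis=1)
print(out.shape)  # (10, 2)
```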
piRSquared
  • column_stack is actually better. For hstack, it seems the data needs to have the same dimensions in both m _and_ n! I get `ValueError: all the input arrays must have same number of dimensions` – The Unfun Cat May 10 '18 at 12:56

A pure numpy method would be to use np.hstack:

In[33]:
np.hstack([df1,df2])

Out[33]: 
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5],
       [6, 6],
       [7, 7],
       [8, 8],
       [9, 9]], dtype=int64)

This can easily be converted to a DataFrame by passing it as the data argument to the DataFrame constructor:

In[34]:
pd.DataFrame(np.hstack([df1,df2]))

Out[34]: 
   0  1
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  8  8
9  9  9

With respect to whether the data is contiguous: within a DataFrame, the individual columns are treated as separate arrays, since a DataFrame is essentially a dict of Series. Because you're passing a single homogeneous numpy array here, no extra memory allocation or copying is needed for this simple dtype, so construction should be fast.

EdChum
  • According to my rudimentary timings pd.concat is faster if the indexes are exactly equal (in order also). But otherwise `np.hstack` is so much faster it is silly - like only 1% of the time for 1e8 values (848ms vs 1 minute 3 sec). – The Unfun Cat May 09 '18 at 09:55
  • 1
    Yeah with `pandas` you're paying for the flexibility, ease of use, and various dtype/index checking. If you have homogeneous dtypes and numerical data then using pure numpy, or numpy as an intermediary and then constructing pandas dfs as and when necessary, will be much faster – EdChum May 09 '18 at 10:00
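The timing claims in the comments are easy to check yourself. A hedged benchmark sketch (absolute numbers will vary by machine and pandas version, so none are asserted here):

```python
import timeit

import numpy as np
import pandas as pd

n = 1_000_000
df1 = pd.Series(np.arange(n)).to_frame()
df2 = pd.Series(np.arange(n), index=np.arange(n, 2 * n)).to_frame()

# Compare the numpy route against the index-resetting pandas route.
t_np = timeit.timeit(lambda: pd.DataFrame(np.hstack([df1, df2])), number=10)
t_pd = timeit.timeit(
    lambda: pd.concat([df1, df2.reset_index(drop=True)], axis=1), number=10
)
print(f"hstack+DataFrame: {t_np:.4f}s  concat+reset_index: {t_pd:.4f}s")
```

Both routes produce the same values, so the choice is purely about speed and whether you need the index machinery.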