Combine two Pandas rows into one with duplicated columns for time series

Question

I am trying to solve the following problem. I have two pandas DataFrame rows with the same columns:

Column A	Column B
Cell 1	Cell 2
Cell 3	Cell 4

I want to combine both rows into one single row by appending the columns:

Column A_1	Column B_1	Column A_2	Column B_2
Cell 1	Cell 2	Cell 3	Cell 4

This operation creates a time series row with window size 2 for training a machine learning model. Because I perform it millions of times, it needs to be cheap.
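Since the goal is windows of size 2 over a time series, one option (a swapped-in technique, not taken from the answers below) is to build every overlapping window in one vectorized step instead of combining rows millions of times. A minimal sketch, assuming the rows are already in time order and NumPy 1.20 or later (for `sliding_window_view`); `make_windows` is a hypothetical name:

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(df: pd.DataFrame, window: int = 2) -> pd.DataFrame:
    """Turn every run of `window` consecutive rows into one wide row (overlapping windows)."""
    values = df.to_numpy()                      # shape (n_rows, n_cols)
    n_rows, n_cols = values.shape
    # Read-only view of shape (n_rows - window + 1, n_cols, window); nothing is copied here
    view = sliding_window_view(values, window, axis=0)
    # Reorder to (n_windows, window, n_cols) so each output row lists time step 1, then step 2, etc.
    wide = view.transpose(0, 2, 1).reshape(n_rows - window + 1, window * n_cols)
    columns = [f"{c}_{t}" for t in range(1, window + 1) for c in df.columns]
    return pd.DataFrame(wide, columns=columns)


df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3"], "Column B": ["Cell 2", "Cell 4"]})
print(make_windows(df))
```

Because `df.to_numpy()` needs one common dtype, mixed numeric columns are upcast (for example int to float), and any string column turns the whole array into object dtype. The result also gets a fresh 0..n-1 index.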

Thanks in advance!

I tried using pd.concat, but it is just too slow and uses a lot of RAM.
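pd.concat is expensive when it is called once per window. Assuming that is what the slow loop does (the code isn't shown), a hedged alternative is to call it once on shifted copies of the whole frame; `windows_by_shift` is a hypothetical helper name:

```python
import pandas as pd


def windows_by_shift(df: pd.DataFrame, window: int = 2) -> pd.DataFrame:
    """Build all overlapping windows with one pd.concat instead of one concat per window."""
    # Copy number t holds the row t steps ahead, with suffix _{t+1} on every column name
    parts = [df.shift(-t).add_suffix(f"_{t + 1}") for t in range(window)]
    wide = pd.concat(parts, axis=1)
    # The last window - 1 rows run past the end of the data (NaN), so drop them
    return wide.iloc[: len(df) - window + 1]


df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3"], "Column B": ["Cell 2", "Cell 4"]})
print(windows_by_shift(df))
```

This keeps the original index (useful for timestamps), but because `shift` introduces NaN, shifted integer columns come back as float64 even after the NaN rows are dropped.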

We need you to post a reproducible example (use random-seeded generated data), instead of saying "`pd.concat` is too slow and requires a lot of RAM". Show us numbers. Do you really start from a huge number of 2x2 dataframes? If so, optimize that: show us the code that generated them. Also, what are the dtypes of the columns? — smci, Jun 18 '23 at 19:57
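A reproducible setup of the kind the comment asks for could look like the sketch below. The row count and the column dtypes (one float, one integer column) are assumptions for illustration, since the question does not state them:

```python
import numpy as np
import pandas as pd

# Assumed size and dtypes; the question does not state them
N_ROWS = 1_000_000
rng = np.random.default_rng(seed=0)  # fixed seed so every reader gets identical data

df = pd.DataFrame({
    "Column A": rng.standard_normal(N_ROWS),        # float64 feature
    "Column B": rng.integers(0, 100, size=N_ROWS),  # int64 feature
})

print(df.dtypes)
print(f"{df.memory_usage(deep=True).sum() / 1e6:.1f} MB")
```

Timings and memory figures for each candidate approach can then be measured against the same `df`.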

Corralien · Answer 1 · answered Jun 18 '23 at 19:56

You can use stack():

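import pandas as pd
s = df.stack().droplevel(0)                           # values in row order, labelled by column name
n = s.groupby(level=0).cumcount().add(1).astype(str)  # occurrence number of each label: 1, 1, 2, 2
out = s.to_frame().T                                  # one-row DataFrame
out.columns = s.index + '_' + n.to_numpy()            # avoids groupby(axis=1), deprecated in pandas 2.1
print(out)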

# Output
  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4

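If you have more rows, you can reshape the underlying NumPy array with numpy.reshape; this merges each consecutive pair of rows (rows 0 and 1, then rows 2 and 3, and so on) into one row: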

>>> pd.DataFrame(df.values.reshape(-1, 4)).add_prefix('Col_')
    Col_0   Col_1   Col_2   Col_3
0  Cell 1  Cell 2  Cell 3  Cell 4
1  Cell 1  Cell 2  Cell 3  Cell 4
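Note that reshape(-1, 4) hard-codes two rows of two columns and merges non-overlapping pairs, so each row is used in only one output row; it also raises a ValueError when the number of cells is not a multiple of 4. A sketch that generalizes the same idea to blocks of k rows and keeps readable column names (`pair_rows` is a hypothetical name; an incomplete final block is dropped):

```python
import pandas as pd


def pair_rows(df: pd.DataFrame, k: int = 2) -> pd.DataFrame:
    """Merge each non-overlapping block of k consecutive rows into one row."""
    n_blocks = len(df) // k                     # an incomplete block at the end is dropped
    values = df.to_numpy()[: n_blocks * k]      # row-major, shape (n_blocks * k, n_cols)
    wide = values.reshape(n_blocks, k * df.shape[1])
    columns = [f"{c}_{t}" for t in range(1, k + 1) for c in df.columns]
    return pd.DataFrame(wide, columns=columns)


df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})
print(pair_rows(df))
```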

score 2 · Answer 2 · answered Jun 18 '23 at 19:47

I hope I've understood you correctly, but you can try:

x = df.stack().reset_index()
x[''] = x['level_1'] + '_' + (x['level_0'] + 1).astype(str)
x = x[['', 0]].set_index('').T

print(x)

Prints:

  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4
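This relies on df having the default 0..n-1 index, because level_0 holds the original index labels: with a DatetimeIndex (typical for a time series) x['level_0'] + 1 raises a TypeError, and with any other integer index the suffixes are off. The same applies to x+1 in the next answer. A minimal sketch that resets the index first, assuming the column axis is unnamed as in the question; `combine_rows` is a hypothetical name:

```python
import pandas as pd


def combine_rows(df: pd.DataFrame) -> pd.DataFrame:
    # Replace whatever index df has (e.g. timestamps) with positions 0..n-1 first,
    # so that level_0 + 1 is the 1-based row number rather than an index label
    x = df.reset_index(drop=True).stack().reset_index()
    x[''] = x['level_1'] + '_' + (x['level_0'] + 1).astype(str)
    return x[['', 0]].set_index('').T


idx = pd.date_range("2023-01-01", periods=2, freq="D")
df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3"], "Column B": ["Cell 2", "Cell 4"]}, index=idx)
print(combine_rows(df))
```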

score 1 · Answer 3 · answered Jun 18 '23 at 19:53


Maybe it helps:

result = df.stack()
result.index = [f"{y}_{x+1}" for x,y in result.index]
result = pd.DataFrame(result).T

answered by MaryRa

PaulS · Answer 4 · answered Jun 18 '23 at 23:49

Another possible solution, which flattens df.values row by row so that the values line up with the _1/_2 labels:

import numpy as np
import pandas as pd
(pd.DataFrame(np.hstack(df.values)).T
 .set_axis([f'{x}_{y+1}' for y in range(2) for x in df.columns], axis=1))

Alternatively,

from itertools import chain

(pd.DataFrame(chain(*df.values)).T
 .set_axis([f'{x}_{y}' for y in range(1, 3) for x in df.columns], axis=1))

Output:

  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4
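Whether a flattened array matches labels ordered A_1, B_1, A_2, B_2 depends on the flattening order: going row by row (C order) matches them, while flattening the transposed array goes column by column (Fortran order). A quick check on the question's 2x2 example:

```python
import numpy as np

# The question's 2x2 example as a plain array: rows are time steps, columns are features
values = np.array([["Cell 1", "Cell 2"],
                   ["Cell 3", "Cell 4"]])

row_major = values.ravel(order="C")  # row 1 then row 2: matches A_1, B_1, A_2, B_2
col_major = values.ravel(order="F")  # column A then column B
print(row_major.tolist())
print(col_major.tolist())
print(np.hstack(values.T).tolist() == col_major.tolist())  # hstack of the transpose is column-major
```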
