
I am trying to solve the following problem. I have two pandas DataFrame rows with the same columns:

| Column A | Column B |
| -------- | -------- |
| Cell 1   | Cell 2   |
| Cell 3   | Cell 4   |

I want to combine both rows into a single row by appending the columns of the second row after those of the first, suffixing each column name with its row number:

| Column A_1 | Column B_1 | Column A_2 | Column B_2 |
| ---------- | ---------- | ---------- | ---------- |
| Cell 1     | Cell 2     | Cell 3     | Cell 4     |

I use this operation to build time-series rows with a window size of 2 for training a machine learning model. Because it runs millions of times, it needs to be cheap.

Thanks in advance!

I tried using `pd.concat`, but it is just too slow and requires a lot of RAM.

bktllr
  • We need you to post a reproducible example (use random-seeded generated data), instead of saying "`pd.concat` is too slow and requires a lot of RAM". Show us numbers. Do you really start from a huge number of 2x2 dataframes? If so, optimize that: show us the code that generated them. Also, what are the dtypes of the columns? – smci Jun 18 '23 at 19:57
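A minimal sketch of such a reproducible setup follows. It assumes, beyond what the question states, that the raw data is one long DataFrame of numeric columns `Column A` and `Column B`, and that each window-2 row pairs row *i* with row *i + 1* (overlapping windows). Rather than one `pd.concat` per window, it builds every window with one `shift` and a single `pd.concat`; `make_windows` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical random-seeded data standing in for the real time series
rng = np.random.default_rng(seed=0)
n = 1_000_000
df = pd.DataFrame(rng.random((n, 2)), columns=["Column A", "Column B"])


def make_windows(frame: pd.DataFrame) -> pd.DataFrame:
    """Pair each row with the next one (window size 2) in one vectorized step."""
    first = frame.add_suffix("_1")
    # shift(-1) moves row i+1 up to position i; the last row gets NaN
    second = frame.shift(-1).add_suffix("_2")
    # One concat for all windows, then drop the incomplete final row
    return pd.concat([first, second], axis=1).iloc[:-1]


windows = make_windows(df)
print(windows.shape)  # (999999, 4)
```

Because `shift` leaves a NaN in the last row, integer columns would be upcast to float even though `iloc[:-1]` then drops that row; with float data, as here, the dtypes are unchanged.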

4 Answers


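You can use `stack()`:

```python
# Stack into one Series (row-major), drop the row level, make a one-row frame
out = df.stack().droplevel(0).to_frame().T
# Number repeated column names 1, 2, ... (groupby(axis=1) is deprecated since pandas 2.1)
suffix = out.columns.to_series().groupby(level=0).cumcount().add(1).astype(str)
out.columns = out.columns + '_' + suffix.to_numpy()
print(out)
```

```
# Output
  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4
```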

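If you have many rows and want to merge each consecutive, non-overlapping pair of rows (0+1, 2+3, ...), you can use `numpy.reshape`. Here `4` is 2 rows times 2 columns, so the number of rows must be even:

```python
>>> pd.DataFrame(df.to_numpy().reshape(-1, 4)).add_prefix('Col_')
    Col_0   Col_1   Col_2   Col_3
0  Cell 1  Cell 2  Cell 3  Cell 4
1  Cell 1  Cell 2  Cell 3  Cell 4
```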
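The `reshape` approach pairs rows without overlap. For the sliding time-series windows described in the question (rows 0+1, 1+2, 2+3, ...), one option is NumPy's `sliding_window_view` (NumPy 1.20 or later). The sketch below assumes numeric columns; `window_rows` and its `k` parameter are hypothetical names, and `k` generalizes the window size beyond 2.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def window_rows(df: pd.DataFrame, k: int = 2) -> pd.DataFrame:
    """Hypothetical helper: flatten every k consecutive rows into one row."""
    values = df.to_numpy()  # shape (n_rows, n_cols)
    # Window along axis 0; the window axis is appended: (n_rows - k + 1, n_cols, k)
    win = sliding_window_view(values, window_shape=k, axis=0)
    # Put the window axis before the column axis, then flatten each window row-major
    flat = win.transpose(0, 2, 1).reshape(len(win), -1)
    names = [f"{c}_{i}" for i in range(1, k + 1) for c in df.columns]
    return pd.DataFrame(flat, columns=names)


df = pd.DataFrame({"Column A": [1, 3, 5], "Column B": [2, 4, 6]})
print(window_rows(df, k=2))  # rows [1, 2, 3, 4] and [3, 4, 5, 6]
```

`sliding_window_view` returns a view, so the windowing itself copies nothing; the `reshape` after `transpose` makes one copy for the output.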
Corralien

I hope I've understood you correctly, but you can try:

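```python
# Long format: columns level_0 (row), level_1 (column name) and 0 (value)
x = df.stack().reset_index()
# New name = original column name + '_' + 1-based row number
x[''] = x['level_1'] + '_' + (x['level_0'] + 1).astype(str)
x = x[['', 0]].set_index('').T

print(x)
```

Prints:

```
  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4
```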
Andrej Kesely

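Maybe this helps:

```python
result = df.stack()  # Series indexed by (row, column name)
# Rename each entry "<column>_<row + 1>"
result.index = [f"{y}_{x+1}" for x, y in result.index]
result = pd.DataFrame(result).T  # one-row DataFrame
```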


MaryRa

Another possible solution:

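```python
import numpy as np
# Flatten row by row (df.values, not df.values.T) so the values match the names
(pd.DataFrame(np.hstack(df.values)).T
 .set_axis([f'{x}_{y+1}' for y in range(len(df)) for x in df.columns], axis=1))
```

Alternatively,

```python
from itertools import chain

# Chain the rows, not the columns, to keep the same row-major order
(pd.DataFrame(chain.from_iterable(df.itertuples(index=False))).T
 .set_axis([f'{x}_{y}' for y in range(1, len(df) + 1) for x in df.columns], axis=1))
```

Output:

```
  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4
```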
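Which cell lands under which name depends on the order in which the 2-D values are flattened. Since the new names are generated row by row (`_1` for the first row, then `_2`), the values must be flattened row by row as well. `numpy.ravel` makes the choice explicit through `order='C'` (row-major) versus `order='F'` (column-major); the small example below rebuilds the question's data to show both.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3"],
                   "Column B": ["Cell 2", "Cell 4"]})
values = df.to_numpy()

# Row-major: all of row 0, then all of row 1 -> matches names built row by row
row_major = values.ravel(order="C")
# Column-major: all of "Column A", then all of "Column B" -> mismatched pairing
col_major = values.ravel(order="F")

names = [f"{c}_{i}" for i in range(1, len(df) + 1) for c in df.columns]
out = pd.DataFrame([row_major], columns=names)
print(out)
```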
PaulS