Joining or merging multiple columns within one dataframe and keeping all data

Question

I have this dataframe:

df = pd.DataFrame({'Position1':[1,2,3], 'Count1':[55,35,45],\
                   'Position2':[4,2,7], 'Count2':[15,35,75],\
                   'Position3':[3,5,6], 'Count3':[45,95,105]})
print(df)

   Position1  Count1  Position2  Count2  Position3  Count3
0          1      55          4      15          3      45
1          2      35          2      35          5      95
2          3      45          7      75          6     105

I want to join the Position columns into one column named "Positions" while sorting the data in the Counts columns like so:

   Positions Count1 Count2 Count3
0          1     55    Nan    Nan
1          2     35     35    Nan
2          3     45    NaN     45
3          4    NaN     15    Nan
4          5    NaN    NaN     95
5          6    Nan    NaN    105
6          7    Nan     75    NaN

I've tried melting the dataframe, combining and merging columns but I am a bit stuck.

Note that the NaN types can easily be replaced by using df.fillna to get a dataframe like so:

df = df.fillna(0)

   Positions  Count1  Count2  Count3
0          1      55       0       0
1          2      35      35       0
2          3      45       0      45
3          4       0      15       0
4          5       0       0      95
5          6       0       0     105
6          7       0      75       0

score 1 · Answer 1 · answered May 17 '22 at 23:47

Does this achieve what you are after?

import pandas as pd
df = pd.DataFrame({'Position1':[1,2,3], 'Count1':[55,35,45],\
                   'Position2':[4,2,7], 'Count2':[15,35,75],\
                   'Position3':[3,5,6], 'Count3':[45,95,105]})

df1, df2, df3 = df.iloc[:,:2], df.iloc[:, 2:4], df.iloc[:, 4:6]

df1.columns, df2.columns, df3.columns = ['Positions', 'Count1'], ['Positions', 'Count2'], ['Positions', 'Count3']

df1.merge(df2, on='Positions', how='outer').merge(df3, on='Positions', how='outer').sort_values('Positions')

Output:

Naveed · Answer 2 · 2022-05-18T00:01:02.343

wide_to_long unpivots the DF from Long to wide and that is what's used here.

columns names are also renamed here, with this edit

df['id'] = df.index
df2=pd.wide_to_long(df, stubnames=['Position','Count'], i='id', j='pos').reset_index()
df2=df2.pivot(index=['id','Position'], columns='pos', values='Count').reset_index().fillna(0).add_prefix('count_')
df2.rename(columns={'count_id': 'id', 'count_Position' :'Position'}, inplace=True)
df2

RESULT:

pos     id  Position    1   2   3
0   0   1   55.0    0.0     0.0
1   0   3   0.0     0.0     45.0
2   0   4   0.0     15.0    0.0
3   1   2   35.0    35.0    0.0
4   1   5   0.0     0.0     95.0
5   2   3   45.0    0.0     0.0
6   2   6   0.0     0.0     105.0
7   2   7   0.0     75.0    0.0

PS: I'm unable to format the output, I'll appreciate if someone guide me here. Thanks!

This solution also helps if you happen to have position column beyond the 3 as depicted in the question — Naveed, May 18 '22 at 00:02

constantstranger · Accepted Answer · 2022-05-18T00:09:47.960

Here is a way to do what you've asked:

df = df[['Position1', 'Count1']].rename(columns={'Position1':'Positions'}).join(
    df[['Position2', 'Count2']].set_index('Position2'), on='Positions', how='outer').join(
    df[['Position3', 'Count3']].set_index('Position3'), on='Positions', how='outer').sort_values(
    by=['Positions']).reset_index(drop=True)

Output:

   Positions  Count1  Count2  Count3
0          1    55.0     NaN     NaN
1          2    35.0    35.0     NaN
2          3    45.0     NaN    45.0
3          4     NaN    15.0     NaN
4          5     NaN     NaN    95.0
5          6     NaN     NaN   105.0
6          7     NaN    75.0     NaN

Explanation:

Use join first on Position1, Count1 and Position2, Count2 (with Position1 renamed as Positions) then on that join result and Position3, Count3.
Sort by Positions and use reset_index to create a new integer range index (ascending with no gaps).

score 1 · Answer 4 · answered May 23 '22 at 22:54

One option is to flip to long form with pivot_longer before flipping back to wide form with pivot_wider from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
    index = None, 
    names_to = ('.value', 'num'), 
    names_pattern = r"(.+)(\d+)")
.pivot_wider(index = 'Position', names_from = 'num')
)
   Position  Count_1  Count_2  Count_3
0         1     55.0      NaN      NaN
1         2     35.0     35.0      NaN
2         3     45.0      NaN     45.0
3         4      NaN     15.0      NaN
4         5      NaN      NaN     95.0
5         6      NaN      NaN    105.0
6         7      NaN     75.0      NaN

In the pivot_longer section, the .value determines which part of the column names remain as column headers - in this case it is is Position and Count.

Joining or merging multiple columns within one dataframe and keeping all data

4 Answers4