Pandas split DataFrame when column names are in values

Question

I have a dataframe that is not well formatted. It looks like this:

0  1
col_name1     val1
col_name2     val2
col_name3     val3
col_name1     val4
col_name2     val5
col_name3     val6
.  .             .
.  .             .

and I want to make it look like this:

col_name1,col_name2,col_name3
val1,val2,val3
val4,val5,val6

How can I split it that way?

I tried transposing the dataframe, but that didn't work, and neither did some groupby manipulation.

score 1 · Answer 1 · answered May 16 '19 at 09:49

You can use:

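# Collect the values of each name into a list (use 0 and 1 if the column labels are integers)
m = df.groupby('0')['1'].apply(list)
# One row per name, then transpose so that the names become the columns
df1 = pd.DataFrame(m.values.tolist(), index=m.index).T.rename_axis(None, axis=1)
print(df1)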

  col_name1 col_name2 col_name3
0      val1      val2      val3
1      val4      val5      val6

anky


score 1 · Answer 2 · answered May 16 '19 at 09:55

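# Build a dict that maps each distinct name to the list of its values
new_df = {i: [] for i in set(df["0"])}
for i in range(len(df)):
    new_df[df["0"][i]].append(df["1"][i])
# Convert the dict to a DataFrame; the column order follows the set, so it is arbitrary
new_df = pd.DataFrame(new_df)

Result
  col_name2 col_name3 col_name1
0      val2      val3      val1
1      val5      val6      val4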
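
If the columns should come out in the order in which the names first appear, the loop above can be sketched with an insertion-ordered dict instead of a set. This is only a sketch: the sample frame is reconstructed from the question, the string labels "0" and "1" are an assumption, and the name `columns` is made up for illustration.

```python
from collections import defaultdict

import pandas as pd

# Sample data reconstructed from the question (string column labels assumed)
df = pd.DataFrame({
    "0": ["col_name1", "col_name2", "col_name3"] * 2,
    "1": ["val1", "val2", "val3", "val4", "val5", "val6"],
})

# Append each value to the list of its name; dict keys keep first-seen order
columns = defaultdict(list)
for name, value in zip(df["0"], df["1"]):
    columns[name].append(value)

# Every name must occur equally often, otherwise the lists differ in length
new_df = pd.DataFrame(dict(columns))
print(new_df)
```

Iterating with zip also avoids the label-based lookups df["0"][i], which rely on the default integer index.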

Tanmay Shrivastava


score 1 · Answer 3 · answered May 16 '19 at 10:19

If the order of rows is consistent, you can just pivot your dataframe after adding a new pseudo-index with int(index / 3):

df['ndx'] = (df.index / 3).astype(int)
df = df.pivot(index='ndx', columns='0', values='1')

If you are unsure, this is more robust, provided col_name1 always comes first in each group of rows:

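# Mark the row where each record starts, then forward-fill that marker
# (.ffill() replaces fillna(method='ffill'), which newer pandas versions deprecate)
df['ndx'] = pd.Series(np.where(df['0'] == 'col_name1', df.index, np.nan),
                      index=df.index).ffill().astype(int)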
df = df.pivot(index='ndx', columns='0', values='1')
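
Both variants can be tried on the sample data from the question with the sketch below. It assumes string column labels '0' and '1' and the default RangeIndex, and the names `fixed`, `starts` and `marked` are made up for illustration. Note two swaps: integer division `//` stands in for dividing and casting, and a cumulative sum of the col_name1 marker stands in for np.where plus forward fill, so the records are numbered 0, 1, ... rather than by their starting row.

```python
import pandas as pd

# Sample data reconstructed from the question (string column labels assumed)
df = pd.DataFrame({
    "0": ["col_name1", "col_name2", "col_name3"] * 2,
    "1": ["val1", "val2", "val3", "val4", "val5", "val6"],
})

# Variant 1: every record spans exactly 3 consecutive rows
fixed = df.assign(ndx=df.index // 3).pivot(index="ndx", columns="0", values="1")

# Variant 2: a new record starts wherever col_name1 appears
starts = (df["0"] == "col_name1").cumsum() - 1
marked = df.assign(ndx=starts).pivot(index="ndx", columns="0", values="1")

print(fixed)
print(marked)
```

Variant 2 still works when a record is missing one of the later names; the missing cell simply becomes NaN.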

DataFramed · Accepted Answer · 2019-05-16T10:53:14.897

Here you go:

STEP1: Group the data by '1st column'

df_temp = df.groupby(0)[1].apply(list)

STEP2: Get column names for new data frame:

col_names = df_temp.index

STEP3: Get the row values and store them in a list:

row_values = df_temp.values.tolist()

STEP4: Make new data frame in desired format:

# Each inner list belongs to one name, so the names label the rows until the transpose
new_df = pd.DataFrame(row_values, index=col_names)
new_df = new_df.T.rename_axis(None, axis=1)
new_df = new_df.reset_index(drop=True)
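
Run end to end, the four steps might look like the sketch below. The sample frame is reconstructed from the question with integer column labels 0 and 1 as an assumption, and the names `grouped` and `wide` are made up for illustration. The grouped names are passed as the index of the intermediate frame, so that the transpose turns them into columns.

```python
import pandas as pd

# Sample data reconstructed from the question (integer column labels assumed)
df = pd.DataFrame({
    0: ["col_name1", "col_name2", "col_name3"] * 2,
    1: ["val1", "val2", "val3", "val4", "val5", "val6"],
})

# STEP1-3: one list of values per name; the names end up in the Series index
grouped = df.groupby(0)[1].apply(list)

# STEP4: one row per name, then transpose so the names become the columns
wide = pd.DataFrame(grouped.tolist(), index=grouped.index)
wide = wide.T.rename_axis(None, axis=1).reset_index(drop=True)
print(wide)
```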

jezrael · Answer 5 · 2019-05-16T11:49:30.703

Use DataFrame.set_index with GroupBy.cumcount for MultiIndex and reshape by Series.unstack:

df = df.set_index([df.groupby(0).cumcount(), 0])[1].unstack().rename_axis(None, axis=1)
print(df)
  col_name1 col_name2 col_name3
0      val1      val2      val3
1      val4      val5      val6
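
Split into separate steps, the one-liner above can be read as in the following sketch. The sample frame is reconstructed from the question with integer column labels as an assumption, and the names `occurrence`, `stacked` and `wide` are made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (integer column labels assumed)
df = pd.DataFrame({
    0: ["col_name1", "col_name2", "col_name3"] * 2,
    1: ["val1", "val2", "val3", "val4", "val5", "val6"],
})

# Number each occurrence of a name: 0 for the first, 1 for the second, ...
occurrence = df.groupby(0).cumcount()

# Index the values by (occurrence, name); selecting column 1 gives a Series
stacked = df.set_index([occurrence, 0])[1]

# Move the name level into the columns and drop the leftover axis name
wide = stacked.unstack().rename_axis(None, axis=1)
print(wide)
```

Because the occurrence counter is computed per name, this does not depend on the names appearing in a fixed order within each record.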

edited May 16 '19 at 11:49

answered May 16 '19 at 10:38
