0

I have one dataframe that is not well formatted, it look like

0  1
col_name1     val1
col_name2     val2
col_name3     val3
col_name1     val4
col_name2     val5
col_name3     val6
.  .             .
.  .             .

and I wanted to make it look like

col_name1,col_name2,col_name3
val1,val2,val3
val4,val5,val6

How can I split it that way?

I tried to transpose the dataframe and didn't work the same for some groupby manipulation.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Ismail
  • 53
  • 9

5 Answers5

1

You can use:

m=df.groupby('0')['1'].apply(list)
df1=pd.DataFrame(m.values.tolist(),index=m.index).T.rename_axis(None,axis=1)
print(df1)

  col_name1 col_name2 col_name3
0      val1      val2      val3
1      val4      val5      val6
anky
  • 74,114
  • 11
  • 41
  • 70
1
new_df = {i:[] for i in list(set(df["0"]))}
for i in range(len(df)):
    new_df[df["0"][i]].append(df["1"][i])

Result
col_name2 col_name3 col_name1
0      val2      val3      val1
1      val5      val6      val4
1

If the order of rows is consistent, you can just pivot your dataframe after adding a new pseudo-index with int(index / 3):

df['ndx'] = (df.index / 3).astype(int)
df = df.pivot(index='ndx', columns='0', values='1')

If unsure, this will be more robust provided col_name1 comes always first:

df['ndx'] = pd.Series(np.where(df['0'] == 'col_name1', df.index, np.nan),
                      index = df.index).fillna(method='ffill').astype(int)
df = df.pivot(index='ndx', columns='0', values='1')
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
1

Here you go:

Original DataFrame

enter image description here

STEP1: Group the data by '1st column'

df_temp = df.groupby(0)[1].apply(list)

STEP2: Get column names for new data frame:

col_names = df_temp.index

STEP3: Get row values and store it in a list:

row_values = df_temp.values.tolist()

STEP4: Make new data frame in desired format:

new_df = pd.DataFrame(row_values, columns=  col_names)
new_df = new_df.T.rename_axis(None, axis=1)
new_df = new_df.reset_index(drop=True)

enter image description here

DataFramed
  • 1,491
  • 16
  • 21
1

Use DataFrame.set_index with GroupBy.cumcount for MultiIndex and reshape by Series.unstack:

df = df.set_index([df.groupby(0).cumcount(), 0])[1].unstack().rename_axis(None, axis=1)
print (df)
  col_name1 col_name2 col_name3
0      val1      val2      val3
1      val4      val5      val6
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252