2

I am trying to extract only the relevant information from a dataframe. My data looks like

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': {0: 'id1', 1: 'id1', 2: 'id1'},
                   'EM': {0: 'met1', 1: 'met2', 2: 'met3'},
                   'met1_AVG': {0: 0.38, 1: np.nan, 2: np.nan},
                   'met2_AVG': {0: np.nan, 1: 0.2, 2: np.nan},
                   'met3_AVG': {0: np.nan, 1: np.nan, 2: 0.58},
                   'score': {0: 89, 1: 89, 2: 89}})

My desired output is this is my desired output

Please, find my code below. I really would appreciate if someone could help me out. Thank you in advance for your time and helpful assistance

df_melted = df.melt(id_vars=['ID','EM','score']).dropna(subset=['value'])
df_pivoted = pd.pivot_table(data=df_melted,index=['ID','score'],columns=['variable'])
df_ready = df_pivoted.reset_index()
df_ready
Amilovsky
  • 397
  • 6
  • 15

1 Answers1

1

Assuming the score is always same, you can use pandas.DataFrame.groupby.first:

df.drop("EM",axis=1).groupby("ID", as_index=False).first()

Output:

    ID  met1_AVG  met2_AVG  met3_AVG  score
0  id1      0.38       0.2      0.58     89
Chris
  • 29,127
  • 3
  • 28
  • 51