1

I am planning to implement the below logic to get the student score.

Find students getting more than 60 marks

And then get that student marks too based on Subject Student Key!

Input Data

data = [['Maths', 100,80,20], ['Science', 80,20,10]] 
df = pd.DataFrame(data, columns = ['Subject', 'Student A','Student B','Student C'])
df.set_index("Subject",inplace=True)


         Student A  Student B   Student C
Subject         
Maths      100       80         20
Science     80       20         10

Get Students getting more than 60 marks

df=df[df.gt(60)]
rank_df = df.rank(axis=0,method='average',pct=False,ascending=False)
marks_list = []
for i in range(0,len(rank_df)):
    label_series = rank_df.iloc[i,:]
    labels_notna = label_series.sort_values(ascending=True)[label_series.notna()].index
    marks_list.append(",".join(labels_notna))
df['Student gt 60'] = marks_list


new_df = df['Student gt 60'].str.split(',', expand = True)
new_df.reset_index(inplace=True)
new_df.columns=["Subject","Top 1","Top 2"]
new_df = pd.melt(new_df, id_vars=['Subject'], value_name='Student')
data= new_df[["Subject","Student"]]
data.loc[~data["Student"].isna()]

    Subject Student
0   Maths   Student A
1   Science Student A
2   Maths   Student B

I am planning to get an associated score for the Subject/Student key in the same data frame but not able to work it out.

Required Output:

Subject Student Score
0   Maths   Student A   100
1   Maths   Student B   80
2   Science Student A   80

Could somebody help me with some pointers!

Sam
  • 336
  • 6
  • 16

2 Answers2

2

I recommend stacking the dataframe first to get a MultiIndex Series (with the subject in the first level and the students in the second level) and then index this Series selecting all students with a sufficient score:

df_stacked = df.stack()
df_stacked[df_stacked.gt(60)]
# Out: 
# Subject           
# Maths    Student A    100
#          Student B     80
# Science  Student A     80
# dtype: int64
JE_Muc
  • 5,403
  • 2
  • 26
  • 41
  • Thank you. I made my logic so complicated. Never thought it was so easy – Sam Oct 14 '20 at 11:43
  • 1
    You are welcome! Yeah, pandas is a great tool with many easy and straightforward methods for reshaping, pivoting etc. And honestly, even after several years of intense pandas usage, I still have to look up the functionality of pivot, melt etc regularly. :) – JE_Muc Oct 14 '20 at 11:46
2

First, orient your data the way you eventually want it:

vertical = df.unstack()

That gives you:

           Subject
Student A  Maths      100
           Science     80
Student B  Maths       80
           Science     20
Student C  Maths       20
           Science     10

Then simply:

vertical[vertical > 60]    

Gives you the final result:

           Subject
Student A  Maths      100
           Science     80
Student B  Maths       80

You can do reset_index() on that to make it look more like your example output.

John Zwinck
  • 239,568
  • 38
  • 324
  • 436