I have three NumPy arrays like this:

Name Subject Marks

A    Math     89
B    Chem     43
A    Math     98
B    Math     23
A    Chem     57
B    Math     78
B    Math     82
A    Chem     71
A    Math     36
C    Math     89

I would like to get the average mark for each distinct combination of the first two columns, namely: A Math, A Chem, B Math, B Chem, C Math (something like the SQL `avg(marks) ... group by name, subject`).

I have tried a lot, but in vain. How can I do this using only NumPy (any NumPy functions can be used), without Pandas?

  • Is there any reason why you do not want to use pandas? Such a task is easily done in pandas. NumPy is mostly for numerical computation. – Piyush Singh Dec 01 '19 at 19:28
  • Also, can you show your effort, please? You should have tried something in advance. – AnsFourtyTwo Dec 01 '19 at 19:51
  • I'll second what @PiyushSingh said, this is not a NumPy task. You might as well use plain Python lists. Why do you want to use NumPy and not Pandas? – AMC Dec 01 '19 at 19:58
  • Using `numpy` may not be efficient as others have commented: you can check https://stackoverflow.com/a/11989425/5916727 – niraj Dec 01 '19 at 20:18

1 Answer

import numpy as np

name = np.array(['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'A', 'C'])
subject = np.array(['Math', 'Chem', 'Math', 'Math', 'Chem', 'Math', 'Math', 'Chem', 'Math', 'Math'])
marks = np.array([89, 43, 98, 23, 57, 78, 82, 71, 36, 89])

# Distinct values of each grouping column (np.unique returns them sorted)
name_un = np.unique(name)
subj_un = np.unique(subject)

for nm in name_un:
    for subj in subj_un:
        # Boolean mask selecting the rows that belong to this (name, subject) pair
        mask = (name == nm) & (subject == subj)
        # Skip combinations that never occur in the data, e.g. C / Chem
        if mask.any():
            print(nm, subj, np.mean(marks[mask]))
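
If the number of names and subjects grows, the nested loop above does one full pass over the data per pair. A loop-free variant is possible with `np.unique(..., return_inverse=True)` and `np.bincount`. The following is only a minimal sketch under the assumption that the three arrays are 1-D and of equal length; the names `group_id`, `sums`, `counts` and `group_means` are made up for illustration.

```python
import numpy as np

name = np.array(['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'A', 'C'])
subject = np.array(['Math', 'Chem', 'Math', 'Math', 'Chem', 'Math', 'Math', 'Chem', 'Math', 'Math'])
marks = np.array([89, 43, 98, 23, 57, 78, 82, 71, 36, 89])

# Encode each column as integer codes: name_idx[i] is the position of name[i] in name_un
name_un, name_idx = np.unique(name, return_inverse=True)
subj_un, subj_idx = np.unique(subject, return_inverse=True)
n_subj = len(subj_un)
n_groups = len(name_un) * n_subj

# Fold the two codes into a single integer id per (name, subject) pair
group_id = name_idx * n_subj + subj_idx

# Per-group sum of marks and per-group row count, each in one pass
sums = np.bincount(group_id, weights=marks, minlength=n_groups)
counts = np.bincount(group_id, minlength=n_groups)

# Keep only the pairs that actually occur, then divide sum by count
present = np.flatnonzero(counts)
group_means = {
    (str(name_un[g // n_subj]), str(subj_un[g % n_subj])): float(sums[g] / counts[g])
    for g in present
}

for (nm, subj), mean in group_means.items():
    print(nm, subj, mean)
```

The pairs come out in the same sorted order as the loop version, and combinations with no rows (here C / Chem) are dropped because their count is zero.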
Haiz