0

So i wanted to find the weight average of grades with thick as the weight. Problem is, i need to do it everytime the Grade shows >0 consecutively, and if it's 0 consecutively, i would still like to find it's weight average although its 0 for sure.

Below is the tables including the results that i wanted to find. As you can see there is several empty values / NaN on Total Thickness and WA (Weight Average). Each values above empty values is to represent the each group's WA that i need to find

Basically, i need to groupby based on it's grade (either it's all 0 or it's all >0) then i need to find the weight average of each of those groups.


BHID           Thick    Total Thickness Grade   WA
01/01/1970  2   2.5         0   0
01/01/1970  0.5             0   
01/01/1970  0.5 2.5         3.35    0.71
01/01/1970  2               0.05    
01/01/1970  0.8 1.5         0   0
01/01/1970  0.7             0   
01/02/1970  2   4           3.3 4.075
01/02/1970  0.5             2.8 
01/02/1970  0.5             0   
01/02/1970  1               8.3 
01/02/1970  1.7 2.3         0   0
01/02/1970  0.6             0   

note: you may notice on 01/02/1970, there is a single 0 on it's Grade but it is not consecutive so it doesn't count as a new group. So there is only 5 groups with 5 WA values (0, 0.71, 0, 4.075, 0)

I've tried to groupby using BHID, and managed to find wavg (weighted average) for each BHID, pretty good

               wavg
BHID                
1970-01-01  0.273077
1970-02-01  2.587302

but when i tried to groupby Grades, it surely just groupby unique grade values which is not what i was looking for. I was trying to groupby by Grade's ranges.

                  wavg
BHID       Grade      
1970-01-01 0.00   0.00
           3.35   3.35
           0.05   0.05
1970-02-01 3.30   3.30
           2.80   2.80
           0.00   0.00
           8.30   8.30

here is the code that i used

def my_agg(x):
    names = np.average(x['Grade'],weights=x['Thick'],axis=0)
    return pd.Series(names,index=['wavg'])

print(df1.groupby(['BHID','Grade'],sort=False,).apply(my_agg))

i just want to know how i can find the weight average from groupby based on Grade ranges.

Jayakusuma
  • 39
  • 4

1 Answers1

0

I got it, i had to utilize Machine Learning to classify (i used Sklearn Tree), and then i can finally use groupby according to the results of the Tree Classification.

def my_agg(x):
    names = np.average(x['Sn (kg/m3)'],weights=x['Thick'],axis=0)
    return pd.Series(names,index=['wavg'])

def my_thick(x):
    tot_thick = np.sum(x['Thick'],axis=0)
    return pd.Series(tot_thick,index=['Thick Sumz'])

Jayakusuma
  • 39
  • 4