I need some help to calculate 4 metrics for my table. What is the correct way to constantly refer to the same string within the department?
I have been trying to solve this problem for a very long time, but I can not come to a logical solution. I have tried Partition from SQLalchemy, but my skills wasn't enough. I tried the def function, but functions are my weak point.
Therefore, I ask you for help with a solution or advice. Thank you!!!
my real task is not about salaries)))) don't worry about it!
EXAMPLE:
import pandas as pd
df2=pd.DataFrame({
'head_department': [ 'Anna','Anna', 'Anna','Anna','Anna',
'John', 'John','John',
'Denis', 'Denis',
'Sarah',
'Greg', 'Greg','Greg','Greg', 'Greg','Greg'],
'serial_number': [1,2,3,4,5,1,2,3,1,2,1,1,2,3,4,5,6],
'Worker_in_dpt': ['Anna','Anna-s_worker_2','Anna-s_worker_3','Anna-s_worker_4','Anna-s_worker_5',
'John','John-s worker_2','John-s_worker_3',
'Denis','Denis-s_worker_2',
'Sarah',
'Greg','Greg-s_worker_2', 'Greg-s_worker_3', 'Greg-s_worker_4','Greg-s_worker_5', 'Greg-s_worker_6'],
'jan_salary': [1000, 600, 600, 500, 900, None, 600, 500, 1200, 800, 1400, None, 700, 600, 600, 450, 700],
'feb_salary': [1100, 700, 700, 700, 800, None, 500, 400, 1800, 900, 1000, 1400, 900, 800, 500, 450, 700],
'mar_salary': [1200, 800, 800, 900, 700, 1300, 600, 500, 1800, 600, 1100, 1600, 400, 700, 600, 250, 700],
'apr_salary': [1600, 900, 900, 700, 700, 2300, 500, 400, 1800, 900, 1100, 1900, 200, 900, 500, 150, 700],
'may_salary': [1100, 700, 700, 700, 800, 2300, 500, 400, 1800, 900, 1000, 1400, 900, 800, 500, 450, 700],
'jun_salary': [1200, 800, 800, 900, 700, 1300, 500, 400, 1800, 900, 1000, 1400, 900, 800, 500, 450, 800],
'jul_salary': [1000, 600, 600, 500, 900, 1300, 600, 500, 1200, 800, 1400, 1200, 700, 600, 600, 450, 700],
'aug_salary': [1100, 700, 700, 700, 800, 2300, 600, 500, 1800, 600, 1100, 1600, 400, 700, 600, 250, None]
})
df2['serial_number'] = df2['serial_number'].astype('str') # else it computed as int, when i use numeric_only=True
df2[ 'mean_salary_per_period'] = df2.mean(numeric_only=True, axis=1)
df2
enter image description here
i'm trying calculate columns:
- df2['corr_with_head_in_dpt'] =
- df2['mean _when_1_in_NAN'] =
- df2['mean _when_1_not_in_NAN'] =
- df2['sum_all_dpt_in_jan'] =
WHERE:
df2['corr with head in dpt'] - Calculation of the correlation of the salary of each worker for the entire period within the department with the head of this department (1), the flag of the head of the department is always = 1!
df2['mean _when_1_in _NAN'] - the average salary of each worker for the period while the head of department is NAN. if the head of department has no NAN, then the average for the all period.
df2['mean _when_1_in_NAN'] - the average salary of each worker for the period starting from the first month, when the salary appeared at the head of the department. if the head of department had no NA then the average for the all period.
df2['sum all dpt in Jan'] - the sum of all workers department salaries in January, including the head of department's, even it's NAN too.
Thank you!!!