I am trying to find the average salary for each distinct value of YearsCode. My goal is a graph that shows the average Salary for each number of years that respondents say they have coded. For example, if 5 rows have the value n in column x, I would like to average those 5 rows into a single value labelled 'Average of rows with value n'.

The data I am using is the results of the 2020 Stack Overflow Developer Survey.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.read_csv('survey_results_public.csv') 

%matplotlib inline

df_new = df.copy()
df_new = df_new.drop(['Age1stCode','CompTotal','Respondent', 'MainBranch', 'Hobbyist', 'Age', 'CompFreq', 'Country', 'CurrencyDesc', 'CurrencySymbol', 'DatabaseDesireNextYear', 'DatabaseWorkedWith', 'DevType', 'EdLevel', 'Employment', 'Ethnicity', 'JobFactors', 'JobSat', 'JobSeek', 'LanguageDesireNextYear', 'LanguageWorkedWith', 'MiscTechDesireNextYear', 'MiscTechWorkedWith', 'NEWCollabToolsDesireNextYear', 'NEWCollabToolsWorkedWith', 'NEWDevOps', 'NEWDevOpsImpt', 'NEWEdImpt', 'NEWJobHunt', 'NEWJobHuntResearch', 'NEWLearn', 'NEWOffTopic', 'NEWOnboardGood', 'NEWOtherComms', 'NEWOvertime', 'NEWPurchaseResearch', 'NEWPurpleLink', 'NEWSOSites', 'NEWStuck', 'OpSys', 'OrgSize', 'PlatformDesireNextYear', 'PlatformWorkedWith', 'PurchaseWhat', 'Sexuality', 'SOAccount', 'SOComm', 'SOPartFreq', 'SOVisitFreq', 'SurveyEase', 'SurveyLength', 'Trans', 'UndergradMajor', 'WebframeDesireNextYear', 'WebframeWorkedWith', 'WelcomeChange', 'WorkWeekHrs', 'YearsCodePro'], axis = 'columns')
df_new = df_new.dropna()

# Keep only the rows where Gender is 'Woman', then drop the now-constant Gender column.
# Note: drop(..., inplace=True) returns None, so its result should not be assigned.
df_woman = df_new[df_new['Gender'] == 'Woman'].drop(columns=['Gender'])

That is my code so far.

Victoria
  • Could you please share a sample of your data? I think you might want to use something like `df.groupby('YearsCode')['Salary'].mean()` which will return the average salary per years coded. – gofvonx Apr 16 '21 at 09:05
  • That worked! Thank you! Can you put that in as the answer? – Victoria Apr 16 '21 at 09:42
  • Great - I have put it as an answer. Feel free to mark it as accepted https://stackoverflow.com/help/someone-answers – gofvonx Apr 16 '21 at 09:49
  • How can I sort the results of `average_salaries = df.groupby('YearsCode')['Salary'].mean()` by YearsCode? I have tried `average_salaries.sort_values()`, but that sorted by the 'Salary' values. How can I tell it to sort by 'YearsCode' instead? – Victoria Apr 16 '21 at 09:56
  • I have updated my answer – gofvonx Apr 16 '21 at 10:00

1 Answer

I think you might want to use

df.groupby('YearsCode')['Salary'].mean()

This will return the average salary per years coded.
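As a minimal sketch of what that call does, here it is on a small made-up frame. The numbers are invented for illustration, and the column names 'YearsCode' and 'Salary' are assumed to match your data; if your salary column has a different name in the survey file, substitute it.

```python
import pandas as pd

# Made-up data for illustration only; not taken from the survey.
toy = pd.DataFrame({
    "YearsCode": [5, 5, 5, 10, 10, 2],
    "Salary": [50000, 60000, 70000, 90000, 110000, 40000],
})

# One entry per distinct YearsCode value, holding the mean Salary of that group.
average_salaries = toy.groupby("YearsCode")["Salary"].mean()
print(average_salaries)
```

The result is a Series whose index holds the YearsCode values and whose values are the averages, which is why sorting by the index and sorting by the values are two different operations.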

To sort the result by 'YearsCode' you can use sort_index

df.groupby('YearsCode')['Salary'].mean().sort_index()
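One caveat, as far as I can tell from the survey file: YearsCode seems to be stored as text and includes answers such as 'Less than 1 year' and 'More than 50 years', so sort_index would sort it alphabetically ('10' before '2'). The sketch below converts the column to numbers first. The data is made up, the helper column name 'YearsCodeNum' is hypothetical, and mapping the two text answers to 0 and 51 is my own assumption; choose whatever values suit your analysis.

```python
import pandas as pd

# Made-up data; YearsCode is text here, as it appears to be in the survey file.
toy = pd.DataFrame({
    "YearsCode": ["10", "2", "Less than 1 year", "2", "10", "More than 50 years"],
    "Salary": [90000, 40000, 30000, 50000, 110000, 120000],
})

# Assumed mapping for the two text answers (0 and 51 are arbitrary choices).
years = toy["YearsCode"].replace(
    {"Less than 1 year": "0", "More than 50 years": "51"}
)
toy["YearsCodeNum"] = pd.to_numeric(years)  # hypothetical helper column

# Group on the numeric column so that sort_index orders the years numerically.
by_years = toy.groupby("YearsCodeNum")["Salary"].mean().sort_index()
print(by_years)
```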

To sort the result by the average salary you can use sort_values. The result of the groupby is a Series, so sort_values is called without a column name:

df.groupby('YearsCode')['Salary'].mean().sort_values()
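Since the goal in the question is a graph, here is one possible way to plot the grouped averages with matplotlib. This is only a sketch on made-up data, with the column names assumed as above; the axis labels and the choice of a line plot are my own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "YearsCode": [5, 5, 5, 10, 10, 2],
    "Salary": [50000, 60000, 70000, 90000, 110000, 40000],
})

# Average salary per YearsCode value, ordered by YearsCode for the x-axis.
average_salaries = toy.groupby("YearsCode")["Salary"].mean().sort_index()

fig, ax = plt.subplots()
ax.plot(average_salaries.index, average_salaries.values, marker="o")
ax.set_xlabel("YearsCode")
ax.set_ylabel("Average Salary")
```

In a notebook with %matplotlib inline the figure is displayed automatically; in a script you would call plt.show() or fig.savefig(...) afterwards.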
gofvonx