
I am performing an EDA just to practice and learn a bit more about coding.

The problem is this:

I would like to calculate the mean of the 'AQI' column (AQI stands for 'Air Quality Index'), but only for specific values of 'City'. 'City' has 26 different values (city names).

I have already calculated the means of all columns based on the city, for example:

Shillong_means = df.loc[df['City'] == 'Shillong', ['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2',
       'O3', 'Benzene', 'Toluene', 'AQI', 'AQI_Bucket_num']].mean() # Columns mean for Shillong city

This is the mean of every numeric column for Shillong.

I have 25 more lines like this (one for each of the other cities), but the results are not collected in a list. I think this would be easier if they were, for example as a list containing only each city's AQI mean.

So, basically, I am trying to create a combo chart with the cities on the x-axis (26 bars), each city's AQI mean on the y-axis, and a line showing the national mean, which I have already calculated as follows:

national_AQI_mean = df['AQI'].mean()
national_AQI_mean

I have also tried to create a list containing the AQI mean of each city so that I can draw a bar plot:

Cities_AQI_list = list[Ahmedabad_means['AQI'], Aizawl_means['AQI'], Amaravati_means['AQI'], Amritsar_means['AQI'], 
                      Bengaluru_means['AQI'], Bhopal_means['AQI'], Brajrajnagar_means['AQI'], Chandigarh_means['AQI'],
                      Chennai_means['AQI'], Coimbatore_means['AQI'], Delhi_means['AQI'], Ernakulam_means['AQI'],
                      Gurugram_means['AQI'], Guwahati_means['AQI'], Hyderabad_means['AQI'], Jaipur_means['AQI'],
                      Jorapokhar_means['AQI'], Kochi_means['AQI'], Kolkata_means['AQI'], Lucknow_means['AQI'], 
                      Mumbai_means['AQI'], Patna_means['AQI'], Shillong_means['AQI'], Talcher_means['AQI'],
                      Thiruvananthapuram_means['AQI'], Visakhapatnam_means['AQI']]




plt.bar(df['City'].unique().tolist(), Cities_AQI_list)

But it raises an error: unsupported operand type(s) for +: 'int' and 'types.GenericAlias'

I don't know why the list I created has the type GenericAlias. Shouldn't it be "list"? (An explanation of this would be really nice as well.)
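For reference, here is a minimal sketch of where the GenericAlias type comes from, assuming Python 3.9 or later. Writing square brackets directly after the name `list` subscripts the type itself (the syntax used for type hints such as `list[int]`) instead of building a list. The values below are made up for illustration.

```python
import types

# Square brackets right after `list` subscript the type, as in a type hint.
# The result is a types.GenericAlias object, not a list of values.
alias = list[1.0, 2.0, 3.0]
print(type(alias))

# A list literal uses the brackets alone, with no `list` prefix.
values = [1.0, 2.0, 3.0]

# list(...) with parentheses converts an existing iterable into a list.
values_from_tuple = list((1.0, 2.0, 3.0))

print(type(values), type(values_from_tuple))
```

So dropping the word `list` in front of the opening bracket in `Cities_AQI_list` should give an ordinary list.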

My df has 29531 rows: 26 cities with daily records of gas emissions over 5 years (not all cities cover the full period).

My goal is to plot a combo chart where the x-axis is divided categorically into cities, the y-axis shows the AQI mean values, and the line is just a straight horizontal line at the national_AQI_mean value.
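A rough sketch of that kind of chart with matplotlib, using `ax.bar` for the cities and `ax.axhline` for the horizontal line. The three cities and all numbers here are made-up placeholders, not values from the real dataset, and the variable names are only for illustration.

```python
import matplotlib.pyplot as plt

# Made-up illustrative values, NOT taken from the real dataset
cities = ["Delhi", "Mumbai", "Shillong"]
city_aqi_means = [250.0, 100.0, 50.0]
national_aqi_mean = 160.0

fig, ax = plt.subplots(figsize=(8, 4))

# One bar per city
bars = ax.bar(cities, city_aqi_means, label="City AQI mean")

# Horizontal line across the whole plot at the national mean
line = ax.axhline(national_aqi_mean, color="red", linestyle="--",
                  label="National AQI mean")

ax.set_xlabel("City")
ax.set_ylabel("Mean AQI")
ax.tick_params(axis="x", rotation=90)  # keeps 26 city names readable
ax.legend()
fig.tight_layout()
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure.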

PS. I just came up with another idea: I will try to group my df by city with groupby. (Let me know if this is a good or bad idea.)

  • Yes, your `groupby` idea is the correct way. `df.groupby("City").mean()` should give you the means you want. – Pranav Hosangadi Jan 12 '23 at 18:14
  • Does this answer your question? [pandas get average of a groupby](https://stackoverflow.com/questions/40066837/pandas-get-average-of-a-groupby) – Pranav Hosangadi Jan 12 '23 at 18:18
  • Yes! thank you so much! I was trying here. You literally solved the problem in one line for something I have written about 30 lines (at least). Thank you again. – Lucas Correa Jan 12 '23 at 18:26
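A minimal sketch of the `groupby` approach from the comments, using a tiny made-up frame in place of the real `df` (the rows and values are invented for illustration). Selecting the 'AQI' column before calling `.mean()` returns one mean per city as a Series indexed by city name.

```python
import pandas as pd

# Tiny made-up frame standing in for the real df (values are invented)
df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai", "Shillong"],
    "AQI": [300.0, 200.0, 120.0, 80.0, 50.0],
})

# One AQI mean per city; the result is a Series indexed by city name
city_aqi_means = df.groupby("City")["AQI"].mean()

# Overall mean across all rows, as in the question
national_aqi_mean = df["AQI"].mean()

print(city_aqi_means)
print(national_aqi_mean)
```

The index and values of `city_aqi_means` can then be passed straight to a bar plot. If you call `df.groupby("City").mean()` on the whole frame and it contains non-numeric columns, recent pandas versions may require `numeric_only=True`.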
