I am performing an EDA just to practice and learn a bit more about coding.
The problem is this:
I would like to calculate the mean of the 'AQI' column (Air Quality Index), but separately for each value of 'City'. 'City' is what I group by, and it has 26 distinct values (city names).
I have already calculated the means of all numeric columns for each city, for example:
Shillong_means = df.loc[df['City'] == 'Shillong', ['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2',
'O3', 'Benzene', 'Toluene', 'AQI', 'AQI_Bucket_num']].mean() # Columns mean for Shillong city
This gives the mean of every numeric column for the city of Shillong.
I have 25 more lines like this (one for each of the other cities), but the results are not collected in a list. I think this would be easier if they were, e.g. a list containing only each city's AQI mean.
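One way to avoid 26 separate variables is to run the same .loc[...].mean() call in a loop and store each result in a dictionary keyed by city name. A minimal sketch, assuming a DataFrame laid out like mine; the tiny df, the trimmed column list and the names city_means and aqi_means are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical toy data standing in for the real df (same 'City' / 'AQI' layout)
df = pd.DataFrame({
    "City": ["Aizawl", "Aizawl", "Shillong", "Shillong", "Shillong"],
    "PM2.5": [10.0, 20.0, 30.0, 40.0, 50.0],
    "AQI": [40.0, 60.0, 90.0, 100.0, 110.0],
})

numeric_cols = ["PM2.5", "AQI"]  # the real list would hold all the numeric columns

# One entry per city: the same .loc[...].mean() call as above, run in a loop
city_means = {
    city: df.loc[df["City"] == city, numeric_cols].mean()
    for city in sorted(df["City"].unique())
}

# Only the AQI means, in the same (alphabetical) order as the city names
cities = list(city_means)
aqi_means = [float(city_means[city]["AQI"]) for city in cities]

print(cities)     # ['Aizawl', 'Shillong']
print(aqi_means)  # [50.0, 100.0]
```

Because cities and aqi_means are built from the same dictionary, each label stays paired with its own value.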
So, basically, I am trying to create a combo chart with the cities on the x-axis (26 bars), each city's AQI mean on the y-axis, and a line for the national mean, which I have already calculated as follows:
national_AQI_mean = df['AQI'].mean()
national_AQI_mean
I have also tried to create a list containing the AQI mean for each city in order to make a bar plot:
Cities_AQI_list = list[Ahmedabad_means['AQI'], Aizawl_means['AQI'], Amaravati_means['AQI'], Amritsar_means['AQI'],
Bengaluru_means['AQI'], Bhopal_means['AQI'], Brajrajnagar_means['AQI'], Chandigarh_means['AQI'],
Chennai_means['AQI'], Coimbatore_means['AQI'], Delhi_means['AQI'], Ernakulam_means['AQI'],
Gurugram_means['AQI'], Guwahati_means['AQI'], Hyderabad_means['AQI'], Jaipur_means['AQI'],
Jorapokhar_means['AQI'], Kochi_means['AQI'], Kolkata_means['AQI'], Lucknow_means['AQI'],
Mumbai_means['AQI'], Patna_means['AQI'], Shillong_means['AQI'], Talcher_means['AQI'],
Thiruvananthapuram_means['AQI'], Visakhapatnam_means['AQI']]
plt.bar(df['City'].unique().tolist(), Cities_AQI_list)
But it raises an error: unsupported operand type(s) for +: 'int' and 'types.GenericAlias'
I don't understand why the list I created has type GenericAlias. Shouldn't it be a list? (An explanation of this would be really nice as well.)
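The square brackets after the name list are the likely cause: list[...] subscripts the list type itself, which since Python 3.9 (PEP 585) builds a type hint such as list[int] and returns a types.GenericAlias object rather than a list. A list literal is written with brackets alone, and list(...) with parentheses converts a single iterable. A small demonstration, with placeholder numbers standing in for the per-city AQI means:

```python
import types

a, b, c = 1.0, 2.0, 3.0  # placeholder values standing in for the per-city AQI means

wrong = list[a, b, c]          # subscripting the list type -> a type-hint object
right = [a, b, c]              # list literal -> an actual list
also_right = list((a, b, c))   # list() with parentheses converts an iterable

print(type(wrong))          # <class 'types.GenericAlias'>
print(type(right))          # <class 'list'>
print(right == also_right)  # True
```

So dropping the word list in front of the opening bracket in Cities_AQI_list should give plt.bar an ordinary list of numbers. One more thing worth checking: df['City'].unique() returns cities in order of first appearance, so its labels only line up with an alphabetical list like the one above if df is sorted by city; sorted(df['City'].unique()) makes that order explicit.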
My df has 29531 rows: 26 cities with daily records of gas emissions over 5 years (not every city covers the whole period).
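Since not every city covers the full period, df['AQI'].mean() averages over rows: cities with more non-missing AQI records weigh more in the national figure than they would in a simple average of the 26 city means (pandas also skips NaN by default). A toy illustration with made-up numbers, not my real data:

```python
import numpy as np
import pandas as pd

# Made-up data: city A has 4 rows (one AQI missing), city B has only 1 row
df = pd.DataFrame({
    "City": ["A", "A", "A", "A", "B"],
    "AQI": [100.0, 100.0, 100.0, np.nan, 200.0],
})

# Row-weighted mean over all non-missing readings: (100 + 100 + 100 + 200) / 4
national_AQI_mean = df["AQI"].mean()

# Unweighted mean of the per-city means: (100 + 200) / 2
mean_of_city_means = df.groupby("City")["AQI"].mean().mean()

print(national_AQI_mean, mean_of_city_means)  # 125.0 150.0
```

Either value can serve as the reference line; they simply answer slightly different questions.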
My goal is to plot a combo chart where the x-axis is divided categorically into cities, the y-axis shows the AQI mean values, and the line is just a horizontal line at the national_AQI_mean value.
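A minimal sketch of that combo chart with matplotlib, assuming the per-city means are already in a list: ax.bar draws one bar per city (string labels give a categorical x-axis) and ax.axhline draws a horizontal line across the whole axis at the national mean. The three cities, the AQI numbers, the national value and the output file name are placeholders, not values from my data:

```python
import matplotlib.pyplot as plt

# Placeholder values; in practice these come from the per-city AQI means
cities = ["Ahmedabad", "Aizawl", "Amaravati"]
aqi_means = [450.0, 35.0, 95.0]
national_AQI_mean = 166.0

fig, ax = plt.subplots(figsize=(12, 5))

# Bars: one per city
ax.bar(cities, aqi_means, color="tab:blue", label="City AQI mean")

# Reference line: spans the full width of the axes at the national mean
ax.axhline(national_AQI_mean, color="tab:red", linestyle="--",
           label="National AQI mean")

ax.set_xlabel("City")
ax.set_ylabel("Mean AQI")
ax.tick_params(axis="x", labelrotation=90)  # keeps 26 city names readable
ax.legend()
fig.tight_layout()
fig.savefig("aqi_by_city.png")  # hypothetical file name; plt.show() in a notebook
```

Using axhline rather than plotting a second series means the line does not need one y-value per city.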
PS: I just came up with another idea: I will try grouping my df by city with groupby. (Let me know whether this is a good or bad idea.)
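A sketch of the groupby idea, assuming the same column names as above (toy data, trimmed column list): df.groupby('City') splits the rows by city, so one call replaces the 26 separate .loc[...] lines, and the result is indexed by city name, sorted alphabetically by default.

```python
import pandas as pd

# Made-up rows standing in for the real df
df = pd.DataFrame({
    "City": ["Shillong", "Aizawl", "Shillong", "Aizawl"],
    "PM2.5": [30.0, 10.0, 50.0, 20.0],
    "AQI": [90.0, 40.0, 110.0, 60.0],
})

numeric_cols = ["PM2.5", "AQI"]  # in practice, the full list of numeric columns

# All per-city means at once: one row per city, one column per variable
city_means = df.groupby("City")[numeric_cols].mean()

# Only the AQI means, as a Series indexed by city name
aqi_by_city = city_means["AQI"]

# The row for a single city, equivalent to Shillong_means
shillong = city_means.loc["Shillong"]

print(aqi_by_city.index.tolist())  # ['Aizawl', 'Shillong']
print(aqi_by_city.tolist())        # [50.0, 100.0]
```

The resulting Series can be passed straight to a bar plot, e.g. plt.bar(aqi_by_city.index, aqi_by_city.values), which keeps every label paired with its own value.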