I have a big csv file with dates set as index (e.g. 2018-10-31 12:00) and multiple columns, 1 column 'orientation' lists room IDS (sting, e.g. N1001) and another column 'values' lists numeric values (float). What I would like to do is getting the average value per 1 day of each of the rooms in order to plot a ranking. This is what I have so far:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
df = pd.read_csv('file.csv', index_col=0, parse_dates=['TS_TIMESTAMP'])
df.set_index('TS_TIMESTAMP', inplace=True)
#only datetimes between 07:00 and 18:00
df1 = df.between_time('07:00', '18:00')
c = df1.pivot(columns='Orientation', values='Quality_Value')
#mean of ppm per room
c.loc['mean'] = c.mean()
#select row 'mean' to make ranking
rk = c.ix['mean']
#make series into frame
data = pd.DataFrame({'Orientation':rk.index, 'Mean_Value':rk.values})
ranked = data.sort_values('Mean_Value', ascending=True)
# Draw plot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.vlines(x=ranked.Orientation, ymin=0, ymax=ranked.Mean_Value, color='firebrick', alpha=0.7, linewidth=2)
ax.scatter(x=ranked.Orientation, y=ranked.Mean_Value, s=20, color='firebrick', alpha=0.7)
# Title, Label, Ticks and Ylim
ax.set_title('ppm ranking D18', fontdict={'size':22})
ax.set_ylabel('CO2 Concentration in ppm')
ax.set_xticks(ranked.Orientation)
ax.set_xticklabels(ranked.Orientation.str.upper(), rotation=30, fontdict={'horizontalalignment': 'right', 'size':12})
ax.set_ylim(0, 1400)
# Annotate
for row in ranked.itertuples():
ax.text(row.Orientation, row.Mean_Value, s=round(row.Mean_Value, 1), horizontalalignment= 'center', verticalalignment='baseline', fontsize=9)
plt.show()
What I get looks like this: Ranking Plot
The problem is that I get a DeprecationWarning: .ix is deprecated. and also the mean value is calculated for the entire time frame (1month), but I want to have to have mean values per day.
Any help/advice is much appreciated. Thank you.
This is a sample how the csv is built up (pd.read_csv):
TS_TIMESTAMP TS_ID Orientation AirQuality_Value
2018-10-13 07:45:00 1284.0 NR1022 535.0000000000001
2018-10-14 07:15:00 1329.0 ER1051 777.0
2018-10-25 22:15:00 1260.0 WR1032 794.0
2018-10-17 06:00:00 1321.0 ER1052 836.0
2018-10-09 04:15:00 1382.0 ER1067 922.0
Copy this to the clipboard and read it with:
df = pd.read_clipboard(header=0, index_col=[0, 1])
df = df.drop('AirQuality_Value', axis=1)
df.columns = ['TS_ID', 'Orientation', 'AirQuality_Value']