Ranking values of complex csv file (using python, pandas)

Question

I have a big csv file with dates set as index (e.g. 2018-10-31 12:00) and multiple columns, 1 column 'orientation' lists room IDS (sting, e.g. N1001) and another column 'values' lists numeric values (float). What I would like to do is getting the average value per 1 day of each of the rooms in order to plot a ranking. This is what I have so far:

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

df = pd.read_csv('file.csv', index_col=0, parse_dates=['TS_TIMESTAMP'])
df.set_index('TS_TIMESTAMP', inplace=True)
#only datetimes between 07:00 and 18:00
df1 = df.between_time('07:00', '18:00')
c = df1.pivot(columns='Orientation', values='Quality_Value')
#mean of ppm per room
c.loc['mean'] = c.mean()

#select row 'mean' to make ranking
rk = c.ix['mean']

#make series into frame
data = pd.DataFrame({'Orientation':rk.index, 'Mean_Value':rk.values})
ranked = data.sort_values('Mean_Value', ascending=True)
# Draw plot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.vlines(x=ranked.Orientation, ymin=0, ymax=ranked.Mean_Value, color='firebrick', alpha=0.7, linewidth=2)
ax.scatter(x=ranked.Orientation, y=ranked.Mean_Value, s=20, color='firebrick', alpha=0.7)

# Title, Label, Ticks and Ylim
ax.set_title('ppm ranking D18', fontdict={'size':22})
ax.set_ylabel('CO2 Concentration in ppm')
ax.set_xticks(ranked.Orientation)
ax.set_xticklabels(ranked.Orientation.str.upper(), rotation=30, fontdict={'horizontalalignment': 'right', 'size':12})
ax.set_ylim(0, 1400)

# Annotate
for row in ranked.itertuples():
    ax.text(row.Orientation, row.Mean_Value, s=round(row.Mean_Value, 1), horizontalalignment= 'center', verticalalignment='baseline', fontsize=9)

plt.show()

What I get looks like this: Ranking Plot

The problem is that I get a DeprecationWarning: .ix is deprecated. and also the mean value is calculated for the entire time frame (1month), but I want to have to have mean values per day.

Any help/advice is much appreciated. Thank you.

This is a sample how the csv is built up (pd.read_csv):

TS_TIMESTAMP TS_ID Orientation AirQuality_Value
2018-10-13 07:45:00 1284.0 NR1022 535.0000000000001
2018-10-14 07:15:00 1329.0 ER1051 777.0
2018-10-25 22:15:00 1260.0 WR1032 794.0
2018-10-17 06:00:00 1321.0 ER1052 836.0
2018-10-09 04:15:00 1382.0 ER1067 922.0

Copy this to the clipboard and read it with:

df = pd.read_clipboard(header=0, index_col=[0, 1])
df = df.drop('AirQuality_Value', axis=1)
df.columns = ['TS_ID', 'Orientation',  'AirQuality_Value']

Yes, `ix` is deprecated. The warning message and other helpers will give you hints for how to replace it. And could you please post a small sample dataframe? Best would be the outcome of `pd.read_csv`, for example the first 5-10 lines. — JE_Muc, Feb 07 '19 at 10:15
You are welcome. Could you please add that sample as an executable code sample? I can't extract the data from an image... — JE_Muc, Feb 07 '19 at 10:58
A short hint for posting Dataframes: You can read the csv, extract the first 5 lines with `df_extr = df.head(5)`, then copy it to the clipboard with `df_extr.to_clipboard()` and paste this into your post with `Ctrl + V`. — JE_Muc, Feb 07 '19 at 11:13
Is this what you meant: TS_TIMESTAMP TS_ID Orientation AirQuality_Value 2018-10-13 07:45:00 1284.0 NR1022 535.0000000000001 2018-10-14 07:15:00 1329.0 ER1051 777.0 2018-10-25 22:15:00 1260.0 WR1032 794.0 2018-10-17 06:00:00 1321.0 ER1052 836.0 2018-10-09 04:15:00 1382.0 ER1067 922.0 — Lis011235, Feb 07 '19 at 11:54

Ranking values of complex csv file (using python, pandas)

0 Answers0