
I am trying to select a range of data in a matplotlib plot and got stuck with the return values of SpanSelector:

from io import StringIO
import matplotlib.pyplot as plt
from matplotlib.widgets import SpanSelector
import numpy as np
import pandas as pd

# create data
csv_file = StringIO("""URMS[V];IRMS[A];P[W];FPLL[Hz];URange[V];IRange[A];S[VA];Q[var];LAMBDA[];UTHD[%];Timestamp
234.63;0.1802;0.0002E+03;49.995;300;5;0.0423E+03;0.0423E+03;0.004;1.20;09:01:16.000
234.56;0.1803;0.0003E+03;49.996;300;5;0.0423E+03;0.0423E+03;0.004;1.15;09:01:16.100
234.70;0.1807;0.0002E+03;49.997;300;5;0.0424E+03;0.0424E+03;0.004;1.15;09:01:16.200
234.50;0.1807;0.0002E+03;49.998;300;5;0.0424E+03;0.0424E+03;0.004;1.18;09:01:16.300
234.84;0.1805;0.0001E+03;49.998;300;5;0.0424E+03;0.0424E+03;0.004;1.18;09:01:16.400
234.57;0.1796;0.0003E+03;49.999;300;5;0.0421E+03;0.0421E+03;0.004;1.20;09:01:16.500
234.67;0.1809;0.0002E+03;49.999;300;5;0.0424E+03;0.0424E+03;0.004;1.25;09:01:16.600""")

# read CSV file
data = pd.read_csv(csv_file, delimiter=';')

# convert timestamp to datetime object
data['Timestamp'] = pd.to_datetime(data['Timestamp'])

# create plot
fig, ax = plt.subplots()
ax.plot(data['Timestamp'], data['P[W]'], label='Power')

def onselect(xmin, xmax):
    if data is not None:
        sel_start = pd.to_datetime(xmin)
        sel_end = pd.to_datetime(xmax)
        # filter data from selected range
        mask = (data['Timestamp'] >= sel_start) & (data['Timestamp'] <= sel_end)
        selected_subset = data.loc[mask]
        # calculate mean
        mean_value = selected_subset['P[W]'].mean()
        print(f"Mean value in selected range: {mean_value:.2f} P[W]")

plt.xlabel('Time')
plt.ylabel('P[W]')
plt.title('Active power over time')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
span = SpanSelector(ax, onselect, 'horizontal', useblit=True, rectprops=dict(alpha=0.5, facecolor='red'), span_stays=True)

plt.show()

Converting SpanSelector's xmin and xmax to pandas datetime objects (lines 29 and 30 in the above example) does not work: both sel_start and sel_end end up with the same value. Clearly a sign that I am doing something totally wrong...

Any hint on how to circumvent the problem is gladly accepted.

And for what it's worth: python==3.9.2, matplotlib==3.3.4, and pandas==1.2.3

nohtyp

1 Answer


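Interesting problem. Matplotlib internally represents datetimes as floating-point numbers (days since an epoch), and those floats are what SpanSelector passes to onselect as xmin and xmax. pd.to_datetime reads a bare number as nanoseconds by default, so both values seem to collapse to practically the same instant, because their difference is only a tiny fraction of a day.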
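To illustrate the effect, here is a small standard-library sketch. It assumes Matplotlib's default epoch of 1970-01-01 (the default since Matplotlib 3.3) and mimics pandas reading a unit-less number as whole nanoseconds since the Unix epoch. The values of xmin and xmax are made up, and the helper names read_as_nanoseconds and read_as_days are hypothetical, not part of either library.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Two made-up span limits, 0.4 s apart, expressed as days since the epoch
xmin = 18701.375879
xmax = xmin + 0.4 / 86400


def read_as_nanoseconds(value):
    """Mimic reading a bare number as whole nanoseconds since the epoch."""
    # datetime only resolves microseconds, so the nanoseconds are floored
    return EPOCH + timedelta(microseconds=int(value) // 1000)


def read_as_days(value):
    """Read the number the way Matplotlib means it: days since the epoch."""
    return EPOCH + timedelta(days=value)


# Read as nanoseconds, both limits land on the same instant in 1970
print(read_as_nanoseconds(xmin), read_as_nanoseconds(xmax))

# Read as days, the limits are distinct and about 0.4 s apart
print(read_as_days(xmin), read_as_days(xmax))
```

The first print shows why sel_start and sel_end looked identical; the second shows the interpretation that the selection actually needs.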

I was able to fix your problem by adding a few lines. Matplotlib's dates module provides the function num2date, which converts these floats into datetime objects that pandas can work with.

import matplotlib.dates as mdates
(...)

        sel_start = pd.to_datetime(mdates.num2date(xmin))
        sel_end = pd.to_datetime(mdates.num2date(xmax))

However, num2date returns a timezone-aware datetime (UTC by default), which cannot be compared directly with a timezone-naive Timestamp column. That is why I had to add utc=True in the following line.

# convert timestamp to datetime object
data['Timestamp'] = pd.to_datetime(data['Timestamp'], utc=True)

I tested it: with these modifications, onselect returns reasonable results.
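If you would rather keep the Timestamp column timezone-naive, an alternative (not what I used above, just a sketch) is to drop the timezone from the converted limits instead of passing utc=True. The helper span_to_timestamps and the small sample DataFrame are made up for illustration; the selection is simulated with date2num so the snippet runs without opening a plot window.

```python
import matplotlib.dates as mdates
import pandas as pd


def span_to_timestamps(xmin, xmax):
    """Turn SpanSelector limits (Matplotlib date numbers) into naive Timestamps."""
    # num2date returns timezone-aware datetimes; tz_localize(None) drops the
    # timezone so the result compares cleanly with a naive datetime column.
    start = pd.Timestamp(mdates.num2date(xmin)).tz_localize(None)
    end = pd.Timestamp(mdates.num2date(xmax)).tz_localize(None)
    return start, end


# Made-up sample standing in for the DataFrame from the question
data = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-03-15 09:01:16.000",
        "2021-03-15 09:01:16.100",
        "2021-03-15 09:01:16.200",
        "2021-03-15 09:01:16.300",
    ]),
    "P[W]": [0.2, 0.3, 0.2, 0.2],
})

# Simulate a selection from 50 ms before the second sample
# to 50 ms after the third sample
lo = data["Timestamp"].iloc[1] - pd.Timedelta(milliseconds=50)
hi = data["Timestamp"].iloc[2] + pd.Timedelta(milliseconds=50)
xmin, xmax = mdates.date2num([lo.to_pydatetime(), hi.to_pydatetime()])

start, end = span_to_timestamps(xmin, xmax)
mask = (data["Timestamp"] >= start) & (data["Timestamp"] <= end)
selected = data.loc[mask]
print(selected["P[W]"].mean())
```

Inside onselect, the two pd.to_datetime lines would then be replaced by a single call to span_to_timestamps(xmin, xmax), and the column can stay as it was parsed in the question.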

Flow