I want to plot time-series data using Matplotlib. The data is stored in CSV format, which I load into a pandas DataFrame using pd.read_csv(); this works fine. A data set comprises one timestamp column and around 10 value columns. I convert the timestamp (initially a string of format yyyy-MM-dd hh:mm:ss) to datetime via pd.to_datetime(dataFrame['TIMESTAMP'], format='%Y-%m-%d %H:%M:%S').
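
For reference, a minimal sketch of that loading step. The CSV content is made up for illustration and is read from a string rather than a file; only the TIMESTAMP column name and the format string come from the description above. It also shows how the elapsed time since the first sample can be derived, which is the quantity the x-axis should eventually show.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = """TIMESTAMP,Y1,Y2
2020-01-16 08:00:00,101.5,98.2
2020-01-16 08:01:12,97.3,110.4
2020-01-16 08:02:30,120.8,87.9
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse the timestamp strings into datetime64 values
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%Y-%m-%d %H:%M:%S')

# Time elapsed since the first sample, as a timedelta Series
elapsed = df['TIMESTAMP'] - df['TIMESTAMP'].iloc[0]
print(elapsed.dt.total_seconds().tolist())  # → [0.0, 72.0, 150.0]
```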

To plot the data I use the following code (generation of sample data is not part of my code):

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

N = 30
timestamps = pd.date_range('2020-01-16 8:00', periods=N, freq='72s')
# note: the original timestamps aren't evenly spaced, this is just data to test
dataFrame = pd.DataFrame({'TIMESTAMP': timestamps, 'Y1': np.random.normal(100, 30, N), 'Y2': np.random.normal(100, 30, N)})
acqFieldName = 'Y1'

fig = sns.pointplot(x='TIMESTAMP', y=acqFieldName, data=dataFrame, scale=0.75)
timestamps = dataFrame['TIMESTAMP'].dt.time
fig.axes.set_xticklabels(labels=timestamps, rotation=45)
plt.show()

Which results in the following:

[Figure: plot resulting from the code above]

Still, I would like to change the x-axis: the ticks are too dense, so I'd like to have, say, 10 ticks, and I'd like to see the elapsed time in the format 'mm:ss'.

I tried the following:

fig = sns.pointplot(x='TIMESTAMP', y=acqFieldName, data=dataFrame, scale=0.75)
timestamps = dataFrame['TIMESTAMP'].dt.time

xmin = dataFrame['TIMESTAMP'][0]
xmax = dataFrame['TIMESTAMP'][len(dataFrame['TIMESTAMP']) - 1]

timeDiff: timedelta = xmax - xmin
customTicks = np.linspace(0., timeDiff.seconds, 10)
fig.axes.set_xticklabels(labels=customTicks, rotation=45)
fig.axes.set_xticks(customTicks)
plt.show()

Which results in the following:

[Figure: resulting plot]

obviously not what I want.

My problem would be solved if I could reduce the number of ticks formatted as time or, better, if the points aligned with ticks labelled with the elapsed time.

Update: the suggestion from Zaraki Kenpachi yields:

    fig, ax = plt.subplots()
    ax.plot(dataFrame.set_index('TIMESTAMP'), dataFrame[acqFieldName])
    plt.show()

[Figure: resulting plot]


Working solution based on JohanC's answer:

import datetime
import glob

import matplotlib.pyplot as plt
import seaborn as sns

for fileName in glob.glob('*.csv'):
    plt.close()
    # NOTE: CsvFileProcessor is a custom class doing the readout of CSV and conversion to pandas.DataFrame
    dataFrame, acqFieldName, settingParameterCount = CsvFileProcessor.processFile(fileName)

    fig, ax = plt.subplots()
    ax: plt.Subplot = sns.pointplot(x='TIMESTAMP', y=acqFieldName, data=dataFrame, scale=0.75, ax=ax)
    startTime = dataFrame['TIMESTAMP'][0]

    timeProgress = []

    for timeStamp in dataFrame['TIMESTAMP']:
        timePassed = timeStamp - startTime
        timeProgress.append(timePassed)

    custom_ticks = range(0, len(timeProgress), 5)
    timestamps = [f"{datetime.timedelta(seconds=timeProgress[t].seconds)}" for t in custom_ticks]

    # for manipulating the x-axis tick labels:
    # https://stackoverflow.com/questions/51105648/ordering-and-formatting-dates-on-x-axis-in-seaborn-bar-plot
    # set the tick positions first so that the number of labels matches the number of ticks
    ax.axes.set_xticks(custom_ticks)
    ax.axes.set_xticklabels(labels=timestamps, rotation=45)
    ax.axes.set_xlabel(xlabel="Processing Time")
    plt.title('Setting Parameters: ' + str(settingParameterCount))
    outFileName = fileName.upper()
    outFileName = outFileName.replace('_DATA.CSV', '')
    outFileName = outFileName + '_READOUT.PNG'
    fig.tight_layout()
    #plt.savefig(outFileName)
    plt.show()

results in:

[Figure: final plot]

WolfiG

2 Answers

The main confusion comes from the Seaborn point plot treating the x values as categories: it places the points at the integer positions 0, 1, 2, ... and generates confusing labels for them.

To get what you want, you could set a tick at, say, every 5th position and provide custom labels for those ticks. You can also add minor ticks so that every entry still gets a tick.

Demo code:

import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
import pandas as pd
import numpy as np
import seaborn as sns

N = 30
timestamps = pd.date_range('2020-01-16 8:00:00', periods=N, freq='73s')
dataFrame = pd.DataFrame({'TIMESTAMP': timestamps, 'Y1': np.random.normal(100, 30, N), 'Y2': np.random.normal(100, 30, N)})

# pointplot returns a matplotlib Axes, not a Figure
ax = sns.pointplot(x='TIMESTAMP', y='Y1', data=dataFrame, scale=0.75)

custom_ticks = range(0, len(dataFrame), 5) # ticks every 5
timestamps = [f"{dataFrame['TIMESTAMP'][t].minute:02}:{dataFrame['TIMESTAMP'][t].second:02}" for t in custom_ticks]

# set the tick positions before the labels so that their counts match
ax.set_xticks(custom_ticks)
ax.set_xticklabels(timestamps)
ax.xaxis.set_minor_locator(AutoMinorLocator())

plt.tight_layout()
plt.show()

[Figure: demo plot]
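
The labels in the demo show the wall-clock minute and second of each sample. If the elapsed time since the first sample is wanted instead, the labels could be built roughly as follows. This is only a sketch: `format_mm_ss` is a hypothetical helper, not part of pandas or Matplotlib, and the sample timestamps are the same as in the demo.

```python
import pandas as pd


def format_mm_ss(delta: pd.Timedelta) -> str:
    """Format a time difference as 'mm:ss' (hypothetical helper)."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02}:{seconds:02}"


timestamps = pd.date_range('2020-01-16 8:00:00', periods=30, freq='73s')
elapsed = timestamps - timestamps[0]          # TimedeltaIndex

custom_ticks = range(0, len(timestamps), 5)   # same tick positions as in the demo
labels = [format_mm_ss(elapsed[t]) for t in custom_ticks]
print(labels)
# → ['00:00', '06:05', '12:10', '18:15', '24:20', '30:25']
```

The resulting `labels` list can be passed to `set_xticklabels` in place of `timestamps`. Using `total_seconds()` rather than `.seconds` avoids wrapping around if a recording ever spans more than a day.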

JohanC
  • Thanks a lot, this works. Only one thing: you use a fancy notation to generate the array of timestamps. Where, or under which keyword, do I find documentation about this syntax? – WolfiG Jan 16 '20 at 15:45
  • That notation is called [`f-strings`](https://www.python.org/dev/peps/pep-0498/), new since Python 3.6. Usually they are handier and more readable than other ways to fill variables into strings. – JohanC Jan 16 '20 at 15:52

Try a simple plot:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df.set_index('TIMESTAMP'), df[acqFieldName])
plt.show()
Zaraki Kenpachi
  • Nope. The code `fig, ax = plt.subplots() ax.plot(dataFrame.set_index('TIMESTAMP'), dataFrame[acqFieldName])` yields a strange plot (see the update to the initial post) – WolfiG Jan 16 '20 at 14:27
  • You probably mean `ax.plot(dataFrame['TIMESTAMP'], dataFrame[acqFieldName])` – JohanC Jan 16 '20 at 14:36
  • @JohanC: no, I mean `dataFrame.set_index('TIMESTAMP')` like Zaraki suggested. – WolfiG Jan 16 '20 at 15:07
  • No, I wanted to explain that @Zaraki probably meant `ax.plot(dataFrame['TIMESTAMP'], ...)` – JohanC Jan 16 '20 at 15:16
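
Following up on the comments above, here is a rough sketch of the plain Matplotlib route, assuming the goal is to plot the values against elapsed time so that unevenly spaced samples land at their true positions. `mm_ss` and `plot_elapsed` are hypothetical helper names; the sample data mirrors the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator


def mm_ss(seconds, pos=None):
    """Format a number of seconds as 'mm:ss' (hypothetical helper)."""
    sign = '-' if seconds < 0 else ''
    minutes, secs = divmod(int(round(abs(seconds))), 60)
    return f"{sign}{minutes:02}:{secs:02}"


def plot_elapsed(data, column, max_ticks=10):
    """Plot `column` against the seconds elapsed since the first timestamp."""
    elapsed_s = (data['TIMESTAMP'] - data['TIMESTAMP'].iloc[0]).dt.total_seconds()
    fig, ax = plt.subplots()
    ax.plot(elapsed_s, data[column], marker='o')
    # at most `max_ticks` intervals between major ticks
    ax.xaxis.set_major_locator(MaxNLocator(nbins=max_ticks))
    # show each tick value (in seconds) as mm:ss
    ax.xaxis.set_major_formatter(FuncFormatter(mm_ss))
    ax.tick_params(axis='x', labelrotation=45)
    ax.set_xlabel('Elapsed time (mm:ss)')
    fig.tight_layout()
    return fig, ax


# Sample data as in the question
N = 30
timestamps = pd.date_range('2020-01-16 8:00', periods=N, freq='72s')
dataFrame = pd.DataFrame({'TIMESTAMP': timestamps,
                          'Y1': np.random.normal(100, 30, N)})
fig, ax = plot_elapsed(dataFrame, 'Y1')
```

Calling `plt.show()` afterwards displays the figure. Because the x values are real numbers here rather than categories, the ticks and the points share one scale. `MaxNLocator` picks round numbers of seconds, so a `MultipleLocator` with a base such as 300 may be preferable if ticks on whole minutes are wanted.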