Matplotlib problem: tick re-mapping of x-axis of time-series data

Question

I want to plot time-series data using Matplotlib. The data is stored in CSV format, which I read into a pandas DataFrame using pd.read_csv(); this works fine. A data set comprises one timestamp column and around 10 value columns. I convert the timestamp (initially a string of format yyyy-MM-dd hh:mm:ss) to datetime via pd.to_datetime(dataFrame['TIMESTAMP'], format='%Y-%m-%d %H:%M:%S').

To plot the data I use the following code (generation of sample data is not part of my code):

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

N = 30
timestamps = pd.date_range('2020-01-16 8:00', periods=N, freq='72s')
# note: the original timestamps aren't evenly spaced, this is just data to test
dataFrame = pd.DataFrame({'TIMESTAMP': timestamps, 'Y1': np.random.normal(100, 30, N), 'Y2': np.random.normal(100, 30, N)})
acqFieldName = 'Y1'

fig = sns.pointplot(x='TIMESTAMP', y=acqFieldName, data=dataFrame, scale=0.75)
timestamps = dataFrame['TIMESTAMP'].dt.time
fig.axes.set_xticklabels(labels=timestamps, rotation=45)
plt.show()

Which results in the following:

Still, I would like to change the x-axis: the ticks are too dense, so I'd like to have, say, 10 ticks, and I'd like to see the time spent in minutes and seconds, in the format 'mm:ss'.

I tried the following:

fig = sns.pointplot(x='TIMESTAMP', y=acqFieldName, data=dataFrame, scale=0.75)
timestamps = dataFrame['TIMESTAMP'].dt.time

xmin = dataFrame['TIMESTAMP'][0]
xmax = dataFrame['TIMESTAMP'][len(dataFrame['TIMESTAMP']) - 1]

timeDiff = xmax - xmin  # a pandas Timedelta
customTicks = np.linspace(0., timeDiff.seconds, 10)
fig.axes.set_xticklabels(labels=customTicks, rotation=45)
fig.axes.set_xticks(customTicks)
plt.show()

Which results in the following:

obviously not what I want.

My problem would be solved if I could reduce the number of ticks formatted as time, or - better - if the points align with the ticks given as time spent.

Update: suggestion of Zaraki Kenpachi yields

    fig, ax = plt.subplots()
    ax.plot(dataFrame.set_index('TIMESTAMP'), dataFrame[acqFieldName])
    plt.show()

Working solution based on JohanC's answer:

import datetime
import glob

import matplotlib.pyplot as plt
import seaborn as sns

for fileName in glob.glob('*.csv'):
    plt.close()
    # NOTE: CsvFileProcessor is a custom class doing the readout of CSV and conversion to pandas.DataFrame
    dataFrame, acqFieldName, settingParameterCount = CsvFileProcessor.processFile(fileName)

    fig, ax = plt.subplots()
    sns.pointplot(x='TIMESTAMP', y=acqFieldName, data=dataFrame, scale=0.75, ax=ax)
    startTime = dataFrame['TIMESTAMP'][0]

    # time elapsed since the first sample, one entry per data point
    timeProgress = [timeStamp - startTime for timeStamp in dataFrame['TIMESTAMP']]

    # label every 5th point with the elapsed time (formatted as h:mm:ss)
    custom_ticks = list(range(0, len(timeProgress), 5))
    timestamps = [f"{datetime.timedelta(seconds=timeProgress[t].seconds)}" for t in custom_ticks]

    # for manipulating the x-axis tick labels:
    # https://stackoverflow.com/questions/51105648/ordering-and-formatting-dates-on-x-axis-in-seaborn-bar-plot
    # set the tick positions first so the number of labels matches the number of ticks
    ax.set_xticks(custom_ticks)
    ax.set_xticklabels(labels=timestamps, rotation=45)
    ax.set_xlabel(xlabel="Processing Time")
    plt.title('Setting Parameters: ' + str(settingParameterCount))
    outFileName = fileName.upper()
    outFileName = outFileName.replace('_DATA.CSV', '')
    outFileName = outFileName + '_READOUT.PNG'
    fig.tight_layout()
    #plt.savefig(outFileName)
    plt.show()

results in:

@WolfiG I added some test data to your post. Feel free to improve. — JohanC, Jan 16 '20 at 14:42

score 2 · Accepted Answer · answered Jan 16 '20 at 15:13

The main confusion comes from the Seaborn point plot treating the x values as categories: it places the points at the tick positions 0, 1, 2, ... and generates its own labels for them, which come out confusing here.

To get what you want, you could set a major tick every, say, 5 points and provide custom labels for those ticks. Minor ticks can be added so that every entry still has a tick.

Demo code:

import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
import pandas as pd
import numpy as np
import seaborn as sns

N = 30
timestamps = pd.date_range('2020-01-16 8:00:00', periods=N, freq='73s')
dataFrame = pd.DataFrame({'TIMESTAMP': timestamps, 'Y1': np.random.normal(100, 30, N), 'Y2': np.random.normal(100, 30, N)})

# pointplot returns a Matplotlib Axes and places the points at x = 0, 1, 2, ...
ax = sns.pointplot(x='TIMESTAMP', y='Y1', data=dataFrame, scale=0.75)

custom_ticks = range(0, len(dataFrame), 5) # ticks every 5
timestamps = [f"{dataFrame['TIMESTAMP'][t].minute:02}:{dataFrame['TIMESTAMP'][t].second:02}" for t in custom_ticks]

# set the tick positions before the labels so that their counts match
ax.set_xticks(custom_ticks)
ax.set_xticklabels(timestamps)
ax.xaxis.set_minor_locator(AutoMinorLocator())

plt.tight_layout()
plt.show()
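
The labels in the demo show the clock minute and second of each timestamp. If the axis should instead show the time elapsed since the first sample (the "time spent" the question asks about), one option is to subtract the first timestamp before formatting. The following is a minimal sketch under that assumption; `elapsed_labels` is a hypothetical helper name, not part of Matplotlib or Seaborn, and it assumes the timestamps are datetime-like objects (pandas Timestamps should work as well).

```python
from datetime import datetime, timedelta


def elapsed_labels(stamps, step=5):
    """Return (positions, labels) for every `step`-th timestamp.

    Labels give the time elapsed since the first timestamp as mm:ss.
    Hypothetical helper for illustration only.
    """
    stamps = list(stamps)
    positions = list(range(0, len(stamps), step))
    labels = []
    for i in positions:
        total = int((stamps[i] - stamps[0]).total_seconds())
        minutes, seconds = divmod(total, 60)
        labels.append(f"{minutes:02d}:{seconds:02d}")
    return positions, labels


# Example data: 30 samples spaced 73 seconds apart, as in the demo
start = datetime(2020, 1, 16, 8, 0, 0)
stamps = [start + timedelta(seconds=73 * i) for i in range(30)]
positions, labels = elapsed_labels(stamps)
```

The resulting `positions` and `labels` could then be passed to `set_xticks(positions)` followed by `set_xticklabels(labels)` on the Axes returned by `sns.pointplot`, in place of the clock-time labels.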

Thanks a lot this works. Only one thing: you use a fancy notation to generate the array of timestamps. Where / under which key word do I find documentation about this syntax? — WolfiG, Jan 16 '20 at 15:45
That notation is called [`f-strings`](https://www.python.org/dev/peps/pep-0498/), new since Python 3.6. Usually they are handier and more readable than other ways to fill variables into strings. — JohanC, Jan 16 '20 at 15:52
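
As a small illustration of the format specification used in the answer (`:02` pads a number with zeros to a width of two), here is a sketch with an arbitrary example timestamp chosen only for demonstration:

```python
from datetime import datetime

# arbitrary example value, not taken from the question's data
stamp = datetime(2020, 1, 16, 8, 6, 5)

# expressions inside {} are evaluated; ':02' zero-pads to two characters
label = f"{stamp.minute:02}:{stamp.second:02}"

# the same result with str.format, for comparison
label_old_style = "{:02}:{:02}".format(stamp.minute, stamp.second)
```

Both variables hold the string `06:05`; the f-string keeps the expression next to the place where it is inserted.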

score 0 · Answer 2 · answered Jan 16 '20 at 13:34

Try a simple plot:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df.set_index('TIMESTAMP'), df[acqFieldName])
plt.show()

answered by Zaraki Kenpachi

Nope. code `fig, ax = plt.subplots() ax.plot(dataFrame.set_index('TIMESTAMP'), dataFrame[acqFieldName])` yields some strange plot (see amendment of initial post) – WolfiG Jan 16 '20 at 14:27
You probably mean `ax.plot(dataFrame['TIMESTAMP'], dataFrame[acqFieldName])` – JohanC Jan 16 '20 at 14:36
@JohanC: no, I mean `dataFrame.set_index('TIMESTAMP')` like Zaraki suggested. – WolfiG Jan 16 '20 at 15:07
No, I wanted to explain that @Zaraki probably meant `ax.plot(dataFrame['TIMESTAMP'], ...)` – JohanC Jan 16 '20 at 15:16
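
Following the correction in the comments (pass the column itself as x rather than the re-indexed DataFrame), a plain Matplotlib line plot can place the points at their true positions in time. The sketch below is one possible variant, not the code from either answer: it plots the seconds elapsed since the first timestamp and formats the ticks as mm:ss. The names `format_mmss` and `plot_elapsed` are hypothetical, and the column name `TIMESTAMP` is assumed from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator


def format_mmss(seconds, pos=None):
    """Format a number of seconds as mm:ss; negative values get no label."""
    if seconds < 0:
        return ""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes:02d}:{secs:02d}"


def plot_elapsed(data, column, time_column="TIMESTAMP", max_ticks=10):
    """Plot `column` against the seconds elapsed since the first timestamp."""
    elapsed = (data[time_column] - data[time_column].iloc[0]).dt.total_seconds()
    fig, ax = plt.subplots()
    ax.plot(elapsed, data[column], marker="o")
    # let Matplotlib pick "nice" tick positions, at most max_ticks intervals
    ax.xaxis.set_major_locator(MaxNLocator(nbins=max_ticks))
    ax.xaxis.set_major_formatter(FuncFormatter(format_mmss))
    ax.tick_params(axis="x", labelrotation=45)
    ax.set_xlabel("Elapsed time (mm:ss)")
    ax.set_ylabel(column)
    fig.tight_layout()
    return ax
```

Calling `plot_elapsed(dataFrame, acqFieldName)` followed by `plt.show()` would display the figure. Because x is numeric here, unevenly spaced timestamps keep their real spacing, whereas the point plot spaces all entries equally.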
