Need help to solve performance problems of matplotlib on a Raspberry Pi

Question

First off, sorry for this lengthy text. I'm new to Python and matplotlib, so please bear with me.

As a follow-up to this question, I found that generating the grid is quite time-consuming on a Raspberry Pi running web2py. I have a csv file with roughly 12k lines that look like this:

1;1.0679759979248047;0.0;147.0;0.0;;{'FHR1': 'US', 'FHR2': 'INOP', 'MHR': 'INOP'};69;good;;;;1455891539.502167

The problem is that reading those 12k lines with numpy.genfromtxt already takes 30-odd seconds. Populating the chart (without the fancy grid) then took another 30 seconds, using only columns 1, 3 and 7 of that csv. After adding the grid solution, the total exploded to 170 seconds. So now I have to figure out how to bring it down to somewhere under a minute.
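If the csv has to stay, one thing worth trying is letting genfromtxt convert only the columns that are actually plotted, so the dict column and the empty fields are never parsed. A minimal sketch, assuming "columns 1, 3 and 7" are zero-based positions (as genfromtxt's `usecols` expects); `load_columns` is a hypothetical helper name. I have not timed this on a Pi. On NumPy 1.23 or later, `np.loadtxt`, which was reimplemented in C in that release, may also be worth comparing with the same `usecols`.

```python
import numpy as np


def load_columns(source, usecols=(1, 3, 7)):
    """Parse only the plotted columns of the ';'-separated log (hypothetical helper)."""
    # usecols is zero-based; genfromtxt selects these fields before converting,
    # so the dict column and the empty fields are never turned into floats.
    return np.genfromtxt(source, delimiter=";", usecols=usecols,
                         dtype=float, encoding=None)
```

Passing a file path works the same way as the in-memory buffer used for checking.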

My first thought was to eliminate the csv: I'm the one reading the data anyway, so I could skip the file by either keeping the data in memory or writing it into the plot right away. That's what I did, with what I thought was a simple test layout, using the PDF backend. I chose to write the data into the chart every time it arrives and to save the chart once the transmission is done. I thought that would work fine, but it doesn't. I keep getting ludicrous errors:

RuntimeError: RRuleLocator estimated to generate 9178327 ticks from 0001-01-01 15:20:31.883239+00:00 to 0001-04-17 20:52:39.779205+00:00: exceeds Locator.MAXTICKS * 2 (6000000)

And believe me, I keep increasing MAXTICKS with every test run to exceed the number the error message reports. It's ridiculous, because that message comes from just 60 seconds of data, and I want to get close to 24 hours of data. I would like the RRuleLocator to either stop estimating or just be quiet and wait for the data to end. I really don't think I can make an MCVE out of this, but I can carve out the details I'm most likely getting wrong.
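The tick count in the error is not arbitrary. A `SecondLocator` is an `RRuleLocator`, and before generating ticks it estimates one tick per second of the visible x range, without taking `bysecond=[0, 30]` into account. The sketch below checks that against the two dates in the message. The year 0001 also suggests (my reading, not verified against the actual data) that the x values passed to `plot_date` are small plain numbers, which older matplotlib interprets as days since its default epoch of 0001-01-01, rather than real dates. Converting the Unix timestamp in the csv's last column to a `datetime` before plotting would put the axis in 2016, and a 60-second recording would then span only about 60 estimated ticks. `to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timezone

# View limits copied from the RuntimeError message (naive datetimes in year 1).
vmin = datetime(1, 1, 1, 15, 20, 31, 883239)
vmax = datetime(1, 4, 17, 20, 52, 39, 779205)

span_seconds = (vmax - vmin).total_seconds()
span_days = span_seconds / 86400

# SecondLocator uses a SECONDLY rule with interval 1, so the estimate is one
# tick per second of the visible range; bysecond=[0, 30] is not considered.
estimated_ticks = int(span_seconds)

print(estimated_ticks)      # 9178327
print(round(span_days, 1))  # 106.2


def to_datetime(unix_ts):
    """Hypothetical helper: turn a Unix timestamp (like the csv's last column) into a datetime."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc)
```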

First off, I have some classes set up, so there are no globals. To simplify: I have a communications class that reads the serial port at one-second intervals. It runs fine and, until yesterday, wrote whatever came in on the serial port into a csv. Now I want to see whether I can populate the chart while receiving the data, and just save it once I'm done. So for testing I added this to my .py:

import matplotlib
matplotlib.use('PDF')    # I want a PDF in the end
from matplotlib import dates
import matplotlib.pyplot as plt
import numpy as np
from numpy import genfromtxt

Then I added some members to the communication class, mostly taken from the charting part of the solution mentioned above. I initialize them in the class's __init__:

    self.fig = None
    self.ctg = None
    self.toco = None

Then there is this method, which I call once the data I'm receiving is in the correct form/state, so that the chart can be prepared for populating with data:

def prepchart(self):
    # how often to show xticklabels and repeat yticklabels:
    print('prepchart')
    xtickinterval = 5

    hfmt = dates.DateFormatter('%H:%M:%S')
    self.fig = plt.figure()

    self.ctg = self.fig.add_subplot(2, 1, 1)  # two rows, one column, first plot
    self.ctg.set_ylim(50, 210)

    self.toco = self.fig.add_subplot(2, 1, 2)  # second plot
    self.toco.set_ylim(0, 100)
    # Set the minor ticks to every 30 seconds
    minloc = dates.SecondLocator(bysecond=[0, 30])
    minloc.MAXTICKS = 3000000  # raised by hand after each RuntimeError
    self.ctg.xaxis.set_minor_locator(minloc)
    # self.ctg.xaxis.set_minor_locator(dates.MinuteLocator())
    self.ctg.xaxis.set_major_formatter(hfmt)

    self.toco.xaxis.set_minor_locator(dates.MinuteLocator())
    self.toco.xaxis.set_major_formatter(hfmt)

    # plt.xticks acts on the current axes, i.e. only on self.toco
    plt.xticks(rotation=45)

Then, whenever I have data I want to plot, I do this in my data processing method:

    self.ctg.plot_date(timevalue, heartrate, '-')
    self.toco.plot_date(timevalue, toco, '-')

Finally, once no more data is sent (this can be after one minute or after 24 hours), I call

    def handleCTG(self):
        self.fig.savefig('/home/pi/test.pdf')
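Two things in the per-sample `plot_date` calls probably hurt. Each call adds a new line object holding a single point, so if it runs for every one-second sample, 24 hours would leave about 170,000 line objects across both subplots for the backend to walk through when saving; and a `'-'` line through a single point draws nothing visible. A minimal sketch of the alternative, assuming the timestamps are Unix seconds like the csv's last column: append to plain lists while reading the serial port, and plot each series once in the `handleCTG` step. `CTGChart`, `add_sample` and `save` are hypothetical names. Plain `plot` accepts `datetime` objects directly, and `plot_date` is discouraged in newer matplotlib releases.

```python
import matplotlib
matplotlib.use("Agg")  # headless; savefig still picks the PDF writer from the ".pdf" extension
from datetime import datetime, timezone

import matplotlib.pyplot as plt
from matplotlib import dates


class CTGChart:
    """Hypothetical helper: buffer samples while reading, draw everything once at the end."""

    def __init__(self):
        self.times = []
        self.heartrate = []
        self.toco = []

    def add_sample(self, unix_ts, heartrate, toco):
        # Called once per sample; only appends, no matplotlib work here.
        self.times.append(datetime.fromtimestamp(unix_ts, tz=timezone.utc))
        self.heartrate.append(heartrate)
        self.toco.append(toco)

    def save(self, path):
        fig, (ctg_ax, toco_ax) = plt.subplots(2, 1, sharex=True)
        # One Line2D per axes for the whole recording instead of one per sample.
        ctg_ax.plot(self.times, self.heartrate, "-")
        toco_ax.plot(self.times, self.toco, "-")
        ctg_ax.set_ylim(50, 210)
        toco_ax.set_ylim(0, 100)
        toco_ax.xaxis.set_major_formatter(dates.DateFormatter("%H:%M:%S"))
        fig.autofmt_xdate(rotation=45)
        fig.savefig(path)
        line_counts = (len(ctg_ax.lines), len(toco_ax.lines))
        plt.close(fig)
        return line_counts  # handy for checking: one line per subplot
```

With this layout the MAXTICKS override should not be needed, since the default date locator picks the tick spacing from the real time range.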

In conclusion: am I going about this completely wrong, or is there just an error in my code? And is this really a way to reduce the waiting time for the chart to be generated?

I have an idea of how this should work, but I don't know how to implement it. The idea is that, since I want a fixed scale for the x axis (1 cm == 60 s), it should be no problem to render the chart in pieces at those intervals and glue them together one after another, thus spreading out the rendering time. While reading from the serial port the program is idle most of the time (at least 85%) anyway, so it could use that time to do something else. (comment by Sherlock70, Feb 25 '16 at 14:14)
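The idea in that comment can be approximated with matplotlib's `PdfPages`, with one swap: instead of gluing pieces into one long strip, each finished chunk (say ten minutes) becomes its own PDF page, rendered as soon as the chunk is complete, so the rendering work is spread over the recording instead of piling up at the end. A minimal sketch; the chunk length, margins, page height and y range are assumptions, and `write_chunk` and `demo` are hypothetical helpers (`demo` uses synthetic data only). At 1 cm per 60 s, a 600 s chunk needs 10 cm of axes, i.e. 600 / 60 / 2.54 ≈ 3.94 inches.

```python
import matplotlib
matplotlib.use("Agg")  # no display needed; PdfPages writes the PDF
from datetime import datetime, timedelta, timezone

import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

CM_PER_INCH = 2.54
SECONDS_PER_CM = 60   # the scale from the question: 1 cm of x axis == 60 s
MARGIN_IN = 0.6       # assumed room on each side for tick labels, in inches
HEIGHT_IN = 3.0       # assumed page height, in inches


def axes_width_inches(chunk_seconds):
    """Width the x axis needs so that 60 s of data span exactly 1 cm."""
    return chunk_seconds / SECONDS_PER_CM / CM_PER_INCH


def write_chunk(pdf, times, values, chunk_seconds):
    """Hypothetical helper: render one finished chunk as its own PDF page."""
    axes_w = axes_width_inches(chunk_seconds)
    fig_w = axes_w + 2 * MARGIN_IN
    fig = plt.figure(figsize=(fig_w, HEIGHT_IN))
    # Axes position in figure fractions, chosen so its width is exactly axes_w inches.
    ax = fig.add_axes([MARGIN_IN / fig_w, 0.2, axes_w / fig_w, 0.7])
    ax.plot(times, values, "-")
    ax.set_xlim(times[0], times[0] + timedelta(seconds=chunk_seconds))
    ax.set_ylim(50, 210)
    pdf.savefig(fig)  # the page is rendered now, while the serial reader is idle
    plt.close(fig)


def demo(path, n_chunks=2, chunk_seconds=600):
    """Write n_chunks pages of synthetic one-sample-per-second data; return the page count."""
    start = datetime(2016, 2, 19, 14, 0, tzinfo=timezone.utc)
    with PdfPages(path) as pdf:
        for k in range(n_chunks):
            t0 = start + timedelta(seconds=chunk_seconds * k)
            times = [t0 + timedelta(seconds=s) for s in range(chunk_seconds)]
            write_chunk(pdf, times, [140.0] * chunk_seconds, chunk_seconds)
        return pdf.get_pagecount()
```

In the real program the `PdfPages` object would stay open for the whole recording, and `write_chunk` would be called each time a chunk's worth of samples has arrived.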

score 1 · Accepted Answer · edited May 23 '17 at 12:15

OK, so here's the deal. web2py obviously runs a pretty tight ship: there aren't many threads floating around, and it certainly won't start a new thread for my little chart creation. I noticed this when I watched CPU usage in my Raspi's task manager and only ever saw around 25%. The Raspberry Pi has 4 cores, so that is one core's worth. First I ran my script on the Raspi outside of web2py and, lo and behold, the entire thing, including csv reading and chart rendering, took only 20 s. From there on (inspired by "How to run a task outside web2py and retrieve the output") it's a piece of cake: use the well-documented subprocess module to start a new Python process running this script, and done. So thanks to everyone who gave this some thought.
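A minimal sketch of that approach, assuming the chart code lives in its own script (`make_chart.py` would be a hypothetical name) that takes the csv path and the PDF path as command-line arguments; `render_chart_out_of_process` is likewise a hypothetical helper. `sys.executable` starts the same Python interpreter web2py is running under, and `capture_output` needs Python 3.7 or later.

```python
import subprocess
import sys


def render_chart_out_of_process(script_path, csv_path, pdf_path, timeout=300):
    """Run the chart script in a fresh Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, script_path, csv_path, pdf_path],
        capture_output=True,  # collect the script's stdout/stderr
        text=True,            # decode output as str
        timeout=timeout,      # assumed upper bound in seconds
        check=True,           # raise CalledProcessError if the script fails
    )
    return result.stdout
```

`subprocess.run` waits for the script to finish; if the request should return immediately instead, `subprocess.Popen` starts the process without waiting.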
