I am working with a large dataframe in which each row holds the values of several signals, and every row should be plotted from its signals. Doing this on a single core takes a huge amount of time, so I want to split the dataset across multiple cores to speed up the plotting.

I have a class that inherits from FigureCanvasBase and contains a matplotlib figure. Objects of this class are created in several different processes and then added to a layout to be shown in a PyQt5-based GUI. It worked when I inherited from FigureCanvas, but since I am using multiprocess I can only use FigureCanvasBase, not FigureCanvas.

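from matplotlib.backend_bases import FigureCanvasBase
from matplotlib.figure import Figure
from PyQt5.QtWidgets import QHBoxLayout
import multiprocess as mp

class Canvas(FigureCanvasBase):
    def __init__(self, x, y):
        fig = Figure(figsize=(5, 3))
        super().__init__(fig)
        self.figure = fig

        ax = self.figure.subplots()
        # y maps a signal name to its list of values; plot the absolute values
        for key in y.keys():
            ax.plot(x, [abs(number) for number in y[key]])

        self.figure.tight_layout()

def generate_class_func(list_of_rows):
    # Build one Canvas per (x, y) row in this chunk
    list_of_custom_classes = list()
    for x, y in list_of_rows:
        canvas = Canvas(x, y)
        list_of_custom_classes.append(canvas)
    return list_of_custom_classes

if __name__ == "__main__":
    # Placeholder data: each chunk is a list of (x, y) rows
    chunks = [
        [([0], {"a": [1]})],
        [([2], {"b": [3]})],
        [([4], {"c": [5]})],
        [([6], {"d": [7]})],
    ]
    with mp.Pool() as p:
        list_of_classes_list = p.map(generate_class_func, chunks)

    canvas = Canvas([1, 2], {"a": [3, 3], "b": [4, 4]})
    layout = QHBoxLayout()
    layout.addWidget(canvas)  # raises the TypeError shown below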

>>> {TypeError}addWidget(self, QWidget, stretch: int = 0, alignment: Union[Qt.Alignment, Qt.AlignmentFlag] = 0): argument 1 has unexpected type 'Canvas'

Any recommendations for adding Canvas objects to layouts?

  • FigureCanvasBase is the *base* class that each backend then implements in its own FigureCanvas subclass. Obviously, you cannot use it directly. The connection between having to use the base class and multiprocessing is completely unclear. Please explain why you made that assumption. – musicamante Aug 05 '22 at 10:29
  • @musicamante Thank you for your comment. I have now updated the code and the question based on your suggestions. Please let me know if you need more info. – justRandomLearner Aug 05 '22 at 11:56
  • If I'm understanding this correctly, you're trying to use multiprocessing to speed up the canvas creation. Unfortunately, this won't work, because canvases are UI objects and as such cannot be pickled (which is a requirement for multiprocessing). That is probably why you couldn't use the standard FigureCanvas class. – musicamante Aug 05 '22 at 16:40
  • @musicamante I see, thanks for your comment. Last but not least, is there any way to convert a FigureCanvasBase to a FigureCanvas? Maybe I could create a list of FigureCanvasBase objects and convert them to FigureCanvas objects in the same process as the GUI. – justRandomLearner Aug 08 '22 at 09:42
  • All FigureCanvas classes are subclasses that implement the graphical (and interaction) parts of the base class for the respective toolkit. As said, complex objects are not picklable, and even if you managed to implement that, the unpickling (which would happen in the main process and thread) would almost certainly make the mp optimization completely useless. I don't know matplotlib that well; maybe it *is* possible to pickle the Artists (axis, etc.) that make up the plot, but that seems rather complex to achieve. A basic question: why are you actually trying to do this? Is mp really that essential? – musicamante Aug 08 '22 at 19:23
  • I mean, based on your code, the only "waiting time" is during initialization (which is normally acceptable). Is speeding up the startup really that essential for your program? Or was this just sample code, and in your real program you may have to do *lots* of complex and different plots at runtime? – musicamante Aug 08 '22 at 19:25
  • @musicamante I receive datasets from experiments, and each dataset may contain thousands of rows, each with multiple data points. The user can upload any dataset to the GUI and visualize the points of every row as a plot. With a big dataset it takes a long time to plot all the rows and put them on the QMainWindow (e.g. 10000 rows took 80 minutes to plot). Therefore, I thought I could split the dataset across subprocesses, have each subprocess create a list of plots for its part of the dataframe, and, once they return their lists, put all of the plots on the QMainWindow. – justRandomLearner Aug 09 '22 at 10:47
  • Well, with *that* amount of data, it's tricky. Maybe get some idea from this? https://matplotlib.org/stable/gallery/misc/multiprocess_sgskip.html – musicamante Aug 09 '22 at 19:26
  • @musicamante As you said, it looks like the artists of a figure can be shared with subprocesses, but the figures must be in the same process as the GUI, and I don't think this method will speed things up since there will still be a bottleneck. It seems this software design will not work. Thanks for your effort and answers so far! – justRandomLearner Aug 10 '22 at 14:48
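
One direction that follows from the comments above, sketched here only as an assumption and not as something the thread confirmed: since Qt canvases cannot be returned from worker processes, each worker could render its row off-screen with matplotlib's non-GUI Agg canvas and send back plain PNG bytes, which are picklable. The helper names `render_row_to_png` and `render_rows_in_parallel` and the sample data are hypothetical, and the standard-library `multiprocessing` module is used in place of `multiprocess`. Whether this is actually faster for 10000 rows has not been measured.

```python
import io
import multiprocessing as mp

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_row_to_png(x, y):
    """Draw one row off-screen and return it as PNG bytes (hypothetical helper)."""
    fig = Figure(figsize=(5, 3))
    FigureCanvasAgg(fig)  # non-GUI canvas, so no Qt objects exist in the worker
    ax = fig.subplots()
    for values in y.values():
        ax.plot(x, [abs(number) for number in values])
    fig.tight_layout()

    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    return buffer.getvalue()  # plain bytes can be pickled and sent back


def render_rows_in_parallel(rows, processes=None):
    """Render (x, y) rows in a process pool; results keep the input order."""
    with mp.Pool(processes) as pool:
        return pool.starmap(render_row_to_png, rows)


if __name__ == "__main__":
    # Placeholder data, not taken from the dataset described in the question.
    rows = [([1, 2, 3], {"a": [3, -4, 5]}), ([1, 2, 3], {"b": [-1, 2, -3]})]
    images = render_rows_in_parallel(rows)
    print(len(images), "images rendered")
```

In the GUI process, each bytes object could then be loaded with `QPixmap.loadFromData` and shown in a `QLabel`, which is a QWidget and can be passed to `addWidget`. The trade-off is that the result is a static image, so matplotlib's interactive pan and zoom are not available.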
