I have a large dataset (~30GB) that I want to visualize by looking at it scrolling past. A great example is the top graph in this video. My data is coming from CSV files.

What I have tried so far is importing the massive CSV files into a NumPy array and using np.roll() to shift in a new column from the right side (like in the video), repeating until I hit the last column of the array. I call np.roll() on each iteration of mpl.animation.FuncAnimation. This uses a lot of CPU and even more memory.

Any suggestion on how to approach this? I couldn't find very many examples online that could help me with this.

Ophir Carmi
snelltheta
  • Please provide your code and a snippet of a CSV file. – Ophir Carmi Jul 21 '16 at 08:27
  • I don't believe my code would help. I'm looking for suggestions of how to approach it. Perhaps not even using code. I just know that trying to load in huge arrays and trying to animate them is not working well (very slow). Pseudocode of how to do this would be sufficient. – snelltheta Jul 22 '16 at 19:23

1 Answer

Here is some code adapted from the Matplotlib tutorials.

import numpy as np
from matplotlib.lines import Line2D
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import math



class Scope(object):
    def __init__(self, ax, maxt=2, dt=0.02):
        self.ax = ax
        self.dt = dt
        self.maxt = maxt
        self.tdata = [0]
        self.ydata = [0]
        self.line = Line2D(self.tdata, self.ydata)
        self.ax.add_line(self.line)
        self.ax.set_ylim(-1.1, 1.1)  # the sine emitter below spans -1..1
        self.ax.set_xlim(0, self.maxt)

    def update(self, y):
        lastt = self.tdata[-1]
        if lastt > self.tdata[0] + self.maxt:  # reset the arrays
            self.tdata = [self.tdata[-1]]
            self.ydata = [self.ydata[-1]]
            self.ax.set_xlim(self.tdata[0], self.tdata[0] + self.maxt)
            self.ax.figure.canvas.draw()

        t = self.tdata[-1] + self.dt
        self.tdata.append(t)
        self.ydata.append(y)
        self.line.set_data(self.tdata, self.ydata)
        return self.line,


def emitter(x=0):
    'yield sin(x) for x in degrees, advancing 1 degree per call and wrapping after 361'

    while True:
        # step forward one degree, restarting at 1 once x reaches 361
        x = x + 1 if x < 361 else 1
        yield math.sin(math.radians(x))


fig, ax = plt.subplots()
scope = Scope(ax)

# pass a generator in "emitter" to produce data for the update func
ani = animation.FuncAnimation(fig, scope.update, emitter, interval=10,
                              blit=True)

plt.show()

My suggestion is to build a generator that yields the next piece of data you want to display each time it is called. That way you won't need to load the whole file into memory (more on that here). Replace the emitter function with a generator that pulls from your file. The disadvantage is that, as far as I can tell, the full array will not be available in the plot, since Scope only keeps the points in the current window.
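
As a rough illustration of such a generator, here is a minimal sketch that reads a CSV file lazily, one row at a time. The name csv_emitter, its parameters, and the assumption that the values of interest sit in a single numeric column are mine, not from the question, so adjust them to the real file layout.

```python
import csv


def csv_emitter(path, column=0, skip_header=True):
    """Yield one float at a time from a single column of a CSV file.

    Hypothetical helper: the file is read row by row, so memory use
    stays roughly constant regardless of the file size.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if skip_header:
            next(reader, None)  # discard the header row, if there is one
        for row in reader:
            if row:  # csv.reader returns [] for blank lines; skip them
                yield float(row[column])
```

It should be possible to pass csv_emitter("data.csv") (a placeholder file name) to FuncAnimation in place of emitter. Because the file eventually runs out, setting repeat=False is probably sensible.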

Josh Duzan