
I have written code to plot data from very large .txt files (20 GB to 60 GB). Each .txt file contains two columns of data, representing the outputs of two sensors from an experiment that I did. The files are so large because the data was recorded at 4M samples/s. The code works well for plotting relatively small .txt files (10 GB); however, when I try to plot my larger data files (60 GB) I get the following error message:

Attempted to access TIME(0); index must be a positive integer or logical.

Error in textscan_loop (line 17)
  TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along

The basic idea behind my code is to conserve RAM by reading Nlines of data from the .txt file on disk into the MATLAB variable C in RAM, plotting C, then clearing C. This process occurs in a loop, so the data is plotted in chunks until the end of the .txt file is reached. The code can be found below:

Nlines = 1e6; % set number of lines to sample per cycle
sample_rate = (1); %sample rate
DECE= 1000;% decimation factor

TIME = (0:sample_rate:sample_rate*((Nlines)-1)); % first instance of time vector
format = '%f\t%f';
fid = fopen('H:\PhD backup\Data/ONK_PP260_G_text.txt');

while(~feof(fid))

  C = textscan(fid, format, Nlines, 'CollectOutput', true);
  C = C{1};  % immediately copy out and clear C - at this point you need the memory!
  clearvars C ;
  TIME = ((TIME(end)+sample_rate):sample_rate:(sample_rate*(size(d,1)))+(TIME(end)));%shift Time along 
  plot((TIME(1:DECE:end)),(d(1:DECE:end,:)))%plot and decimate
  hold on;
  clearvars d;
end

fclose(fid);

I think the while loop completes around 110 cycles before the code stops executing and the error message is displayed; I know this because the graph shows around 110e6 data points and the loop processes 1e6 data points at a time.

If anyone knows why this error might be occurring please let me know.

Cheers, Jim

James Archer
  • Never seen a txt file of 20+G size... – herohuyongtao Jan 14 '14 at 14:45
  • @herohuyongtao The .txt files contain two columns of data, that represent the outputs of two sensors from an experiment that I did. The reason the data files are so large is that the data was recorded at 4M samples/s – James Archer Jan 14 '14 at 14:49
  • The error message clearly does not suggest any issue with the `plot` command... Did you try running the function with the `plot` line commented out? It seems more like your `TIME` channel gets messed up... – sebastian Jan 15 '14 at 07:48
  • Please check whether running the code with `dbstop if error` helps and if not, please describe all relevant variables. I now suspect the error is just caused by a flaw in the code rather than the limitations of `plot`. – Dennis Jaheruddin Jan 15 '14 at 09:02
  • @sebastian I ran the code without the plot command and indeed the same error occurs. My TIME vector gets all crunked up: it should be [1e6 x 1] but it ends up being [1 x 0] at the point of error. – James Archer Jan 16 '14 at 12:33
  • If your `TIME` vector is empty then `size(d,1)` is zero - if I'm not mistaken. So there's probably an issue with textscan or your input data... – sebastian Jan 16 '14 at 12:37
  • @DennisJaheruddin I used `dbstop if error` and found out that size(d,1) is zero, just as Sebastian suggested. From debugging C = 1x1 = [1000000x2 double], does this mean the problem is with `d=C{1};` or `C = textscan(fid, format, Nlines, 'CollectOutput', true);`. – James Archer Jan 16 '14 at 13:30
  • Without a description of **all** relevant variables and the **exact** line on which the error occurs, it is nearly impossible to say something useful about this. – Dennis Jaheruddin Jan 16 '14 at 13:38

1 Answer


The error that you encounter is in fact not in the plotting, but in the line referenced by the error message.

Though I have been unable to reproduce the exact error, I suspect it to be related to this:

Time = 1:0    % creates an empty (1x0) vector
Time(end)     % errors: end evaluates to 0, so this attempts to access Time(0)

In any case, the way forward is clear. You need to run this code with dbstop if error and observe all relevant variables in the line that throws the error.
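
For example (a minimal sketch; `textscan_loop` is the script name taken from the error message above):

dbstop if error
textscan_loop   % run the script; execution pauses at the failing line
whos            % inspect TIME, d and C in the workspace of that line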

From here you will likely figure out what is causing the problem, hopefully just something simple like your code being unable to deal with data size that is an exact multiple of 1000 or so.
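
For instance, if it turns out that `textscan` occasionally returns an empty chunk (so `size(d,1)` is zero and `TIME` collapses to a 1x0 vector, as the comments above suggest), the loop can be made robust by keeping a running sample counter instead of extending `TIME` from its own last element, and by stopping cleanly on an empty read. A minimal sketch of that idea (an illustration only, not a tested fix):

Nlines = 1e6;        % number of lines to read per cycle
sample_rate = 1;     % sample rate
DECE = 1000;         % decimation factor
format = '%f\t%f';

fid = fopen('H:\PhD backup\Data/ONK_PP260_G_text.txt');
nread = 0;           % running count of samples read so far
hold on;

while ~feof(fid)
    C = textscan(fid, format, Nlines, 'CollectOutput', true);
    d = C{1};
    clearvars C;
    if isempty(d)    % nothing was read: stop instead of hitting TIME(0)
        break;
    end
    TIME = (nread : nread + size(d,1) - 1) * sample_rate; % time for this chunk
    nread = nread + size(d,1);
    plot(TIME(1:DECE:end), d(1:DECE:end, :)) % plot and decimate
    clearvars d TIME;
end

fclose(fid);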


Trying to use plot for big data is problematic, as MATLAB tries to plot every single data point.

Obviously the screen will not display all of these points (many will overlap), and therefore it is recommended to plot only the relevant points. One could subsample and do this manually, as you seem to have tried, but fortunately there is a ready-to-use solution for this:

The Plot (Big) File Exchange Submission

Here is the introduction:

This simple tool intercepts data going into a plot and reduces it to the smallest possible set that looks identical given the number of pixels available on the screen. It then updates the data as a user zooms or pans. This is useful when a user must plot a very large amount of data and explore it visually.

This works with MATLAB's built-in line plot functions, allowing the functionality of those to be preserved.

Instead of:

plot(t, x);

One could use:

reduce_plot(t, x);

Most plot options, such as multiple series and line properties, can be passed in too, such that 'reduce_plot' is largely a drop-in replacement for 'plot'.

h = reduce_plot(t, x(1, :), 'b:', t, x(2, :), t, x(3, :), 'r--*');

This function works on plots where the "x" data is always increasing, which is the most common, such as for time series.
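
As a rough illustration of how this could be combined with the chunked reading from the question (a sketch only; it assumes the File Exchange function is on the MATLAB path and that the decimated data fits in memory):

% decimate each chunk while reading, then hand the collected result
% to reduce_plot in a single call
DECE = 1000;
format = '%f\t%f';
fid = fopen('H:\PhD backup\Data/ONK_PP260_G_text.txt');
t = [];  x = [];     % decimated time stamps and sensor values
nread = 0;
while ~feof(fid)
    C = textscan(fid, format, 1e6, 'CollectOutput', true);
    d = C{1};
    if isempty(d), break; end
    t = [t; (nread:DECE:nread + size(d,1) - 1)'];  % decimated sample indices
    x = [x; d(1:DECE:end, :)];                     % decimated sensor values
    nread = nread + size(d,1);
end
fclose(fid);
reduce_plot(t, x(:,1), t, x(:,2));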

Dennis Jaheruddin
  • @DanielR That may need to be one in 10000, or more I suppose. But more importantly, you would risk losing the interesting points and you will still definitely plot numerous overlapping points. – Dennis Jaheruddin Jan 14 '14 at 15:02
  • @DennisJaheruddin & Daniel R My largest file would have around 54e8 samples; decimating by a factor of 1000 gives 54e5 samples to plot, which is indeed more than the number of pixels on my display (approximately 17.6e5). I'm not sure if this is the problem though, I will try `reduce_plot` now and report back. – James Archer Jan 14 '14 at 15:34
  • @DennisJaheruddin I do not think the problem has to do with the number of data points MATLAB or the computer screen can display. I used a decimation factor of 10000 and the code reached the same point of the plot as with a DECE of 1000. The reduce_plot function does not seem to display any data :-/ – James Archer Jan 14 '14 at 15:47
  • @JamesArcher The submission has a good [reputation](http://blogs.mathworks.com/pick/2013/06/07/plot-real-big/), so the first thing to check is whether you are using it properly. If you cannot figure it out, consider posting a new question including (compactly formulated) example data, the original `plot` command that works, and the `reduce_plot` command that fails. – Dennis Jaheruddin Jan 14 '14 at 16:19
  • See my comment above, why do you think the error has anything to do with the `plot` function? – sebastian Jan 15 '14 at 07:46
  • @sebastian I completely missed the error message, and that explains why changing the sampling rate did not help. I have updated the answer to something I believe will help the asker. I have left the description of Plot (Big) in place as additional advice. – Dennis Jaheruddin Jan 15 '14 at 08:57