
I have a memory and speed problem in MATLAB, and I would like your help to understand whether it has a solution.

Consider the following four large vectors X1, X2, Y1, Y2 (row vectors of length P, since they are created with rand(1,P)).

clear 
rng default
P=10^8;
X1=rand(1,P)*5;
X2=rand(1,P)*5;
Y1=rand(1,P)*5;
Y2=rand(1,P)*5;

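I would like a scatter plot with one point for every pair of indices (i,j): its x-coordinate is X1(i)+X2(j) and its y-coordinate is Y1(i)+Y2(j). In other words, the x-axis shows the sum of every possible pair of elements (one from X1, one from X2), and the y-axis shows the corresponding sum from Y1 and Y2.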
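Written out as a set, and assuming the same index pair (i,j) is used for both coordinates, as Option 1 below does, the points to plot are:

```latex
\mathcal{S} = \bigl\{\, \bigl(X_1(i) + X_2(j),\; Y_1(i) + Y_2(j)\bigr) \;:\; i, j \in \{1, \dots, P\} \,\bigr\},
\qquad |\mathcal{S}| = P^2 = 10^{16} \ \text{(counting repeats)}
```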

Here are three options I tried; none of them works, mainly because of memory and speed issues.

Option 1 (issues: the loop is too slow, and vertcat runs out of memory)

Xtemp=cell(P,1);
Ytemp=cell(P,1);
for i=1:P
    tic
    Xtemp{i}=X1(i)+X2(:);
    Ytemp{i}=Y1(i)+Y2(:);
    toc
end
X=vertcat(Xtemp{:}); 
Y=vertcat(Ytemp{:});
scatter(X,Y)

Option 2 (issues: the loop is too slow, each iteration takes longer than the previous one, and MATLAB becomes unresponsive and cannot draw the scatter even if I stop the loop after 5 iterations)

for i=1:P
    tic
    scatter(X1(i)+X2(:), Y1(i)+Y2(:))
    hold on 
    toc
end

Option 3 (sort of giving up) (issues: as I increase T, the scatter gets closer and closer to a square, which is correct. However, I wonder whether this only happens because I generated the data with rand while Option 3 samples with randi; with my real data, the scatter might not "converge" to the true plot as I increase T. Also, what are the "optimal" T and R?)

T=20;
R=500;
for t=1:T
    tic
    %select R points at random from X1,X2,Y1,Y2: indices must range over 1..P,
    %and X1/Y1 (and X2/Y2) share indices so each point keeps its (i,j) pairing
    i1=randi(P,R,1);
    i2=randi(P,R,1);
    X1sel=X1(i1);
    X2sel=X2(i2);
    Y1sel=Y1(i1);
    Y2sel=Y2(i2);
    %do option 1 among those points and plot
    Xtempsel=cell(R,1);
    Ytempsel=cell(R,1);
    for r=1:R
        Xtempsel{r}=X1sel(r)+X2sel(:);
        Ytempsel{r}=Y1sel(r)+Y2sel(:);
    end
    Xsel=vertcat(Xtempsel{:}); 
    Ysel=vertcat(Ytempsel{:});
    scatter(Xsel,Ysel, 'b', 'filled')
    hold on
    toc
end

Is there a way to do what I want, or is it simply impossible?

TEX
  • How many rows are in your data matrix? – Suever Nov 08 '18 at 15:34
  • Thanks: `P=10^8`, as I wrote in the question. – TEX Nov 08 '18 at 15:36
  • Let's imagine you have a 4k (ultra-HD) monitor: you have up to 4096 x 2160 resolution, totalling ~9*10^6 pixels. **You will not be able to see 10^8 distinct points on a plot, let alone all of their sum combinations!!** Rethink the problem, reframe what you want to *achieve* with this plot, and there's likely a better way to do it! – Wolfie Nov 08 '18 at 15:58
  • @Wolfie Thanks. My objective is not to see each point, but to see the shape of the picture they form. – TEX Nov 08 '18 at 16:04
  • @user that's my point: you can analyse the "shape" without an impossible plot, using things like the best fit, measures of spread, max/min/mean, trends, envelope, histograms, and numerous other metrics and visualisations... Take a step back from trying to plot, and work out what you want to *show*, what you're interested in, and how you can demonstrate that without a plot where your point density will be (10^16/10^6)=10^10 points per pixel... – Wolfie Nov 08 '18 at 17:45
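Several of the summaries listed in the last comment can be computed exactly for all P^2 points while touching only the P-element inputs, because every pair (i,j) occurs exactly once. Minima, maxima and means add; population (divide-by-N) variances and covariances add, because the cross terms between i and j vanish when averaging over all pairs; and the convex hull of the sum set is the Minkowski sum of the hulls of the (X1,Y1) and (X2,Y2) point clouds. A minimal sketch on the question's synthetic data, with P reduced to 1e6 only to keep it quick (variable names such as xVar, Hx and Hy are illustrative, not from the thread):

```matlab
% Exact summaries of all P^2 points (X1(i)+X2(j), Y1(i)+Y2(j)) computed
% from the P-element inputs only; P is reduced here to keep the demo fast
rng default
P  = 1e6;
X1 = rand(1,P)*5;  X2 = rand(1,P)*5;
Y1 = rand(1,P)*5;  Y2 = rand(1,P)*5;

% Range and mean: extremes and averages of a sum over all pairs add up
xRange = [min(X1)+min(X2), max(X1)+max(X2)];
yRange = [min(Y1)+min(Y2), max(Y1)+max(Y2)];
xMean  = mean(X1) + mean(X2);
yMean  = mean(Y1) + mean(Y2);

% Spread: over all pairs, i and j vary independently, so population
% variances and covariances add; the i-j cross terms vanish
xVar  = var(X1,1) + var(X2,1);
yVar  = var(Y1,1) + var(Y2,1);
xyCov = mean((X1-mean(X1)).*(Y1-mean(Y1))) + mean((X2-mean(X2)).*(Y2-mean(Y2)));

% Envelope: hull of the sum set = hull of the pairwise sums of the two
% input hulls' vertices (a Minkowski sum), so only a few points are combined
k1 = convhull(X1, Y1);  k1 = k1(1:end-1);  % drop the repeated closing vertex
k2 = convhull(X2, Y2);  k2 = k2(1:end-1);
Hx = bsxfun(@plus, X1(k1).', X2(k2));      % (#hull1)-by-(#hull2) sums
Hy = bsxfun(@plus, Y1(k1).', Y2(k2));      % same (i,j) layout as Hx
k  = convhull(Hx(:), Hy(:));
plot(Hx(k), Hy(k), '-')                    % outline of the full P^2 scatter
xlabel('X1(i)+X2(j)');  ylabel('Y1(i)+Y2(j)');
```

The hull only describes the outer boundary; it says nothing about how densely the points fill the inside.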

1 Answer


You are trying to build vectors with P^2 elements each, i.e. 10^16. This is many orders of magnitude more than what would fit into the memory of a standard computer: 10 GB is 10^10 bytes, or 1.25 billion double-precision floats, whereas 10^16 doubles need 8*10^16 bytes (80 PB) per vector.
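The arithmetic behind these numbers, as a short sketch (doubles take 8 bytes; 1 GB is taken as 10^9 bytes and 1 PB as 10^15 bytes; the 10 GB machine is the example from the text, not a measurement):

```matlab
% Back-of-the-envelope memory estimate for the full set of pairwise sums
P            = 1e8;                    % vector length from the question
bytesPerElem = 8;                      % one double-precision value
inputBytes   = 4*P*bytesPerElem;       % X1, X2, Y1, Y2: 3.2e9 bytes (3.2 GB)
nPairs       = P^2;                    % one point per index pair (i,j): 1e16
bytesPerVec  = nPairs*bytesPerElem;    % the full X (or Y) vector: 8e16 bytes
ramBytes     = 10e9;                   % the 10 GB machine from the text

fprintf('Inputs:           %g GB\n', inputBytes/1e9);
fprintf('X alone:          %g PB\n', bytesPerVec/1e15);
fprintf('X and Y together: %g PB\n', 2*bytesPerVec/1e15);
fprintf('Doubles in 10 GB: %g\n',    ramBytes/bytesPerElem);
```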

For smaller vectors (say P<1e4, i.e. fewer than 10^8 points), try:

Xsum=bsxfun(@plus,X1,X2.'); %P-by-P matrix, Xsum(j,i)=X1(i)+X2(j): the sum of any two elements from X1 and X2
X=Xsum(:);                  %Reshape to vector
Ysum=bsxfun(@plus,Y1,Y2.'); %Same (i,j) layout, so X(k) and Y(k) come from the same pair
Y=Ysum(:);
plot(X,Y,'.') %Plot as small dots, likely to take forever if there are too many points
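A small-P sketch of what the bsxfun line builds, checked against explicit loops. It assumes MATLAB R2016b or newer (or Octave) for the implicit-expansion form X1 + X2.', which gives the same matrix without bsxfun. Because X(:) and Y(:) reshape Xsum and Ysum in the same column-major order, entry k of X and entry k of Y come from the same pair (i,j).

```matlab
% Small-P check of the pairwise-sum matrix (P kept tiny on purpose)
rng default
P  = 4;
X1 = rand(1,P)*5;  X2 = rand(1,P)*5;   % 1-by-P row vectors, as in the question

Xsum  = bsxfun(@plus, X1, X2.');       % P-by-P, Xsum(j,i) = X1(i) + X2(j)
Xsum2 = X1 + X2.';                     % same result via implicit expansion

% Brute-force reference with explicit loops
Xref = zeros(P, P);
for i = 1:P
    for j = 1:P
        Xref(j,i) = X1(i) + X2(j);
    end
end
isequal(Xsum, Xsum2, Xref)             % true: all three agree
numel(Xsum)                            % P^2 = 16 points
```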

To build a figure with a more reasonable number of pairs (i,j) picked randomly from these large vectors (the same indices are used for X and Y, so each plotted point is a genuine pair):

Npick=1e4;
sel1=randi(P,[Npick,1]);
sel2=randi(P,[Npick,1]);
Xsel=X1(sel1)+X2(sel2);
Ysel=Y1(sel1)+Y2(sel2);
plot(Xsel,Ysel,'.');     %Plot as small dots
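Another option, following the histogram idea from the comments, is to plot a density instead of dots. Every point is (X1(i),Y1(i)) + (X2(j),Y2(j)), so the 2-D histogram of all P^2 sums is, up to one bin width, the 2-D convolution of the histogram of the (X1,Y1) points with the histogram of the (X2,Y2) points. Building the two histograms takes O(P) work instead of P^2 additions, and the result uses every pair rather than a random sample. A sketch under stated assumptions: the data lie in [lo,hi] (here [0,5], as for the synthetic data), and nb = 100 bins per axis and the chunk size are arbitrary choices:

```matlab
% Density of ALL P^2 points via 2-D histograms and a convolution
rng default
P  = 1e6;                        % the chunked loop also handles P = 1e8
X1 = rand(1,P)*5;  X2 = rand(1,P)*5;
Y1 = rand(1,P)*5;  Y2 = rand(1,P)*5;

lo = 0;  hi = 5;                 % assumed data range (rand(...)*5 lies in [0,5])
nb = 100;                        % bins per axis of each input histogram
h  = (hi - lo) / nb;             % bin width
toBin = @(v) min(max(floor((v - lo)/h) + 1, 1), nb);   % value -> bin index

% Count histograms of the points (X1(i),Y1(i)) and (X2(j),Y2(j)),
% accumulated in chunks so no P-long index array is built at once
H1 = zeros(nb);  H2 = zeros(nb);
chunk = 1e6;
for s = 1:chunk:P
    idx = s:min(s+chunk-1, P);
    H1 = H1 + accumarray([toBin(Y1(idx)).', toBin(X1(idx)).'], 1, [nb nb]);
    H2 = H2 + accumarray([toBin(Y2(idx)).', toBin(X2(idx)).'], 1, [nb nb]);
end

% Pair (i,j) lands, to within one bin, in the bin whose index is the sum of
% the two input bin indices, so the histogram of all sums is a 2-D convolution
Hsum = conv2(H1, H2);            % (2*nb-1)-by-(2*nb-1), total count P^2

% Summed bin m is centred at 2*lo + m*h
ctr = 2*lo + h*(1:2*nb-1);
imagesc(ctr, ctr, log10(Hsum + 1));   % log scale: counts span many decades
axis xy;  colorbar
xlabel('X1(i)+X2(j)');  ylabel('Y1(i)+Y2(j)');
```

The conv2 cost depends on nb rather than P; for much finer grids, an FFT-based convolution (fft2/ifft2 with zero padding) would likely be faster.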
Brice