I am dealing with a sparse matrix in MATLAB that has many rows and columns. The rows and columns of the matrix correspond to the ids of particular items; call them id1 and id2.

It would be nice if the ids for rows and columns could be embedded in the matrix, so that I can access them easily without creating extra variables to hold the two id vectors.

The answer would probably be to use the table data type. Tables are ideal for my need; however, I was wondering whether I could create a table for a sparse matrix?

A  [m*n] sparse matrix    %% m & n are huge 
id1 [1*m] , id2 [1*n]     %% two vectors containing numeric ids for rows and columns

Could we obtain?

T  [m*n] sparse table matrix

Thanks for sharing your view with me.

locke14
Yas
  • Check the documentation for the MATLAB `sparse` function: http://www.mathworks.com/help/matlab/ref/sparse.html?refresh=true – DMR Feb 02 '16 at 18:57
  • There is no `sparse table` class in MATLAB. The first reason is that `table()` can have variables with more than one column; how would you define sparsity in that case (a rhetorical question)? What table-specific functionality do you want to retain, as opposed to a sparse() matrix? – Oleg Feb 02 '16 at 19:30
  • Thanks, I did that before posting. `table(sparse(rand(10,10)))`, as an example, makes the table non-sparse, which is not what I am looking for. – Yas Feb 02 '16 at 19:31
  • Oleg: The requirements, as mentioned in my question, are: 1. `A` is a huge matrix with many zero entries, so it should preferably be in sparse form. 2. The rows and columns of `A` have unique ids; it would be nice if they carried identifiers the way the table data type does in MATLAB. – Yas Feb 02 '16 at 19:34
  • Oleg: FYI, this is the result when I type `full(A)`: "Requested 246829x33336 (61.3GB) array exceeds maximum array size preference." Therefore, I am bound to keep A sparse! – Yas Feb 02 '16 at 19:36
  • It seems to me, correct me if I am wrong, that what you want is a nice display in the Variables Editor. Consider, however, that tables store variables in a very different way, and the added value of an Excel-like display with row and column labels quickly disappears, especially at your dimensions. Moreover, tables do not allow you to reshape them (try a transpose) or to query data with linear indices. – Oleg Feb 02 '16 at 19:38
  • Maybe you should elaborate with an example, on a smaller table, what you mean by: "It would be nice if the ids for rows and columns could be embedded so I can have access to them easily" – Oleg Feb 02 '16 at 19:41
  • Oleg: You are quite right, the idea is just an Excel-like display. OK, do you have another suggestion, or should I simply store 3 variables (A, id1, id2)? Think about it: the fewer variables, the cleaner your code. Shouldn't we take this seriously in programming as well? – Yas Feb 02 '16 at 19:43
  • Oleg: The idea is, as you mentioned, an Excel-like representation: for a sparse matrix `A`, I would like the rows and columns to have particular names. How could we do this? – Yas Feb 02 '16 at 19:47
  • I don't understand your idea with the row and column names, do you really have 33336 labels? – Daniel Feb 03 '16 at 02:37

1 Answer

I will address the question and the comments in order to clear some confusion.

The short answer

There is no sparse table class in MATLAB, so this cannot be done. Use sparse() matrices.
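
As a rough sketch of what "use sparse() matrices" can look like when the ids still need to travel with the data, the snippet below bundles the matrix and its two id vectors into one struct and looks entries up by id with `ismember`. The struct `S`, its field names, the toy sizes, and the id values are illustrative assumptions, not something taken from the question.

```matlab
% Toy data standing in for the huge matrix (sizes and ids are made up).
id1 = [101 205 333 478];          % row ids,    1-by-m
id2 = [7 11 42];                  % column ids, 1-by-n

% Build a 4-by-3 sparse matrix from (row, column, value) triplets
% and keep it together with its ids in a single struct.
S.A   = sparse([1 3 4], [2 1 3], [5.5 1.2 9.0], numel(id1), numel(id2));
S.id1 = id1;
S.id2 = id2;

% Look up the entry for row id 333 and column id 7.
[~, r] = ismember(333, S.id1);    % position of the row id (0 if absent)
[~, c] = ismember(7,   S.id2);    % position of the column id (0 if absent)
value  = full(S.A(r, c));         % full() turns the 1-by-1 sparse result into a plain double

% Extract a sub-block for several row ids at once; the result stays sparse.
[~, rows] = ismember([101 478], S.id1);
block     = S.A(rows, :);
```

This keeps a single variable in the workspace while the matrix itself remains sparse; whether a struct, a small class, or three plain variables reads best is a matter of taste.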

The long answer

There is a reason why sparse tables make little sense:

  1. Philosophically speaking, the advantage of having nice row and column labels is completely lost if you are working with a big panel of data and/or if the data is sparse.

    Scrolling through 246829 rows and 33336 columns? That can only be useful on rare occasions, e.g. when you are debugging your code and a specific outlier is causing your results to go off. Even then, all you might see is a sea of zeros.

  2. Technically, a table can have several columns for the same variable, i.e. table(rand(10,2), rand(10,1)) is a valid table. How would you define sparsity on such a table?

    Fine, suppose you are working with a matrix-like table, i.e. one element per table cell and the same numeric class. Still, none of the algebraic operators are defined on a table(), so you need to extract the content first in order to perform any operation that spans more than a single column of data. Just to be clear, once the data is extracted you have, e.g., your double (full) matrix or, in an ideal case, a double sparse matrix.
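
To make the second point concrete, here is a small sketch of a table with a multi-column variable, followed by the extract-then-compute pattern for a matrix-like table. The names `M`, `X`, `G` and the toy values are assumptions chosen for illustration.

```matlab
% A table whose first variable has two columns and whose second has one.
T = table(rand(10,2), rand(10,1));   % default variable names: Var1, Var2
sz     = size(T);                    % 10 rows, 2 variables
szVar1 = size(T.Var1);               % the first variable alone is 10-by-2

% A matrix-like table: one numeric value per cell, same class throughout.
M = table([1;2;3], [4;5;6], 'VariableNames', {'a', 'b'});
X = M{:, :};                         % extract the contents as a 3-by-2 double
G = X' * X;                          % matrix algebra on the extracted array
```

Note that the table reports two variables even though it holds three columns of numbers, which is exactly why "sparsity of a table" has no obvious definition.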

Now, a few misconceptions to clear:

  • Fewer variables implies clearer/cleaner code. Not true. You are probably thinking of the extreme bad-practice case of creating a series of variables a1, a2, a3, etc.

    There is a sweet spot between verbosity, number of variables, amount of comments, and code clarity/maintainability. Only with time and experience do you find the right balance.

  • Control over data cannot go without visual inspection. This approach does NOT scale with big data, and the sooner you abandon it, the sooner your code will become more reliable. You need to verify your results systematically rather than rely on visual inspection. The chance of failing to (visually) spot a problem in the data grows exponentially with its dimension, much faster than it does with systematic tests.
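
As one possible illustration of "systematic" verification, the sketch below runs a few programmatic checks on a sparse matrix and its id vectors without ever converting the matrix to full. The specific checks, the toy data, and the variable names are assumptions; the right set of tests depends on the data at hand.

```matlab
% Toy sparse matrix and ids (made up for illustration).
id1 = [101 205 333 478];
id2 = [7 11 42];
A   = sparse([1 3 4], [2 1 3], [5.5 1.2 9.0], numel(id1), numel(id2));

% Structural checks: ids match the matrix dimensions and are unique.
assert(numel(id1) == size(A, 1), 'id1 must have one entry per row');
assert(numel(id2) == size(A, 2), 'id2 must have one entry per column');
assert(numel(unique(id1)) == numel(id1), 'row ids must be unique');
assert(numel(unique(id2)) == numel(id2), 'column ids must be unique');

% Value checks on the stored entries only, so the matrix is never made full.
v = nonzeros(A);
assert(all(isfinite(v)), 'stored values must be finite');

% Summary numbers that are cheap to compute and easy to compare across runs.
density = nnz(A) / numel(A);         % fraction of nonzero entries
vmax    = max(v);                    % largest stored value
[r, c]  = find(A == vmax, 1);        % position of that value
outlierRowId = id1(r);               % report the position in terms of ids
outlierColId = id2(c);
```

Checks like these point at a suspicious entry by its ids directly, which is the information a labelled display would have provided, without any scrolling.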

Some background info on my work:

I work with high-frequency prices, that's terabytes of data. I also extended the table() class with additional methods and fixes to help me with my work (see https://github.com/okomarov/tableutils), but I do not see how sparsity is a useful feature to add to table().

Oleg
  • Dear Oleg, I think your response is convincing and sounds logical. I appreciate your taking the time to explain your idea and share your view with me. You made some quite interesting observations in your last comments. Perhaps my question was motivated by the fact that I have not worked with huge amounts of data very often. Many thanks, I will check out your code on GitHub asap and mark your response as the answer. – Yas Feb 03 '16 at 10:21