
In my Ruby on Rails app I have to store relatively large matrices of about 300x300 elements (float values, mostly distinct) per document. On every retrieval of a document the full matrix has to be loaded for custom calculations. The matrices are also updated quite often, so write performance and memory caching are concerns as well.

What's a good way to store such matrices (performance wise)? Some alternatives that come to my mind:

  1. A table with the columns row, column and value. But I guess fetching and storing a whole matrix (about 90,000 cells) on every request is not a good idea (some memory caching would help).
  2. Store the matrix serialized in a text field/column. Do you have any idea how this compares to option 1 performance-wise?
  3. Use some document database (e.g. Mongo) and store the whole matrix inside one field of the document (I'm not sure what the benefit over option 2 would be).
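To make option 2 concrete, here is a minimal sketch (not from the question, just an illustration) comparing two ways to serialize a 300x300 float matrix into a single column: a fixed-width binary blob via `Array#pack` versus JSON text. The helper names `pack_matrix`/`unpack_matrix` are hypothetical.

```ruby
require "json"

ROWS = COLS = 300

# Pack each float as a little-endian 64-bit double ("E*"), 8 bytes per cell.
def pack_matrix(matrix)
  matrix.flatten.pack("E*")
end

# Reverse: unpack the doubles and re-slice into rows.
def unpack_matrix(blob)
  blob.unpack("E*").each_slice(COLS).to_a
end

matrix = Array.new(ROWS) { Array.new(COLS) { rand } }

blob = pack_matrix(matrix)
json = JSON.generate(matrix)

puts "packed: #{blob.bytesize} bytes"  # 300 * 300 * 8 = 720,000 bytes
puts "json:   #{json.bytesize} bytes"  # considerably larger, but human-readable
```

The binary round trip is bit-exact for doubles, while JSON is easier to inspect and debug; either way the whole matrix travels as one value, which is what makes option 2 cheap per request compared to 90,000 rows in option 1.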
asked by medihack
  • how about hooking with R or Matlab et al for matrix work and let them do the heavy load? – fuzzyalej Feb 08 '13 at 16:02
  • answers to a somewhat similar question here, even though this isn't sparse data, so not all of it would be relevant: http://stackoverflow.com/a/2599384/178651 - the trick I think is to first think about the way to store/retrieve it effectively in/from the DB, then you can determine how much parsing, comparisons, etc. you need to do in Rails. – Gary S. Weaver Feb 08 '13 at 19:15
  • 1
    Given this is a unique scenario to your needs, I'd suggest you try a few options that seem best. My first thought was to just store it as a large blob (think CSV) (option #2). But, I've never needed to store anything like you're requirements, so it would be hard to predict. I wouldn't use MongoDB just for this. Your description doesn't sound like you'd need the other features of the DB. – WiredPrairie Feb 08 '13 at 19:26

1 Answer


Write and retrieval performance on demand from any kind of ACID-compliant storage is going to be an issue. As a best practice, I would suggest a cache-heavy strategy that keeps your matrix in memory, perhaps using memcached if you need it across multiple servers. You can then take reads and writes out of your request cycle; after that, write performance matters much less and you can use almost any backing store (like a text field in MySQL).

To do this, I'd suggest writing a custom matrix class that does the following:

  • Checks the cache for a copy of its data; if none is available, loads from the database and populates the cache.
  • Writes to the database asynchronously when your cached copy gets updated.
  • Provides optimized interfaces to the data for your code.
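The bullet points above could be sketched roughly like this. All class and method names (`CachedMatrix`, `InMemoryStore`) are hypothetical; a plain `Hash` stands in for the cache client, where a real Rails app would use `Rails.cache` backed by memcached, and the write-behind would go to a job queue rather than a bare thread.

```ruby
# Stand-in for the database layer: just enough to demonstrate the class.
class InMemoryStore
  def initialize; @rows = {} end
  def load(key); @rows[key] end
  def save(key, data); @rows[key] = data end
end

class CachedMatrix
  def initialize(key, cache:, db:)
    @key, @cache, @db = key, cache, db
  end

  # Read-through: check the cache first; on a miss, load from the
  # database and populate the cache.
  def data
    @cache[@key] ||= @db.load(@key)
  end

  def [](i, j)
    data[i][j]
  end

  # Write-behind: update the cached copy immediately and persist
  # asynchronously (a bare Thread here; a background job in a real app).
  # Returns the thread so callers can join it when needed.
  def update(i, j, value)
    data[i][j] = value
    @cache[@key] = data
    Thread.new { @db.save(@key, data) }
  end
end
```

With this shape, a request only ever touches the in-memory copy, and the database write happens off the request cycle, which is the point of the cache-heavy strategy described above.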
answered by Joshua