
I'm developing a fleet scheduling application and am looking for an effective way to store distances between geographic locations.

The application code accesses the matrix as a two-dimensional array `double[,]`.

To make the matrix persistent, I currently serialize it as a string. After serialization it looks like this:

"1 4 9 8 3 6 \n
 5 6 7 9 3 6 \n
 34 4 5 6 6 7 \n"

Then it is stored in a column of type varchar(max) in a SQL Server 2008 database. However, I wonder if this string could get too big.
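
For illustration, the round trip currently looks roughly like this (a simplified sketch rather than my actual code; the class and method names are made up, and a square distance matrix is assumed):

```csharp
using System;
using System.Globalization;
using System.Text;

static class DistanceMatrixText
{
    // Serialize a square distance matrix into the space/newline-delimited
    // string format shown above ("1 4 9 8 3 6 \n ...").
    public static string ToText(double[,] matrix)
    {
        int n = matrix.GetLength(0);
        var sb = new StringBuilder();
        for (int row = 0; row < n; row++)
        {
            for (int col = 0; col < n; col++)
                sb.Append(matrix[row, col].ToString(CultureInfo.InvariantCulture)).Append(' ');
            sb.Append('\n');
        }
        return sb.ToString();
    }

    // Parse the string back into a double[,] when it is read from the database.
    public static double[,] FromText(string text)
    {
        string[] rows = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        int n = rows.Length;
        var matrix = new double[n, n];
        for (int row = 0; row < n; row++)
        {
            string[] cells = rows[row].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int col = 0; col < n; col++)
                matrix[row, col] = double.Parse(cells[col], CultureInfo.InvariantCulture);
        }
        return matrix;
    }
}
```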

Assuming that each entry has one digit and neglecting whitespace and the "\n" characters, I could theoretically store the distances of around 46,000 locations in one entry (the square root of 2,147,483,647, the maximum length of `varchar(max)`, is about 46,341). This would be sufficient in my context.

Does this approach have any severe disadvantages? Would it be better to store the distances in a separate table, where each row contains one distance between two locations?

If 100 users of our application each stored 1,000 locations, I would have 100 * 1,000 * 1,000 = 100,000,000 rows in such a table....

  • Why do you think that you need to store your data in such a non-relational way? (Relational Rule #1: no arrays in columns, only scalars). It makes it virtually impossible to query the contents of your matrix with any efficiency. – RBarryYoung Oct 09 '12 at 21:15
  • I don't need to query the matrix. If the user works with the application, all distances are loaded into the double[,]-array at once. I don't want to select single distances, just the whole matrix, this is indeed necessary. However, if some of the distances are updated, I have to rebuild the whole string. – curiosity Oct 09 '12 at 21:22
  • I don't understand why you would want to rebuild the entire encoded string every time a single value changed? That seems terribly inefficient, and completely avoidable (in the database anyway) by simply not encoding things that way. And that still begs the question: why do you think that you need to do it this way? – RBarryYoung Oct 09 '12 at 21:26
  • I only rebuild the string if the user saves his changes. I thought it was even more inefficient to store all distances in a table. I really have to read out *all* distances each time the user logs into the application. However, I have some doubts about whether this is the right approach in the long term; that's why the question. – curiosity Oct 09 '12 at 21:32
  • If you truly need to treat the data like a black box on the server, then yes, that's faster. But trying to move the entirety of 2 GB+ back and forth between a client and server every time someone logs in or changes something is going to be awfully slow no matter what. You'd do better with a design that allowed the client to work with and/or change less data at a time. – RBarryYoung Oct 09 '12 at 21:59
  • The max size for a column of type `VARCHAR(MAX)` is 2 GB of storage - 2 **billion** characters. Leo Tolstoy's *War and Peace* is a 1,440-page book containing about 600,000 words - so that might be 6 million characters, rounded up. So you could stick over 300 copies of the entire *War and Peace* book into each `VARCHAR(MAX)` column. Good enough? – marc_s Oct 10 '12 at 05:11
  • To put @RBarryYoung's comments another way - why are you storing this in a database at all, why not just store it as a flat file, if you're just going to treat it as a block of text? – Damien_The_Unbeliever Oct 10 '12 at 06:34
  • Thank you for the helpful comments. I will think over this issue. – curiosity Oct 10 '12 at 10:52

1 Answer


You could just compress the array into a blob field. That will be the most efficient. Instead of serializing to a string, compress to a byte array, and decompress it back into the array on read.
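
For example, something along these lines (a rough sketch only; `GZipStream` is one possible compressor, the column would be `varbinary(max)` rather than `varchar(max)`, and the class and method names are illustrative):

```csharp
using System.IO;
using System.IO.Compression;

static class DistanceMatrixBlob
{
    // Pack a square distance matrix into a compressed byte[] suitable for
    // a varbinary(max) column.
    public static byte[] Compress(double[,] matrix)
    {
        int n = matrix.GetLength(0);
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            using (var writer = new BinaryWriter(gzip))
            {
                writer.Write(n);                        // store the dimension first
                for (int row = 0; row < n; row++)
                    for (int col = 0; col < n; col++)
                        writer.Write(matrix[row, col]); // 8 bytes per distance
            }
            return buffer.ToArray();                    // works even after the stream is closed
        }
    }

    // Reverse operation when the blob is read back from the database.
    public static double[,] Decompress(byte[] blob)
    {
        using (var gzip = new GZipStream(new MemoryStream(blob), CompressionMode.Decompress))
        using (var reader = new BinaryReader(gzip))
        {
            int n = reader.ReadInt32();
            var matrix = new double[n, n];
            for (int row = 0; row < n; row++)
                for (int col = 0; col < n; col++)
                    matrix[row, col] = reader.ReadDouble();
            return matrix;
        }
    }
}
```

For scale: a 1,000 × 1,000 matrix of doubles is about 8 MB uncompressed, comfortably below the 2 GB limit of `varbinary(max)`.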

emperorz