0

I am using Ruby on Rails and have a situation that I am wondering if is appropriate for using some sort of Key Value Store instead of MySQL. I have users that have_many lists and each list has_many words. Some lists have hundreds of words and I want users to be able to copy a list. This is a heavy MySQL task b/c it is going to have to create these hundreds of word objects at one time.

As an alternative, I am considering using some sort of key value store where the key would just be the word. A list of words could be stored in a text field in mysql. Each list could be a new key value db? It seems like it would be faster to copy a key value db this way rather than have to go through the database. It also seems like this might be faster in general. Thoughts?

Chris Hanson
  • 54,380
  • 8
  • 73
  • 102
TenJack
  • 1,594
  • 4
  • 21
  • 35

1 Answers1

1

The general way to solve this using a relational database would be to have a list table, a word table, and a table-words table relating the two. You are correct that there would be some overhead, but don't overestimate it; because table structure is defined, there is very little actual storage overhead for each record, and records can be inserted very quickly.

If you want very fast copies, you could allow lists to be copied-on-write. Meaning a single list could be referred to by multiple users, or multiple times by the same user. You only actually duplicate the list when the user tries to add, remove, or change an entry. Of course, this is premature optimization, start simple and only add complications like this if you find they are necessary.

You could use a key-value store as you suggest. I would avoid trying to build one on top of a MySQL text field in less you have a very good reason, it will make any sort of searching by key very slow, as it would require string searching. A key-value data store like CouchDB or Tokyo Cabinet could do this very well, but it would most likely take up more space (as each record has to have it's own structure defined and each word has to be recorded separately in each list). The only dimension of performance I would think would be better is if you need massively scalable reads and writes, but that's only relevant for the largest of systems.

I would use MySQL naively, and only make changes such as this if you need the performance and can prove that this method will actually be faster.

Zack Bloom
  • 8,309
  • 2
  • 20
  • 27
  • Thanks a lot Zach. This is very informative. I'm wondering, why is it best to user a list-words table? Why not just put a list_id in each word object and then has_many :words in List? – TenJack Nov 13 '10 at 07:08
  • A second question: I was assuming that using regex on a string would be faster than a db query. For example, I could use gsub to find and remove a word by replacing the word with en empty string. Is this not a valid assumption? – TenJack Nov 13 '10 at 07:10
  • @TenJack - To your first question: I was assuming that there are is a limited number of words, as in a tag system. Of course, as there are only so many words in a language, the number of words is limited. If there are less words than list entries and/or your interested in indexing or querying by word, then use a list-words table. If there are very few common words between lists, then a many-to-one would be fine. – Zack Bloom Nov 13 '10 at 08:34
  • @TenJack - To your second question: In terms of computational efficiency, regex will almost certainly not be faster than a properly indexed query. As you need to update the database either way (so it's overhead is irrelevant), it would almost certainly be faster to run a single update command rather than a select - regex - update (by an order of magnitude). But the simple answer is: Test It! – Zack Bloom Nov 13 '10 at 08:42