6

I'm adding a new feature to my user module for my CMS and I've hit a road block... Or I guess, a fork in the road, and I wanted to get some opinions from stackoverflow before I commit to anything.
Basically I want to allow admins to add new, 'extra' user fields that users can fill out on registration, edit in their profile, and/or be controlled by other modules. An example of this would be a birthday field, a lengthy description of themselves, or maybe points the user has earned on the site. Needless to say, the data stored will be varied and can range from large amounts of text, to a small integer value. To make matters worse - I want there to be the option to search this data.

With that out of the way - what would be the best way to do this? Right now I'm leaning towards having a table with the following columns.

userid, refFieldID, varchar, tinyint, smallint, int, text, date, datetime, etc.

I would prefer this as it would make searching significantly faster, and the reference table (Which holds all of the field's data, such as the name of the field, whether it's searchable or not, etc.) can reference which column should be used when storing data for that field.

The other idea, which was suggested to me and I've seen used in other solutions (vBulletin being one, although I have seen others whose names escape me at the moment), where you just have the userid, reference id, and a medtext field. I don't know enough about MySQL to say this with any certainty, but this method seems like it would be slower to search, and possibly have a larger overhead.

So which method would be 'best'? Is there another method I'm missing? Whichever method I end up using, it needs to be fast to search, not massive (A tiny bit of overhead is fine), and preferably allow complex queries used against the data.

Serge
  • 1,066
  • 2
  • 7
  • 24
Jon
  • 305
  • 3
  • 20
  • 45

2 Answers2

3

I agree that a key-value table is probably the best solution. My first inclination would be to just store a text column, like vBulletin did. But, if you wanted to add the ability for the data store to be a bit more extensible and searchable like you've laid out, I might suggest:

  • 1 medium/longtext or medium/longblob field for arbitrary text/binary storage (whatever is stored + overhead of 3-4 bytes for string length). Only reason to choose medium over long is to limit what can be stored to 2^24 bytes (16.7 MB) versus 2^32 bytes (2 GB).
  • 1 integer (4 bytes) or bigint (8 bytes)
  • 1 datetime (8 bytes)
  • Perhaps 1 float or double (4-8 bytes) for floating point storage

These fields will allow you to store nearly any type of data in the table but without inflating the width of table** (like a varchar would) and avoid any redundant storage (like having tinyint and mediumint etc). The text stored in the longtext field can still be reasonably searched using a fulltext index or a regular limited length index (e.g. index longtext_storage(8)).

** all blob values, such as longtext, are stored independently from the main table.

wuputah
  • 11,285
  • 1
  • 43
  • 60
  • Wow thanks, I was actually going to reply to the first person agreeing with #1, which columns to choose - but I guess I don't have to anymore :). Regarding your post - do you mean text AND blob, int AND bigint? Or one or the other? Also how do you feel about adding a 'bool' (tinyint(1)) column? I could see that being very useful and possibly used a lot - would it be worth the 3 bytes saved, in your opinion? Also, does the number of columns increase the size of a row on disk? Empty columns, of course. I'm not doubting your (Amazing) table layout, just curious. – Jon Feb 07 '11 at 04:05
  • 1 each for each item in my list, so 3 or 4 total columns, depending on whether you want float support. As for tinyint(1) - store those in the integer column. You're wasting a byte by adding tinyint(1), not saving 3. Every row in your table always has the same width in MySQL - works the same in most other RDBMSs. (How varchars affect this gets a bit complicated.) "Width" is also called "row size." – wuputah Feb 07 '11 at 04:30
0

One technique that might work for you is to store this arbitrary data as text, in some notation like JSON, XML, or YAML. This decision depends on how you'll need to access the data: if you only look up each user's full chunk of user data, it could be ideal. If you need to run SQL queries on specific fields in the user data, you'll need to use a pure SQL or a hybrid approach.

Many of the newer, highly scalable "NoSQL" systems seem to favor JSON data (eg, MongoDB, CouchDB, and Project Voldemort). It's nice and terse, and you can create arbitrarily complex structures including maps (JSON objects) and lists (JSON arrays).

Jim Ferrans
  • 30,582
  • 12
  • 56
  • 83