4

I have an existing, mature schema to which we need to add some new Product attributes. For example, we have Products.Flavor, and now need to add new attributes such as Weight, Fragrance, etc. Rather than continue to widen the Products table, I am considering a couple of other options. First is a new Attributes table, which will effectively be a property bag for arbitary attributes, and a ProductsAttributes table to store the mappings (and values) for a particular product's attributes. This is the Entity-Attribute-Value (EAV) pattern, as I've come to understand it. The other option is to add a new column to the Products table called Attributes, which is of type XML. Here, we can arbitrarily add attributes to any product instance without adding new tables.

What are the pros/cons to each approach? I'm using SQL Server 2008 and ASP.NET 4.0.

Chris Pitman
  • 12,990
  • 3
  • 41
  • 56
Mark Richman
  • 28,948
  • 25
  • 99
  • 159
  • (Bookarking this question now, so I can come back later and upvote all the "you really don't want to do that" answers) – Philip Kelley Mar 21 '11 at 14:25
  • Which of the two options do I really not want to do? ;-) – Mark Richman Mar 21 '11 at 14:26
  • What is the problem with adding new columns to the Product table? Are these attributes only relevant to a subset of products or something? – tster Mar 21 '11 at 14:48
  • 1
    The problem (if you want to call it that) is that the number of new attributes will almost certainly grow over time, and I want to avoid a table with 100 columns a year from now. – Mark Richman Mar 21 '11 at 14:52
  • Could dividing the products into classes and having a separate table for each class of products help to keep the number of columns per table reasonable? I think it is a sort of master-detail approach. – Andriy M Mar 21 '11 at 15:56

1 Answers1

3

This is (imho) one of the classic database design issues. Call it "attribute creep", perhaps, as once you start, there will always be another attribute or property to add. They key decision is, do you store the data within the database using the basic tools provided by the database (tables and columns) to structure and format the data, or do you store the data in some other fashion (XML and name/value pairs being the most common alternates). Simply put, if you store the data in a form other than that supported by the DBMS system, then you lose the power of the DBMS system to manage, maintain, and work with that data. This is not much of a problem if you only need to store it as "blob data" (dump it all in, pump it all out), but once you start have to seek, sort, or filter by this data, it can get very ugly very fast.

With that said, I do have strong opinions on name/value pairs and XML, but alas, none are positive. If you do have to store your data this way, and yes it can be an entirely valid business/design decision, then I would recommend looking long and hard on how the data you need to store in the database will be used and accessed in the future. Weight the pros and cons of each methodology in light of how it will be used, and pick the once that's easiest to manage and maintain. (Don't pick the one that's easiest to implement, you'll be supporting it for a lot longer than you'll be writing it.)

(It's long, but the "RLH" essay is a classic example of name/value pairs run amok.)

(Oh, and if you're using it, look into SQL Server 2008's "Sparse Columns" option. Doesn't sound like what you need, but you never know.)

Philip Kelley
  • 39,426
  • 11
  • 57
  • 92
  • Thanks for the feedback. We actually just had a little meeting on this subject, and decided to go with tables instead of XML for many of the reasons you outlined. Now its just a matter of making responsible use of the EAV pattern. – Mark Richman Mar 21 '11 at 18:39