1

We are developing a site at work for the xmas/new year period where individuals can buy/sell what they want its for charity basically, look at it as a smaller version of ebay.

We currently have 15 parent categories, and these parent categories have multiple sub categories which have many child categories.

We have designed the main item table which consists of Title, Description, CreatedBy, CreatedDate, Price etc

We are now moving on to the design of the item properties table for example let me explain about the vehicle category, the user can select a value for each of the following, Mileage, Make, Vehicle Type, No of Doors, Color, Fuel Type, Transmission.

We were planning on creating one main table which would consist of each property for each category so it would look something like this

PropertyId (BigInt) (PK)
ItemID (BigInt) (FK) < Links to the Main Item table
Mileage (int) (FK)
Make (int) (FK)
Type (int) (FK)
Doors (int) (FK)
Color (int) (FK)
Fuel (int) (FK)
Transmission (int) (FK)
Plus many more for each category

Each column will be of data type int and bigint, each FK will link to its respected table which will return the description that match the int value when queried, now I haven't gone any further I've taken a step back and thought this table is going to grow dramatically when we start working through all the other category properties, this table could potentially be well over 150 columns.

I will of course create the relationships between the tables which will help with performance but I'm unsure whether of not this is the right approach, when it comes to querying this table I will only be selecting the relevant fields i.e Vehicle category again I would only query the fields mentioned above.

But the thought of having a table with 150 + columns make my stomach turn (purely because I haven't worked with such size) maybe this is considered small to a DBA? I don't know but I thought I'll ask here and maybe get some reassurance and guidance in to what direction I should be going.

I did on the other hand find this

Table with a lot of columns

Which I have taken on bored what Rob Nicholson mentioned as well as Reed Copsey, but I wanted to get more of an input maybe from potential DBA's or maybe people that have crossed this sort of bridge and come up with a better approach.

Yours views and opinions will be highly appreciated.

Community
  • 1
  • 1
Code Ratchet
  • 5,758
  • 18
  • 77
  • 141
  • 1
    I guess it depends also if you want the categories to be flexible or not. Now need to update your datamodel for every new category. Otherwise you could go for a meta model like a label/value column. – Pleun Dec 17 '14 at 08:54
  • 1
    SQL Server provides sparse columns for such cases, thus avoiding key/value tables and their complexity. – Panagiotis Kanavos Dec 17 '14 at 08:58
  • nothing like a tight deadline... good luck! :-) – Tanner Dec 17 '14 at 08:58
  • @Pleun well in time the categories will grow so ideally what we implement now needs to be the right solution – Code Ratchet Dec 17 '14 at 09:04
  • @Tanner you have no idea lol, they say Xmas but they mean mid new year (according the dev manager) – Code Ratchet Dec 17 '14 at 09:06
  • 1
    The big issue is not that you might have a lot of columns. The big issue is that you seem to want to store the attributes of a vehicle and the attributes of a toaster in the same table. Another big issue is that you want surrogate keys for every column. It makes *no* sense to use a lookup table for mileage, number of doors, etc. – Mike Sherrill 'Cat Recall' Dec 17 '14 at 13:23

2 Answers2

1

Minor point: Why are "mileage" and "doors" keys to another table? Aren't these just integers? Are you going to have a "doors" table with doors_id and number, and with records (1,1), (2,2), (3,3), (4,4), etc? Why not just store the number and eliminate the table that maps number to themselves? If mileage is going to be the number of miles, this would require a table with tens of thousands of records, all mapping a number to itself. If the idea is that mileage is "low", "medium", or "high", than ok, different story.

First I would ask: If you are selling many different types of products, will you actually process all the detail facts about every possible product? Or will you just turn all this into text to be displayed in a description?

That is, if you are selling cars and clothes and books and cookies and toasters and a thousand other things, will the program ever process the number of doors? Will you ever produce reports broken out by number of doors? Will you provide a way for users to search for a car by number of doors?

If not, then it would be a lot simpler to just have a free form text field named "description" and let the people who populate the database type in whatever they want. Then you don't need to add fields to the database for every possible category of product, create separate data entry screens for every category of product, and a bunch of code to turn all these individual fields into formatted text. If that's the case, then throw away all the fields that are not processed, create one "description" field, and you're done.

If that's not the case, if you really will have a huge system with 100 product categories and separate search screens for each product category -- like if someone says they want to buy a car you take them to a screen where they can search by number of doors or transmission type; if someone says they want a book you take them to a screen where they search by author or genre, etc; if someone wants a toaster they go to a screen where they can search for number of slices and electronic versus electrical; if someone wants clothes there's a screen where they can search by size, material, color, etc; and so on for every different type of product you can think of ... Well, I'd say you are building a HUGE system. But okay.

There are basically three ways to do this:

  1. Every product record has a type code. Some fields are relevant to some types and not others. Irrelevant fields are left null. Yes, 150 fields in one table is a lot. Most tables I create have maybe a dozen or so fields. There's certainly no maximum number, but if you pass 30 I'd start to question it. It gets very difficult to manage. I've created tables where some fields are not relevant to some record types, but there were a handful and reasonably obvious. As all your fields are short identifiers, I doubt this would really hurt performance: 150 integer points, probably 4 bytes each, 600 bytes, some overhead, varies with the database engine, but still you're talking hundreds of bytes, not an excessive burden on most DB engines. The bigger problem is for the programmers to keep track of it all.

  2. Create a product record with the information common to all products, probably ID, type, description, price, maybe a few other things. Then create subtype tables with the detail for each type. So for every car there's a product record and a product-car record, for each toaster there's a product record and a product-toaster record, etc. The subtype records contain a pointer back to the product record, and when you display you have to find and read both.

  3. Create a product table and a product-attribute table. The product-attribute table contains the name of an attribute, like "color", and the value, like "blue". Then you can have any set of attributes for any product. This sucks and I strongly recommend not doing it. It is impossible to manage. How do you control the name of an attribute? If users just type in the attribute name and value at data entry time, there are going to be inconsistencies all over the place, like one person puts "brand" and another puts "manufacturer", one puts "doors" and another puts "number of doors", etc etc.

  4. Slightly better than 3, create an attribute table that lists the names of all attributes used anywhere with some sort of id. Create a producttype-attribute table that lists which attributes go with which product type. Create a product-attribute table that gives the values for the attributes for a given product. Like product (id, type, description, price), attribute(id, name), producttype-attribute(type,attribute_id), product-attribute(product_id, attribute_id). The producttype-attribute table is used in the code to determine which attributes are displayed on the screen for any given product type, at data entry time and also when displaying for the customers. This is way better than 3 because it keeps things consistent and controlled.

Jay
  • 26,876
  • 10
  • 61
  • 112
  • In regards to the integer values within the table they link to serperate table to return the description for the give ID for example I store Door = 2 this would return the description to the user as "2 Door", Mileage is the same we have a range which consists of 10k - 20k, 20k - 30k etc so when the user selects the Mileage ID 1 the following text would be returned 10K - 20K, – Code Ratchet Dec 17 '14 at 21:33
  • In regards to searching, each Parent category will have its own area within MVC and that one area will cater for the sub categories and child categories linked to that Parent. – Code Ratchet Dec 17 '14 at 21:42
  • RE doors: Well, I think it would be easier to store numbers like "doors" as an integer and display x + " doors", but whatever. – Jay Dec 18 '14 at 14:34
  • RE a separate set of screens for each category: How many categories do you expect to have? If the answer is "3", ok, cool. But you were saying that with a handful of fields for each category you expect to have 150 fields, so it sounds like you're expecting more like 20 or 30. If for each category you need a data entry screen, a search screen, and a product display screen, maybe others, you're talking about a system with at least 20 to 30 * 3 = 60 to 90 distinct screens! That is a very big and complex system. And you said you have a month to do this? ... – Jay Dec 18 '14 at 14:42
  • ... So you have to get at least 3 or 4 screens written, debugged and working per day? Even with a team of developers, the co-ordination to get that done sounds rather daunting to me. Good luck. – Jay Dec 18 '14 at 14:43
  • I'd be looking for a way to do this with one generic set of screens that handle all categories, and the exact data on each screen varies per category under program control. But if you really have to handle different categories differently, that could be difficult. – Jay Dec 18 '14 at 14:45
  • No certainly not developing 90 screens, we are thinking about 1 screen per Parent Category should of clarified that in my previous comment so that would be 15 area's that's one option of course. – Code Ratchet Dec 19 '14 at 01:13
  • But yesterday I was looking in to what you mentioned one particular area which would handle all the category screens so regardless of what your searching for you would be sent to this area and the view will rendered differently depending on what you searching. – Code Ratchet Dec 19 '14 at 01:19
1

I strongly recommend you do not set it up this way. INT is a fixed length field - this means you are going to be using up a lot database space with a lot of empty values. Instead create a different Item Properties table for each category you have. You'll still have the same number of fields but they'll be spread out in different tables. It'll be MUCH easier to work with and more efficient to query.

If you really feel like you need a single Item Properties table, I'd make it abstract and forget about the foreign keys. You'll need a different way to enforce referential integrity, but it'll perform better in the long run. For example, instead of this:

CREATE TABLE ItemProperties
    (
      id BIGINT IDENTITY(1, 1) ,
      itemId BIGINT ,
      mileage INT , --auto
      make INT , --auto
      mpg INT , --auto
      length INT , --boat
      motorhp INT , --boat
      motormake INT --boat
    )

Do this:

CREATE TABLE ItemProperties
    (
      id BIGINT IDENTITY(1, 1) ,
      itemId BIGINT ,
      property1 INT ,
      property2 INT ,
      property3 INT
    )

CREATE TABLE PropertyDefinition
    (
      id BIGINT IDENTITY(1, 1) ,
      itemId INT ,
      property1label VARCHAR(255) ,
      property2label VARCHAR(255) ,
      property3label VARCHAR(255)
    )

    INSERT INTO dbo.PropertyDefinition
            ( itemId ,
              property1label ,
              property2label ,
              property3label
            )
    VALUES  ( 1 , -- auto
              'mileage' , 
              'make' , 
              'restofautoproperties'  
            ),
            ( 2, --boat
            ,'length',
            'motorhp',
            'motormake'
            )
Dave.Gugg
  • 6,561
  • 3
  • 24
  • 43
  • I thought about doing this "Instead create a different Item Properties table for each category you have" That way I can micro manage different categories and extend them as and when is required, I'm currently at work ill have a look at the above snippet when I get home and come back to you – Code Ratchet Dec 17 '14 at 21:29
  • In your ItemProperties table you have property1,2 and 3 but I can't see where your using them inside the PropertyDefinition table? do they need to be there if I'm storing the text values in PropertyDefinition table? – Code Ratchet Dec 17 '14 at 22:00
  • They are your ItemProperties values (or keys). So for car X, property1 would be 35,000, property2 would be Saturn, property3 would be automatic. What these values are referring to would be stored in the PropertyDefinition table - mileage, make, and tran type. – Dave.Gugg Dec 17 '14 at 22:04
  • Just so I get this right, my ItemProperties table would consist of every property for my categories? and inside the PropertyDefinition is where I would save the input the user has entered? – Code Ratchet Dec 17 '14 at 22:53
  • Also what is the relationship between the ItemProperties Table and PropertyDefinition I can see in the ItemProperties table property1 INT but I can't see where this INT is referenced within the definition table or am I missing something!?!? – Code Ratchet Dec 17 '14 at 23:42
  • I am just trying to show a concept - you'll need to work out the actual details to fit your specific situation. – Dave.Gugg Dec 18 '14 at 14:01