0

I currently have a Users table and now want to add other user-related information for a particular user. The form that accepts this information has fields like languages and OS each with a list of options with checkboxes.

For Example:

Languages known: checkbox PHP, Java, Ruby

OS knowledge: Windows, Linux, Mac

Currently my database tables looks like this:

USER
----------------------------------------
|  ID  |  Name      |  
-----------------------
|   1  |    John    |    
-----------------------
|   2  |    Alice    |    
-----------------------

LANGUAGES
----------------------------------------
|  ID  | User_ID(FK)     | lang-name  | 
----------------------------------------
|   1  |    1            |   PHP      |
----------------------------------------
|   1  |    2            |   Java     |
----------------------------------------

OS
----------------------------------------
|  ID  | User_ID(FK)     | os-name  | 
----------------------------------------
|   1  |    1            | Windows  |
----------------------------------------
|   1  |    2            |  Windows |
----------------------------------------

Does this seem like a good schema? There are many more such user-related fields that will each need to have their own table and there seems to be a lot of redundancy within a table since thousands of users will know PHP and hence there will be thousands of rows with PHP as the language for each of the different users.

Is there a better way to organize the schema?

Mark
  • 883
  • 1
  • 8
  • 14
  • Which DBMS are you using? PostgreSQL? Oracle? DB2? Firebird? (For PostgreSQL there would be a very easy and performant solution) –  May 22 '12 at 22:31
  • I am using Sqlite right now but will be changing to PostgreSQL in the future – Mark May 22 '12 at 23:55

1 Answers1

1

Perhaps you could make Language and OS first-class entities in the database with their own tables, then use a joining table for the many-to-many relationship with User. Something like this:

User
---------
ID
Name
etc...

Language
---------
ID
Name

OS
---------
ID
Name

UserLanguage
---------
UserID
LanguageID

UserOS
---------
UserID
OSID

That way the actual entities (User, Language, OS) are self-contained with only the data that's meaningful to them, not polluted or duplicated with the concerns of their relationships with each other. And the relationships are contained within their own simple numeric-only tables, which themselves aren't entities but are just many-to-many links between entities.

No data is duplicated (in your sample data, Language and OS would each have only three records, at least for now), and it would be a lot friendlier to ORMs and other frameworks if you ever need to use one.

Edit: Based on your comment, you might try something like this:

User
---------
ID
Name
etc...

Lookup
---------
ID
LookupTypeID
Value

LookupType
---------
ID
Value

UserLookup
---------
UserID
LookupID

This gives you a lot of flexibility. In your sample data, Language and OS would be records in LookupType. All of the languages and OSes would be values in Lookup which link back to their respective LookupType. So still no repeating of data. And the UserLookup table is the only many-to-many link table.

Be careful with this design, though. It is flexible, definitely. But when you use this table structure as your actual domain models you run into situations where terms like "Lookup" become business terms, and that's probably not the case. "Language" and "OS" are the actual models. I would recommend using Views or perhaps Stored Procedures to abstract this structure from the code. So the code would pull Languages from the Language view or procedure, not directly from the Lookup table.

David
  • 208,112
  • 36
  • 198
  • 279
  • Thanks for the answer. On further thought, I decided to instead store it all as an xml file because of the sheer number of database tables I would have to create otherwise. – Mark May 23 '12 at 18:52
  • @Tulasi: I assume by "sheer number" you mean that there will be a lot of lookup tables beyond just `Language` and `OS`? If you're talking about dozens of tables, it's actually not as bad as you might think as long as the number of entities is fairly static. If you expect them to change, then a more creative approach for lookup data is possible. You can have a single `Lookup` table and a `LookupType` table. Adding/removing lookup types becomes easy in that schema. You can still provide Views or Stored Procedures for strongly-typed entities for lookup types. – David May 23 '12 at 18:56
  • Yes. I would require to create 20-30 tables with 4-5 records in them. Using the Lookup table and the LookupType table I no longer need each of them to have join tables but it still seems like a lot of tables and a lot of querying for when I want to display skillset information for a single user. I am still somewhat confused about which options is better -storing as an xml or using database tables. Any reason why choosing xml is bad? – Mark May 23 '12 at 19:08
  • @Tulasi: Well, XML isn't nearly as optimized as a relational database, that's for sure. So performance is one reason. Reporting is another, how would you report on the contents of XML blobs? How would you enforce data integrity? I don't see how 3 tables for all of the lookups in this case would be "a lot of tables." The querying would be a pretty standard pattern, and Views can help with that. You'd just need to select from `Lookup` where `LookupType` is of a known ID, or join the `LookupType` table in the select where `LookupType.Value` is a known string. – David May 23 '12 at 19:27