Looking for SQL Server table design sanity check for profile tables with dynamic columns

Question

DOES IT MAKE SENSE TO GENERALIZE A PROFILE TABLE FOR TWO DISTINCT PROFILES IN THE SYSTEM, IN MY CASE, TEACHERS AND STUDENTS? I am doing this and just want a general sanity check regarding my design approach. Responses appreciated. Background below:

We are building a web system that has both Teachers and Students. Both have accounts in the system. Both have profiles in the system.

My question is about the table design of those Profile tables.

The Teacher profile is pretty static regarding the metadata associated with it. Each teacher has a set number of fields that exposes information about that individual (schools, degrees, etc). The students, however, are a different case. We are using a windows service to pull varying data about the students from an endless stream of excel spreadsheets.

The data gets moved into our database and then the fields appear in association with the student's profile. Accordingly, each and every student may have very different fields in their profile.

I originally started with the concept of three tables:

Accounts

AccountID

TeacherProfiles

TeacherProfileID
AccountID
SecondarySchool
University
YearsTeaching
Etc...

StudentProfiles

StudentProfileID
AccountID
Header
Value

The StudentProfiles table would hold the name of the column headers from the Excel spreadsheets and the associated values.

I have since evolved the design a little to treat profiles more generically per the attached ERD image. The Teacher and Student "Headers" are stored in a table called ProfileAttributeTypes and responses (either from the excel document or via input fields on the web form) are put in a ProfileAttributes table. This way both Student and Teacher profiles can be associated with a dynamic flow of profile fields. The "Permissions" table tells us whether we are dealing with a Student or a Teacher.

Since this system is likely to grow quickly, I want to make sure the foundation is solid. Can you please provide feedback about this design and let me know if it seems sound or if you could see problems it might create and, if so, what might be a better approach?

Thanks in advance.

profiles

Anything I can do to help clarify this or make it more "answerable"? Thanks again. — Peter, Apr 09 '12 at 16:54
Some questions: From your model it looks like a user can have multiple profiles? How diverse are "Students" and "Teachers" treated by your application? How will you be using this data? — Filip De Vos, Apr 09 '12 at 20:25
Hi Filip. A user can have only a Teacher or Student profile, not both. Every teacher has N fields per profile, where N is a fixed number. Student profiles are populated daily by new data that may or may not update the fields associated with unique accounts. And all fields are variable per student. The data will be used to search student profiles (get all Caucasian students from VT with a B+ average and a household income of 95k) and produce reports (i.e: trend analysis. For FY 2011, how did Asian students perform in public schools drawing less than $$ in government funding). — Peter, Apr 10 '12 at 19:12

score 1 · Answer 1 · answered Apr 09 '12 at 20:23

Property bag approach

The data model you propose relies on a "property-bag" (A collection of key-value items for a profile). The elegance of this model is that you can extend your properties without needing to do data model changes.

The disadvantage is that you will very often have to "pivot" the data and that your tables (and indexes) will blow up in size very quickly. (my experience: 200 properties per key for 50K records = 10 million properties without anything around tracking changes on the properties.)

This model could be recommended if you mainly have to query one specific property for a key. Think queries like "How many people have a math degree?" where math degree is a property key.

Xml field approach

With this strategy we add an "xml" field to the Profiles table which takes the list of properties in the form of an xml. This model also allows you to extend the number of properties without having to do data model changes.

Sql Server has very good support for such fields (through xpath queries, xml indexes etc) and the benefit is of course that you keep a simple data model which allows you to store whatever you like in the xml field.

This model is recommended when the field content gets replaced as a whole, you can alter the data in an xml field through xpath queries, but it is quite slow.

Sparse Columns

The Sparse Column system has been introduced in SQL Server 2008 to allow you to create a lot of different fields in a table which as not densely populated. The benefit is that it allows you to create more columns than the 1024 limit and the non-populated fields will not take up space when they are not populated.

Disadvantage is that you need to know all possible fields up front or you will be looking a data model changes every time a new field is encountered. This model is great if you have mostly empty columns in your table.

Which approach to take?

This is the hard part, it all depends on what you want to do with the data. In my experience, the property bag approach works well with small data sets and if you don't need to keep track of changes to the properties. (I have seen situations with more than 1 billion records in the table after 1 month)

The Xml field can be a royal pain when you often have to query on specific content of the field, but works great to store information which will only be requested "per key"

Sparse columns works good when the columns are populated for less than 30%-40% of the records.

Additional Note: It is considered a bad practice to store things like "years teaching" in your data model as you will have to update the value all the time. it is better to store "Teaching Start Year" and calculate the delta.

score 1 · Answer 2 · answered Apr 15 '12 at 00:58

I don't think the design you have is very good. The model mixes the notion of a user and an entity.

Here is the start of a more appropriate design.

t_User

t_User_Settings (Profile)

t_Permissions

t_Actions

t_Student

t_Teacher

t_Student_Attributes

t_Teacher_attributes

User related items/attributes belong in t_User or t_User_Settings Domain related items/attributes belong in t_Teacher/t_Teacher_Attributes or t_Student/t_Student_Attributes

You can relate the domain concept (Teacher/Student) to the user concept via a foreign key. Or you can create a t_Teacher_User + t_Student_User tables.

Notice how you can tell exactly what goes where just by reading the table names.

btw - this schema is analogous to that used in many very robust and extensible enterprise systems. — mson, Apr 16 '12 at 20:35

score 0 · Answer 3 · answered Apr 17 '12 at 10:35

In my experience, the best way to sanity check a data model is to work out the queries / DML you're likely to need.

As Filip de Vos writes, your "property bag" approach doesn't easily lend itself to typical relational queries - "select count(*) from students where course = 'maths' and score > 12" would be a huge pain.

On the other hand, your initial design does solve issues around storing data whose schema varies or is unknown at design time.

In practice, you usually end up modelling the "fixed" stuff in a typical relational model, and using either a property bag or XML document to model the variable bits. If you can be clear about the schema at design time, "sparse columns" may also help.

Looking for SQL Server table design sanity check for profile tables with dynamic columns

3 Answers3

Property bag approach

Xml field approach

Sparse Columns

Which approach to take?