0

Here is my situation,

I have an application in which I need to store information about the results of different tests made on blood samples. I am currently using ASP.Net core for the web application and SQL Server for the database. (Might switch to Postgres as I will surely host on Linux and SQL Server for Linux is not totally available yet)

All the tests have some information in common, who performed it, at what time, any other related information for tracking purposes. But then all of them also have specific variables that I need to save for reporting/further calculations.

As of now I have about 20 different types of tests we perform on the samples we receive. The question I have is what would be the best way to save that data?

The two options I see are the following:

  1. Have 20 different tables, all containing the general sample tracking info + specific test variables. This way, when I need to fetch the info, everything for a specific type of test is easily accessible. But then I need to query all these tables by join queries whenever I want to generate a report or modify sample results information (as all the test results/variables entry forms are in a single page). There if very few moments where I need to query only a specific type of test, most of the time, I need to retrieve them all at once, which means that I will always (mostly) query the 20+ tables every time I need to access sample data.

  2. Have one big table containing all the results for the different tests performed and serialize (JSON format) only the specific test variables. So I would have all tracking information available (queryable, searchable, etc....) but the variables and results of each test would be in a single serialized column.

It is important to know that the variables/results won't be queried directly, I don't need to filter by them or anything like that (yet at the very least).

Now I wonder what would give me the best performance in the long term between using the multiple tables with join queries vs using serialization/deserialization that needs to take place whenever I access the data.

Also, I am aware that by serializing the test results/variables, I am losing ability to query by the information they contain (except for SQL server 2016 that now includes a way to query JSON information if I'm not mistaken...). I also try to follow best practices by normalizing the database but I'm not a pro and I don't know what would be the best approach between my two options (or any other option if there is a better alternative, I'm totally open to better ideas)

So what would be the best approach and why?

Usage estimate There might be around 15 to 30 millions tests performed every year. Of which I would say 2/3 would be of 5 different blood tests and the other third would be all the other tests performed.

Os1r1s110
  • 47
  • 1
  • 3
  • 10
  • Performance is always a touchy topic. Did you make a poc with both approaches and measured it? It's the only way to be sure – Tseng May 07 '17 at 20:20
  • You are right there is not a lot of ways to evaluate performance except by testing and benchmarking it. The problem I have is I don't yet have the server on which the production app will run nor do I know the specs (RAM, etc... it will have so i'ts difficult to set up a representative test...). Also, I'm looking not only for pure perf. but also for best practices. – Os1r1s110 May 07 '17 at 21:02
  • Relational tables is more appropriate for this scenario. See https://stackoverflow.com/questions/31972056/mysql-embedded-json-vs-table/31977656#31977656 – Michael Freidgeim Oct 23 '17 at 12:58

1 Answers1

0

Different table for different test is a good idea to work with. Reason 1:If only 10 tests are performed on the sample rest of the column will unnecessary waste DB space. Reason 2:Creating report will be easy in future according to samples Reason 3:Filtering of data will be easy Reason 4:maintenance will be easy

If in case of tests are mandatory go with 1 table

  • For reason 1, I'm not really sure where there would be DB space wasting. If only 10 tests are required with the single table scenario, then I create 10 rows in this table and they each contain the general test information + their specific variables in a serialized column... Could you explicit where there would be space loss in this case? – Os1r1s110 May 07 '17 at 21:01
  • if you use only one table for entire tests then there will chances of heavy database and space wastage – Shankit Agrawal May 10 '17 at 18:55