We perform measurements of physical quantities and use a database (SQL Server 2014) to analyze the data. The data is read from binary/text files into our database.
We currently have the following data structure for the data table (a rough CREATE TABLE sketch follows below the list):
- DataID (PK, Bigint)
- FileID (FK, int)
- ParameterID (FK, int)
- UnitID (FK, smallint)
- Time (datetime)
- Value (real)
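
For reference, in T-SQL the table looks roughly like this. The identity column, the NOT NULL constraints and the clustered primary key are my assumptions, and the FK constraints to the file/parameter/unit tables are left out of the sketch:

```sql
-- Sketch of the current [Data] table, reconstructed from the column list above.
-- Identity, nullability and the clustered PK are assumptions; FK constraints omitted.
CREATE TABLE [Data]
(
    DataID      bigint   IDENTITY(1,1) NOT NULL,
    FileID      int      NOT NULL,  -- FK to the files table
    ParameterID int      NOT NULL,  -- FK to the parameters table
    UnitID      smallint NOT NULL,  -- FK to the units table
    [Time]      datetime NOT NULL,
    [Value]     real     NOT NULL,
    CONSTRAINT PK_Data PRIMARY KEY CLUSTERED (DataID)
);
```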
This table gets big (> 1,000,000,000 rows), but most SELECT statements execute within an acceptable time.
When the analysis of a file or a group of files is done, we delete the data to keep the database from clogging up. Since we keep the original files, we can always get the data back. Deleting these rows takes a long time (more than a day is not unusual) using 'DELETE FROM [Data] WHERE [FileID] IN (...)'. I haven't measured the performance of copying the data we want to keep into a temp table, truncating the [Data] table and restoring the data from the temp table, because that approach feels wrong.
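
One variation I could try is deleting in smaller batches so that each transaction (and the log usage) stays small; a rough sketch along these lines, where the batch size and the FileIDs are just placeholders:

```sql
-- Delete in chunks instead of one huge transaction; 50000 is an arbitrary batch size.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (50000) FROM [Data]
    WHERE [FileID] IN (1, 2, 3);   -- example FileIDs

    SET @rows = @@ROWCOUNT;        -- stop once no more rows match
END;
```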
To increase the performance of the DELETE and maybe also the SELECT statements, I was thinking about creating a separate table for each file instead of dumping all the data into a single large table. This new set of tables could have a design like this:
- DataID (PK, Bigint)
- ParameterID (FK, int)
- UnitID (FK, smallint)
- Time (datetime)
- Value (real)
The table name could be stored in the table which keeps track of all the files. To delete the data of a file, we could then simply DROP the table.
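
A rough sketch of how such a per-file table could be created and dropped with dynamic SQL (the Data_&lt;FileID&gt; naming convention is just an example):

```sql
-- Create a table for one file; the naming convention is an assumption for illustration.
DECLARE @FileID int = 42;                     -- example FileID
DECLARE @table sysname = N'Data_' + CAST(@FileID AS nvarchar(10));
DECLARE @sql nvarchar(max);

SET @sql = N'CREATE TABLE ' + QUOTENAME(@table) + N'
(
    DataID      bigint   NOT NULL PRIMARY KEY,
    ParameterID int      NOT NULL,
    UnitID      smallint NOT NULL,
    [Time]      datetime NOT NULL,
    [Value]     real     NOT NULL
);';
EXEC sp_executesql @sql;

-- When the analysis of the file is finished, drop the whole table:
SET @sql = N'DROP TABLE ' + QUOTENAME(@table);
EXEC sp_executesql @sql;
```

Since DROP TABLE is essentially a metadata operation, it should be far faster than deleting the same rows one by one.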
Is this good practice for this situation, or are there other (better) techniques?
Since we also use Entity Framework to map the database to objects, these 'dynamic' tables would complicate things, but it is not impossible to make it work.
Would another option be to drop SQL altogether and use a NoSQL alternative?