
In the physics lab I work in, at the beginning of each experiment (of which we run several per minute) we perform a series of INSERTs into a MariaDB database. One of the tables has a few hundred columns - each corresponding to a named variable - and serves as a log of the parameters used during that run. For example, one variable is the laser power used during a particular step of the experiment.

Over time, experimenters add new variables to parametrize new steps of the experiment. Originally my code handled this by simply adding new columns to the table - but as the number of rows in the table grew beyond around 60000, the time it took to add a column became unusably long (over a minute).

For now, I have circumvented the problem by pre-defining a few hundred extra columns which can be renamed as new variables are needed. At the rate at which variables are added, however, this will only last our lab (or the other labs that use this software) a few years before table maintenance is required. I am wondering whether anyone can recommend a different architecture or different platform that would provide a natural solution to this "number of columns" problem.

wlunden
  • Now would you let a database designer postulate physics theories to a symposium? NO! Then why would you let a perfectly good physicist (I assume) design a database? Find someone who knows the task at hand and get some proper help. Database design is not a trivial task – RiggsFolly Jul 13 '16 at 16:55
  • _I am wondering whether anyone can recommend a different architecture_ You cannot design a database ___that works___ unless you know a lot more than this about how it is to be used, what it is to be used for, and how the information will be required out of it again. Honestly, get some professional help – RiggsFolly Jul 13 '16 at 16:58
  • While I appreciate the advice, I need to emphasize that the nature of our style of research (i.e. our budget and the fast pace of our research) is such that almost everything is DIY. We don't pretend to be database designers any more than we pretend to be electrical engineers, but regardless in this case we need an in-house solution that works on a small scale. – wlunden Jul 13 '16 at 18:05

2 Answers


I am guessing you are running various different types of experiments, and that is why you need an ever-increasing number of variables? If that is the case, you may want to consider one of the following:

  • having a separate table for each type of experiment,
  • having separate tables that hold each experiment type's parameter values (and reference the experiment in the primary table),
  • having a simpler "experiment parameters" table with 3 (or more, depending on the complexity of values) columns: a reference to the experiment, a reference to the parameter, and the parameter value.

My preference would be for one of the first two options; the third tends to make data more complicated to analyze AND maintain than the flexibility is worth.
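The third option above can be sketched as three narrow tables. This is a minimal illustration using SQLite for portability (the original post uses MariaDB); all table and column names here are assumptions, not taken from the poster's schema.

```python
# Sketch of the "experiment parameters" option: each new variable becomes a
# row in a lookup table instead of a new column, so no ALTER TABLE is needed.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE experiment (
    id     INTEGER PRIMARY KEY,
    run_at TEXT NOT NULL                -- timestamp of the run
);
CREATE TABLE parameter (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE           -- e.g. 'laser_power_step3'
);
CREATE TABLE experiment_parameter (
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    parameter_id  INTEGER NOT NULL REFERENCES parameter(id),
    value         REAL,
    PRIMARY KEY (experiment_id, parameter_id)
);
""")

# Recording a brand-new variable is now a cheap INSERT, not a schema change.
cur.execute("INSERT INTO experiment (run_at) VALUES ('2016-07-13T16:55:00')")
exp_id = cur.lastrowid
cur.execute("INSERT INTO parameter (name) VALUES ('laser_power_step3')")
param_id = cur.lastrowid
cur.execute("INSERT INTO experiment_parameter VALUES (?, ?, ?)",
            (exp_id, param_id, 0.42))

# Read back all parameters logged for that run.
row = cur.execute("""
    SELECT p.name, ep.value
    FROM experiment_parameter ep
    JOIN parameter p ON p.id = ep.parameter_id
    WHERE ep.experiment_id = ?
""", (exp_id,)).fetchone()
print(row)  # ('laser_power_step3', 0.42)
```

The trade-off the answer warns about is visible here: reconstructing one run's full parameter set requires a join (or a pivot) rather than a single wide-row SELECT.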

Uueerdo
  • I do like your third suggestion, and for some reason I had not thought of it, even though some of our other tables work this way. Thank you for your input. – wlunden Jul 13 '16 at 18:06

It would seem that EAV is best for your scenario. I would normally steer away from it, but in this case it seems to make sense. I would keep the last n experiments' worth of data in the main table(s) and offload the older ones to an archive table. Archiving away data you don't need at the moment speeds things up, while the archived rows remain available via joins against the larger tables.
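The keep-the-last-n-and-archive idea can be sketched as a periodic copy-then-delete. Again using SQLite for a self-contained illustration; the table names and the flat (experiment, name, value) layout are assumptions for the sketch, not the poster's actual schema.

```python
# Sketch of archiving older runs out of the "hot" table into an archive
# table with the same shape, keeping only the most recent runs hot.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE experiment_parameter (
    experiment_id INTEGER, name TEXT, value REAL
);
CREATE TABLE experiment_parameter_archive (
    experiment_id INTEGER, name TEXT, value REAL
);
""")

# Ten fake runs, one parameter each.
cur.executemany("INSERT INTO experiment_parameter VALUES (?, ?, ?)",
                [(i, "laser_power", float(i)) for i in range(10)])

KEEP_LAST_N = 3                 # how many recent runs stay in the hot table
cutoff = 10 - KEEP_LAST_N       # archive everything before this run id
cur.execute("INSERT INTO experiment_parameter_archive "
            "SELECT * FROM experiment_parameter WHERE experiment_id < ?",
            (cutoff,))
cur.execute("DELETE FROM experiment_parameter WHERE experiment_id < ?",
            (cutoff,))

hot = cur.execute("SELECT COUNT(*) FROM experiment_parameter").fetchone()[0]
cold = cur.execute(
    "SELECT COUNT(*) FROM experiment_parameter_archive").fetchone()[0]
print(hot, cold)  # 3 7
```

In MariaDB you would run the equivalent INSERT ... SELECT / DELETE pair inside a transaction (or an event) so the hot table stays small without losing history.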

For an introduction to EAV, see a web document by Rick James (a Stack Overflow user). Also see the related questions on Stack Overflow.

Every time I look at EAV I wonder why in the world anyone would program against it. But imagining the academic/experimental/ad-hoc environment you must work in, I can't help but think it might be the best fit for you. See also the high-level exploratory question entitled Should I use EAV model?.

Drew