1

I have wondered about my current database design I made for my uni organization website. In general this website is for displaying informations and events. The event is different for each departement. It does some normal things like displaying event, creating event for admin, event registration for user, etc. The problem is every departement may have different form for registering new participant, say departement A has 5 form input, but departement B has 8 form input. And yet, every event may have different total of form with multiple type like text, radio or checkbox.

To solve the problem I mentioned before, I designed the database that has events, departements, event_forms, form_types, event_form_options (for checkbox or radio when admin decided to choose that type of form), event_form_responses, and some other tables that is out of the context of this question. The thought process I had, that admin can create whatever amount of form input they want, and each form input equals one row of event_forms table that refers to certain event with event_id. But the "problem" that might occurs with this approach lies in event_form_responses table. In that table, user will have response for each form input of certain event. Say, event A has 10 form inputs, and then 60 people decided to register to that event, that means event_form_response will have 600 rows of response for that event alone!. And then it needs to be displayed in admin dashboard.

My question:

Will that impact query and website performance? I think it will. What if I change the event_form and event_form_responses to store it as a JSON instead? Is it more advantageous at this point?. So that every event_form can have different signature with different amount of form inputs easily. And for the event_form_responses will have 1 * U instead of Q * U, which U is total of user registered in said event, and Q is amount of form inputs. Thank you in advance

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
newtocoding
  • 97
  • 2
  • 5
  • my basic question is, will the data be changed and you must ask yourself i am fit in json to handle update queries and complex json queries so on. – nbk Oct 01 '22 at 19:02
  • You could consider MongoDB if you want to go down the route of JSON – apokryfos Oct 01 '22 at 21:46
  • @nbk for the form inputs, it can be changed. As for the `event_form_responses` is a static one. Which is better, json for both table `event_form`, or take the middle part, using each row for each form input, and json for `event_form_response`, what do you think? – newtocoding Oct 02 '22 at 01:34
  • @newtocoding i try to avoid json and i can manipulate them at will, but it is always a hassle, so try to save data as normalized as you can, siz is uslally not relevant, only if you can use the tools to manipulate – nbk Oct 02 '22 at 01:38
  • @nbk what if I told you that the json will never queried like searched with `where` clause or something. Sure, it can be changed, but then again, it can be retrieved with it's `id` – newtocoding Oct 02 '22 at 01:48

2 Answers2

1

I wouldn't worry about 600 rows per event. Assuming you have designed indexes well to support your queries, MySQL can handle hundreds of millions of rows per table. Tables typically start to get hard to scale when you have over 1 billion (1e9) rows.

I've answered a lot of questions about MySQL and JSON on Stack Overflow. My conclusion is that while JSON makes it easy to store data with variable or complex structure, it comes at a cost.

Queries against JSON data are more complex than traditional SQL queries against normal rows and columns. It's harder to learn to search or sort data stored in JSON. Not impossible — but it's a totally different type of query. If you don't have experience with this yet, you will experience a steep learning curve.

You said you're concerned about performance, and I have found that it's harder to optimize queries that search or sort JSON than queries that work with normal tables.

Also it depends a lot on how you structure your JSON. JSON is very free-form, you can make arrays and key/value objects and you can nest further structure. But in some cases, I've seen people create JSON structure that cannot be queried using MySQL's JSON functions. You need to do a lot of study of the types of JSON functions available, and do a lot of hands-on experimentation to understand their strengths and weaknesses.

Also I have found that it typically takes 2-3 times as much space to store data in JSON format compared to storing equivalent data in normal rows and columns. The reason is that numbers are stored as strings, object keys appear on every row instead of just in the table header, and there are extra characters needed for quotes, commas, and brackets.

You might like my presentation How to Use JSON in MySQL Wrong.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • I think the json will not be queried, like searched with `where` clause or something, it just needs to be displayed. At most, the `event_form` can be changed, but again, the data can be queried based on the `id`. And as for `event_form_responses` can't be changed. Or maybe, take the middle part? Like, using each row for each form input, and json for `event_form_response`, what do you think? – newtocoding Oct 02 '22 at 01:41
  • In my presentation, I say if you must use JSON, follow this rough guideline: reference JSON columns in your select-list but not in the WHERE clause (or any other clause). If your query does not use fields inside the JSON document to search or sort, you may be in better shape. Still, there are cases where you want to extract a certain field from inside the JSON. While technically that is done in expressions of the select-list, there's a risk that you've chosen a JSON document structure from which it's hard to extract values. I encourage you to try it and experiment. – Bill Karwin Oct 02 '22 at 02:38
  • Your presentation is really informative, I like it. But can you help me clear this up. Maybe we can narrow this problem Maybe I can just store each form input as one row, because I think based on ur presentation it is faster sending it to website, and it will retrieved multiple times because it'll be displayed in user page. As for the `event_form_response`, it'll be only displayed in admin page. The json is just for the response of the user, with will be have title of the question and the response. Which is better, just go with 600 before, or reduce the total row with json? (1/2) – newtocoding Oct 02 '22 at 02:39
  • Because my concern with 600 or more is in the website. Do you think looping 600 times or more for displaying the data is okay? That's why I asked the question what if I reduce the row with json format. Though, it is just for admin But on the other hand, I think it's the same isn't it? I still need to loop every response in json data. Maybe I stick with my current design? I'm not really sure (2/2) – newtocoding Oct 02 '22 at 02:44
  • I encourage you to try it and experiment. You'll learn more by doing some hands-on work than by listening to me. – Bill Karwin Oct 02 '22 at 02:50
0

After reading @Bill Karwin slide presentation, experimenting on my own and a little bit of experience from part time job, I conclude that I'm going to use json for both table with the following reason:

  • For the even_forms table, I realized that user will need update the data. So making the normalized version would be hard/complex because we don't know where the column user might change, or maybe at the same time, user want to add or delete the column at the same time with random order. So json is more managable and easier.
  • For the event_form_responses, my senior said that we need to limit our data fetching from database to keep website's performance. Usually it would be around 50 or so. If I was going to make it normalized, fetching 10 user form responses is equal to fetching 100 data (if the event form has 10 questions). So displaying 5 event form response at a time is not really good for user experience, is it?. But even though, the user for this website is not going to reach like 10K at a time or something (I think 300 - 1000 at most), I'm just going to consider that as a best practice
newtocoding
  • 97
  • 2
  • 5