I'm running MariaDB 10.2.29, and started exploring JSON data types. I'm trying to understand how to group by dynamic JSON keys, and sum up their nested values. For example, the table contains the following in the 'data' field entries:
MariaDB [vstats]> select data from top_domains where vtime >= '1573666140' and vtime <= '1573666200' limit 2\G
*************************** 1. row ***************************
data: {"domain1.com":{"total":"260","cached":"153"},"domain2.com":{"total":"1501","cached":"1170"},"domain3.com":{"total":"212","cached":"190"},"domain4.com":{"total":"1206","cached":"1118"},"domain5.com":{"total":"353","cached":"285"},"domain6.com":{"total":"216","cached":"41"},"domain7.com":{"total":"112","cached":"78"},"domain8.com":{"total":"220","cached":"194"},"domain9.com":{"total":"95","cached":"62"},"domain10.com":{"total":"99","cached":"18"}}
*************************** 2. row ***************************
data: {"domain1.com":{"total":"240","cached":"166"},"domain2.com":{"total":"1644","cached":"1279"},"domain3.com":{"total":"195","cached":"174"},"domain4.com":{"total":"1276","cached":"1185"},"domain12.com":{"total":"84","cached":"72"},"domain5.com":{"total":"373","cached":"258"},"domain6.com":{"total":"92","cached":"13"},"domain7.com":{"total":"148","cached":"102"},"domain11.com":{"total":"217","cached":"25"},"domain8.com":{"total":"380","cached":"345"}}
2 rows in set (0.00 sec)
MariaDB [vstats]>
The 'vtime' field contains epoch values from when the stats were recorded. What I'm trying to get in return is a JSON object, grouped by domain names/JSON keys, with sums for the "total" and "cached" values. I started down a path using a nested select against the json_keys(data,'$')
values, but it didn't seem optimal for transversing possibly thousands of entries.
The main reasons I have for wanting to use the JSON table type are:
- Input data comes from a Perl hash encoded to a JSON object
- Output data is parsed and plotted using jQuery
- Reduced table size (I tried using normal columns and primary keys on vtime and domains but it grows far too fast)
- Use a dataset format which doesn't have to be manipulated for input/output
Any pointers would be greatly appreciated.