1

I have two tables: Files and FileLogs and I'm trying to get rid of the FileLogs table and store some data from it inside the Files table.

Files

id lastStatusId meta
187 101 NULL
188 101 { "foo": "bar" }

FileLogs

id fileId statusId createdAt meta
1 187 101 2019-06-18 19:43:19 NULL
2 187 102 2019-06-18 19:43:22 NULL
3 187 202 2019-06-18 19:45:32 { "foo2": "bar2" }

I already added the lastStatusId column to the Files table (defaults to 101).

To get the latest statusId for every file from the logs into the latestStatusId I run this query and it works great:

UPDATE
  `Files` f1,
  (
    SELECT
      f.id as fileId,
      fl.statusId,
      fl.meta
    FROM
      (
        SELECT
          MAX(id) as id,
          fileId
        FROM
          `FileLogs`
        GROUP BY
          fileId
      ) AS latestIds
      LEFT JOIN `FileLogs` AS fl ON latestIds.id = fl.id
      LEFT JOIN `Files` AS f ON f.id = fl.fileId
    ORDER BY
      fl.id ASC
  ) temp1
SET
  f1.lastStatusId = temp1.statusId
WHERE
  f1.id = temp1.fileId;

I thought I could use a similar query to go over all FileLogs and write the status into the Files' meta field like so:

UPDATE
  `Files` f1,
  (
    SELECT
      f.id as fileId,
      fl.statusId,
      fl.meta,
      fl.createdAt
    FROM
      `FileLogs` AS fl
      LEFT JOIN `Files` AS f ON f.id = fl.fileId
  ) temp1
SET
  f1.meta = JSON_MERGE_PATCH(
    COALESCE(f1.meta, '{}'),
    JSON_OBJECT(
      'statusInfo',
      JSON_OBJECT(
        CONCAT('', temp1.`statusId`),
        JSON_OBJECT(
          'createdAt',
          temp1.createdAt
        )
      )
    )
  )
WHERE
  f1.id = temp1.fileId;

Result of the subquery is something like this:

fileId  statusId    meta            createdAt
187     101         NULL            2019-06-18 19:43:19
187     102         NULL            2019-06-18 19:43:22
187     202         {"foo": "bar"}  2019-06-18 19:43:26

But the SET is only run for the first f1.id = temp1.fileId and not the second or third (which kind of makes sense) - and the result looks like this:

File id:187

meta

{
    "statusInfo": {
        "101": {
            "createdAt": "2019-06-18 19:43:19.000000"
        }
    }
}

  1. How can I achieve that the two other status (102, 202) are also added to the meta column?
  2. How can I also get the FileLogs' meta data in there as well (if not null)?
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Philipp Kyeck
  • 18,402
  • 15
  • 86
  • 123
  • 1
    Please learn to use proper `JOIN` syntax rather than cross products with joining conditions in `WHERE`. It makes the code easier to understand. – Barmar Mar 28 '23 at 15:57
  • I don't see the need for the join in the subquery. It doesn't return anything from `Files` (`f1.id` can be replaced with `fl.fileId`) – Barmar Mar 28 '23 at 16:00
  • Assuming you're trying to get rid of the `FileLogs` table to save storage space, are you aware that storing the same data in JSON format will take _more_ storage space? – Bill Karwin Mar 28 '23 at 16:02
  • I'm not sure why you're only getting the last value. `JSON_MERGE_PATCH()` is supposed to merge recursively, so you should get both `statusId` values. – Barmar Mar 28 '23 at 16:08
  • @Barmar I tried to clean up the query as much as I could but missed this. The original has some more fields in it. And yes, `JSON_MERGE_PATCH()` is working as expected but I think it is only called once for every id – Philipp Kyeck Mar 28 '23 at 17:51

1 Answers1

1

You can pre-aggregate the JSON in the subquery. Here's a demo:

select f.id,
  json_objectagg(l.statusId,
    json_object(
      'createdAt', l.createdAt, 
      'meta', l.meta
  )) as statusInfo
from Files as f
join FileLogs as l on f.id = l.fileId
group by f.id;

Result given your example data:

        id: 187
statusInfo: {
  "101": {
    "meta": null,
    "createdAt": "2019-06-18 19:43:19.000000"
  },
  "102": {
    "meta": null,
    "createdAt": "2019-06-18 19:43:22.000000"
  },
  "202": {
    "meta": {
      "foo2": "bar2"
    },
    "createdAt": "2019-06-18 19:45:32.000000"
  }
}

Now that this resembles the JSON structure you want, we can wrap this into an UPDATE:

update Files
join (
  select fileId,
    json_objectagg(statusId,
      json_object(
        'createdAt', createdAt,
        'meta', meta
      )
    ) as statusInfo
  from FileLogs
  group by fileId
) as f on Files.id = f.fileId
set meta = json_object('statusInfo', f.statusInfo);

However, you should think about the implications of doing this.

  • Storing data in JSON takes more space than storing the same data in rows of FileLogs.

  • Adding subsequent information to the File.meta JSON document can be done with JSON merging, but that will certainly take more careful thought than simply INSERT a new row to FileLogs.

  • How will you need to query the statusInfo in the future? If you need to search for entries with certain status, or which one is the most recent, or which one had a status entry on a given date, this could be tricky to search the JSON document, whereas it is much easier if you keep the status info in rows of FileLogs. You should make sure you know how to do all the queries you need to do before you commit to using JSON.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you for the response - I'll check it out once! For your questions/concerns: 1) space is not an issue right now. 2) you are right but it's ok for my usecase. 3) I don't have to query for this - I'm only really interested in the last status - but in the case of an error I'd like to have the meta-data to know why an error occured. But only for debugging. The "normal" SELECT id,lastStatusId,... FROM Files would not include the meta field. And finally the reason I wanted to get rid of it was that I have a super deeply nested data-structure and with this I can get rid of the last level – Philipp Kyeck Mar 28 '23 at 17:49
  • 1
    Fair enough, I just want you to think about the consequences of using JSON, and it sounds like you have. I'll leave those remarks there for the benefit of other readers who are considering storing aggregate data as a JSON document. – Bill Karwin Mar 28 '23 at 18:56
  • 1
    Is working as expected - thanks! Only thing I had to change: use `JSON_MERGE_PATCH()` in the SET because Files.meta could already have data in it – Philipp Kyeck Mar 29 '23 at 07:33