0

Trying to make sense of a database written in json format. I managed to convert the main tables into csv format however there are columns within the tables themselves that are written in json format.

The table in question describes soccer matches and the attribute I am having trouble with gives details (such as line up, goals scored etc) for the two teams in play. The attribute provides two main key/value pairs, with the key values being the id's for the teams playing in the match, and so keys change from row to row.

I have managed to access some information by writing the id manually in a query such as below (using team_id 1609):

SELECT json_extract(teams_data, "$.1609") AS trial
FROM Matches_England

(the column in question is called teams_data and the table in question is called Matches_England)

however this is obviously not feasible for every single game.

I am looking for a way to access these 2 values for every single row in SQLite.

This is an example of the row in question:

{**"1609"**: {"scoreET": 0, "coachId": 7845, "side": "home", "teamId": 1609, "score": 4, "scoreP": 0, "hasFormation": 1, "formation": {"bench": [{"playerId": 20612, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 25662, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 7864, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 230020, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 26010, "ownGoals": "0", "redCards": "0", "goals": "1", "yellowCards": "0"}, {"playerId": 7879, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 7870, "ownGoals": "0", "redCards": "0", "goals": "1", "yellowCards": "0"}], "lineup": [{"playerId": 370224, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 120339, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 7945, "ownGoals": "0", "redCards": "0", "goals": "1", "yellowCards": "0"}, {"playerId": 14869, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 25413, "ownGoals": "0", "redCards": "0", "goals": "1", "yellowCards": "0"}, {"playerId": 7868, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 3560, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 167145, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 3319, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 7882, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 49876, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}], "substitutions": [{"playerIn": 26010, "playerOut": 370224, "minute": 67}, {"playerIn": 7870, "playerOut": 120339, "minute": 67}, {"playerIn": 7879, "playerOut": 7945, "minute": 75}]}, "scoreHT": 2}, **"1631"**: {"scoreET": 0, "coachId": 333782, "side": "away", "teamId": 1631, "score": 3, "scoreP": 0, "hasFormation": 1, "formation": {"bench": [{"playerId": 119630, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 8498, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 350976, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 8066, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 285508, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 217078, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 283142, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}], "lineup": [{"playerId": 14763, "ownGoals": "0", "redCards": "0", "goals": "1", "yellowCards": "0"}, {"playerId": 192748, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 8013, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 8480, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 8653, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 149019, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 8488, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "94"}, {"playerId": 14853, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 265366, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}, {"playerId": 12829, "ownGoals": "0", "redCards": "0", "goals": "2", "yellowCards": "0"}, {"playerId": 26150, "ownGoals": "0", "redCards": "0", "goals": "0", "yellowCards": "0"}], "substitutions": [{"playerIn": 217078, "playerOut": 14763, "minute": 72}, {"playerIn": 285508, "playerOut": 192748, "minute": 82}, {"playerIn": 283142, "playerOut": 8013, "minute": 88}]}, "scoreHT": 2}}

In this case the two team id's taking part in the match are 1631 and 1609. It would even suffice if I would be able to find a way to separate these two key/value pairs into two separate columns including only the two separate values (as the team id is again nested within the original value so I can still obtain it from there).

oz123
  • 27,559
  • 27
  • 125
  • 187
  • 2
    Instead of storing a huge chunk of text, I'd decompose that into however many tables it takes to normalize all the relationships and data. Easier to work with – Shawn Jun 03 '20 at 15:59
  • That actually worked. I normalized everything with python. Thanks! – Rebecca Vella Jun 05 '20 at 17:13

1 Answers1

0

Use json_each to turn each key/value pair into a row.

select *
from json_each('{"123": {"a": 23, "b": 42}, "456": {"a": 5, "b": 7}}');

key  value            type    atom  id  parent  fullkey  path
---  ---------------  ------  ----  --  ------  -------  ----
123  {"a":23,"b":42}  object        2           $.123    $   
456  {"a":5,"b":7}    object        8           $.456    $   

Then work with the value column.

select json_extract(things.value, '$.a') as a
from (
  select * from json_each('{"123": {"a": 23, "b": 42}, "456": {"a": 5, "b": 7}}')
) things;

a 
--
23
5 

There's also json_tree to work with nested data more directly.

This is obviously quite awkward and will be quite slow, it's all string processing. If at all possible do this once and normalize the data into tables and rows.

Schwern
  • 153,029
  • 25
  • 195
  • 336