I have raw data in JSON as follows:
{
"id": 1,
"tags": [{
"category": "location",
"values": ["website", "browser"]
},{
"category": "campaign",
"values": ["christmas_email"]
}]
},
{
"id": 2,
"tags": [{
"category": "location",
"values": ["website", "browser", "chrome"]
}]
},
{
"id": 3,
"tags": [{
"category": "location",
"values": ["website", "web_view"]
}]
}
The tag category and its values are dynamically generated and are not known beforehand. I need to load this data into an RDBMS table and then later make queries to the data. The queries may be as follows:
- Extract all rows where location has values "website" and "browser". The output of this query should return rows with id 1 and 2.
I need some help in modelling this into a table schema to support such queries. I was thinking of tables as:
Table 1: MAIN
Columns: ID, TAG_LIST_ID
Row1: 1 TL1
Row2: 2 TL2
Row3: 3 TL3
Table 2: TAGS
Columns: TAG_ID, TAG_CATEGORY, TAG_VALUE
Row1: TID1 location website
Row2: TID2 location browser
Row3: TID3 location chrome
Row4: TID4 location web_view
Row5: TID5 campaign christmas_email
Table 3: TAG_MAPPING
Columns: TAG_MAPPING_ID, TAG_LIST_ID, TAG_ID
Row1: TMID1 TL1 TID1
Row2: TMID2 TL1 TID2
Row3: TMID3 TL1 TID5
Row4: TMID4 TL2 TID1
Row5: TMID5 TL2 TID2
Row6: TMID6 TL2 TID3
Row7: TMID7 TL3 TID1
Row8: TMID8 TL3 TID4
Now to query all rows where location has values "website" and "browser", I could write
SELECT * from MAIN m, TAGS t, TAG_MAPPING tm
WHERE m.TAG_LIST_ID=tm.TAG_LIST_ID AND
tm.TAG_ID = t.TAG_ID AND
t.TAG_CATEGORY = "location" AND
(t.TAG_VALUE="website" OR t.TAG_VALUE="browser")
However this will return all the three rows; changing the OR condition to AND will return no rows. What is the right way to design the schema?
Any pointers appreciated.