1

I have raw data in JSON as follows:

{
    "id": 1,
    "tags": [{
        "category": "location",
        "values": ["website", "browser"]
    },{
        "category": "campaign",
        "values": ["christmas_email"]
    }]
}, 
{
    "id": 2,
    "tags": [{
        "category": "location",
        "values": ["website", "browser", "chrome"]
    }]
},
{
    "id": 3,
    "tags": [{
        "category": "location",
        "values": ["website", "web_view"]
    }]
}

The tag category and its values are dynamically generated and are not known beforehand. I need to load this data into an RDBMS table and then later make queries to the data. The queries may be as follows:

  • Extract all rows where location has values "website" and "browser". The output of this query should return rows with id 1 and 2.

I need some help in modelling this into a table schema to support such queries. I was thinking of tables as:

Table 1: MAIN
Columns: ID, TAG_LIST_ID
Row1:    1   TL1
Row2:    2   TL2
Row3:    3   TL3

Table 2: TAGS
Columns: TAG_ID, TAG_CATEGORY, TAG_VALUE
Row1:    TID1    location      website
Row2:    TID2    location      browser
Row3:    TID3    location      chrome
Row4:    TID4    location      web_view
Row5:    TID5    campaign      christmas_email

Table 3: TAG_MAPPING
Columns: TAG_MAPPING_ID, TAG_LIST_ID, TAG_ID
Row1:    TMID1           TL1          TID1
Row2:    TMID2           TL1          TID2
Row3:    TMID3           TL1          TID5
Row4:    TMID4           TL2          TID1
Row5:    TMID5           TL2          TID2
Row6:    TMID6           TL2          TID3
Row7:    TMID7           TL3          TID1
Row8:    TMID8           TL3          TID4

Now to query all rows where location has values "website" and "browser", I could write

SELECT * from MAIN m, TAGS t, TAG_MAPPING tm
WHERE m.TAG_LIST_ID=tm.TAG_LIST_ID AND
tm.TAG_ID = t.TAG_ID AND
t.TAG_CATEGORY = "location" AND
(t.TAG_VALUE="website" OR t.TAG_VALUE="browser")

However this will return all the three rows; changing the OR condition to AND will return no rows. What is the right way to design the schema?

Any pointers appreciated.

Swaranga Sarma
  • 13,055
  • 19
  • 60
  • 93

1 Answers1

1

Just replace the OR by IN and a counter:

SELECT tm.TAG_LIST_ID, count(1) as cnt
 FROM MAIN m, TAGS t, TAG_MAPPING tm
WHERE tm.TAG_LIST_ID= m.TAG_LIST_ID 
  AND tm.TAG_ID = t.TAG_ID 
  AND t.TAG_CATEGORY = "location" AND
  AND t.TAG_VALUE IN ("website","browser")
GROUP by  tm.TAG_LIST_ID
having count(1) > 1  -- should be greater than 1 because you are looking for 2 words. This values change according the number of words.
Walter_Ritzel
  • 1,387
  • 1
  • 12
  • 16