I have a db with two tables: pages and tags which are structured like follows:
- pages: page_id, page_text, page_tags (around 60000 records at any time)
- tags: tag_id, tag_text
(around 300000 records at any time)
Each page is associated with a number of tags (using the page_tags column). My question is about pages.page_tags and in particular, which way is the most efficient for storing the aforementioned association?
One way would be to fulltext index page.page_tags and store the text of the associated tags there, for example: apple orange fruit marmalade
a second way would be to also fulltext index page.page_tags but store the ids of the associated tags for example: 132 14 24192 14
a third way would be to make a third table: tag_assoc, structured as follows:
tag_assoc: page_id, tag_id
(where for every tag present in a page there will exist a record with both the ids of the page and the tag)
Which do you think is the most efficient way? Especially concerning:
- A) search speed for queries like: "fetch me every page that has tags: apple and orange"
- B) updating of the tables. A new page might arrive in the database quite often. This means that if a new tag is found in some of those pages that doesn't exist in the tags table, I'll have to add it there.
If none of them, what would you suggest?