0

I have a db with two tables: pages and tags which are structured like follows:

  • pages: page_id, page_text, page_tags (around 60000 records at any time)
  • tags: tag_id, tag_text
    (around 300000 records at any time)

Each page is associated with a number of tags (using the page_tags column). My question is about pages.page_tags and in particular, which way is the most efficient for storing the aforementioned association?

  1. One way would be to fulltext index page.page_tags and store the text of the associated tags there, for example: apple orange fruit marmalade

  2. a second way would be to also fulltext index page.page_tags but store the ids of the associated tags for example: 132 14 24192 14

  3. a third way would be to make a third table: tag_assoc, structured as follows:

tag_assoc: page_id, tag_id

(where for every tag present in a page there will exist a record with both the ids of the page and the tag)


Which do you think is the most efficient way? Especially concerning:

  • A) search speed for queries like: "fetch me every page that has tags: apple and orange"
  • B) updating of the tables. A new page might arrive in the database quite often. This means that if a new tag is found in some of those pages that doesn't exist in the tags table, I'll have to add it there.

If none of them, what would you suggest?

Alexandros
  • 4,425
  • 4
  • 23
  • 21

2 Answers2

0

If you use fulltext indexes i would doing something as such

table 1 - page

pageid 
name
date
category
... etc etc other page meta data here

table 2 - page_fulltext

pageid
page_title_fulltext 
page_body_fulltext 

take for example page 1 has page_body_fulltext "the quick brown fox jumps of the lazy dog" page 2 has page_body_fulltext "the quick red fox jumps of the lazy brown dog"

doing a fulltext search you can find individual tag words but also find exact strings

ie you can find key words "quick" or "brown" or "fox"

But if someone searches for "quick brown fox" you can do this also.

in your example you would probably look for all 3 words and get both pages back which would be wrong.

also mysql does a great job of handling fulltext searches, what you suggest is basically a similar version of what mysql would do very well on its own

so in the 2 instance you outlined above A) search speeds would be brilliant as it what mysql does natively very well B) my way way quicker as you would not have to check the existence of every keyword you are inserting. Just perform a standard update/insert and let mysql handle the pain of text searcing for you.

my company is using the method i described and it works very well...

also my have the page text and page title in seperate fulltext columns you get the added bonus of being able to score pages with titles containing your keywords higher than pages with body text containing same keywords.

niallo
  • 366
  • 3
  • 5
0

This page, while a bit old, contains decent information about various approaches to tagging schemas and how each effect performance. How you approach the problem is largely dependent on both your current amount of records and how you expect this amount to change going forward.

The schemas: http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html

Their performance: http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html

John Kramlich
  • 2,220
  • 17
  • 18