I am trying to make a search engine for my website. How should I design the table which keeps the list of indexed words.
Earlier I thought something like this:
Table: tbl_indexedwords
has 2 columns iw_wordid
and iw_word
.
Table: tbl_wordoccurrence
has 4 columns wo_occurrenceid
, wo_wordid
, wo_pageid
, wo_numberofoccurrences
.
Now, this design will not work well if the user enters more than two words in the search box. Suppose foo bar
. Even if foo
and bar
both are present in the table tbl_indexedwords
and corresponding details are in the tbl_wordoccurrence
, my search engine script would rank the results where it sees maximum wo_numberofoccurrences
for either foo
or bar
. It will not see whether foo
and bar
are present next to each other as there is no column for order of occurrence of the words. I hope I am clear with what I am saying here.
Another idea could be to make the table tbl_wordoccurrence
of 3 columns. Forget about wo_numberofoccurences
and store each word in the page with unique wo_occurrenceid
. Now, this would solve my problem as I know the order of occurrence of the words. if wo_occurrenceid
of some word is wo_occurrenceid
+1 or wo_occurrenceid
-1 of some other word then, these two occur side by side.
The problem with this design is that it would take up lots of space. I have lots of content for my website. I think this approach would make it slow(not sure, though). Is there any other design that would help me? Or will I have to go with the second one? I am sure the first one is not gonna work, so discarding it.