2

I am trying to make a search engine for my website. How should I design the table which keeps the list of indexed words.

Earlier I thought something like this:

Table: tbl_indexedwords has 2 columns iw_wordid and iw_word.

Table: tbl_wordoccurrence has 4 columns wo_occurrenceid, wo_wordid, wo_pageid, wo_numberofoccurrences.

Now, this design will not work well if the user enters more than two words in the search box. Suppose foo bar. Even if foo and bar both are present in the table tbl_indexedwords and corresponding details are in the tbl_wordoccurrence, my search engine script would rank the results where it sees maximum wo_numberofoccurrences for either foo or bar. It will not see whether foo and bar are present next to each other as there is no column for order of occurrence of the words. I hope I am clear with what I am saying here.

Another idea could be to make the table tbl_wordoccurrence of 3 columns. Forget about wo_numberofoccurences and store each word in the page with unique wo_occurrenceid. Now, this would solve my problem as I know the order of occurrence of the words. if wo_occurrenceid of some word is wo_occurrenceid+1 or wo_occurrenceid-1 of some other word then, these two occur side by side.

The problem with this design is that it would take up lots of space. I have lots of content for my website. I think this approach would make it slow(not sure, though). Is there any other design that would help me? Or will I have to go with the second one? I am sure the first one is not gonna work, so discarding it.

Navneet Saini
  • 934
  • 4
  • 16
  • 33
  • I would suggest using `FULLTEXT` capability of your database system (if it provides) or using some ready made frameworks like **[Lucene](http://lucene.apache.org/core/)**, which are mature enough to fullfill your requirement effectively, rather than re-inventing the wheel on your own. – Rumit Parakhiya Jun 12 '13 at 04:26
  • @RumitParakhiya Does Lucene work with php? I thought Its for java! – Navneet Saini Jun 12 '13 at 09:56
  • I really don't have much idea about using Lucene with php, but [this SO Question](http://stackoverflow.com/questions/2010663/lucene-with-php) or [this link](http://lucene.apache.org/solr/) may help you. – Rumit Parakhiya Jun 12 '13 at 11:39

1 Answers1

3

If the contents of your website is on the database (I assume) creating a separate table would not even necessary if you are using FULLTEXT index. If you are using mySQL then it has such capability see the examples here and here. And if you are using MSSQL it has also its own FULLTEXT indexing capability like the example here and here

And if you insist if having a separate table for searching then you could most likely have only one table needed like:

Table  : tbl_wordsoccurrence
Fields : words_id, words 
(and if you like you can include also number_of_occurences and page_id fields)

In the table above you could either store one word like programming or phrases like php programming.

On the other hand if your website is static meaning the content is not saved on a database and therefore changes had to be made manually rather than by regular user input then that's another story.

Edper
  • 9,144
  • 1
  • 27
  • 46