Search engine: ideas for generating result descriptions

Question

I am building a full-text search engine, and I have a performance problem when displaying results with a description. Retrieving the results for a query is fast; the slowdown comes when I fetch the document text and highlight the part where the keyword occurs. The documents are pdf, txt, doc, docs, html, etc. My search engine works like this:

I have a db table where I store the document text
I have a db table where I index the text with its frequency
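
For readers who want something concrete, here is a minimal sketch of such a two-table layout using Python's built-in `sqlite3`. The table and column names (`documents`, `term_index`, `body`, `frequency`) and the simple `\w+` tokenizer are assumptions made for illustration; the actual schema was not posted.

```python
import re
import sqlite3
from collections import Counter


def create_schema(conn):
    # Hypothetical schema: one table for raw text, one for term frequencies.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id   INTEGER PRIMARY KEY,
            body TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS term_index (
            term      TEXT    NOT NULL,
            doc_id    INTEGER NOT NULL REFERENCES documents(id),
            frequency INTEGER NOT NULL,
            PRIMARY KEY (term, doc_id)
        );
    """)


def add_document(conn, body):
    # Store the text, then index each lower-cased word with its count.
    cur = conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
    doc_id = cur.lastrowid
    counts = Counter(re.findall(r"\w+", body.lower()))
    conn.executemany(
        "INSERT INTO term_index (term, doc_id, frequency) VALUES (?, ?, ?)",
        [(term, doc_id, n) for term, n in counts.items()],
    )
    return doc_id


def search(conn, term):
    # Return (doc_id, frequency) pairs, most frequent first.
    return conn.execute(
        "SELECT doc_id, frequency FROM term_index "
        "WHERE term = ? ORDER BY frequency DESC, doc_id",
        (term.lower(),),
    ).fetchall()
```

With this layout a lookup touches only `term_index`; the `documents` table is read only when the text itself is needed, for example to build a description.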

Is this design reasonable at all? To build the description I have to search the index, fetch the document, parse its text, split it into sentences, and keep the sentences that contain the keyword. The timings for searching without a description are:
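
The description step described above (split the text into sentences, keep those containing a keyword, highlight the match) could look roughly like the sketch below. It is not the asker's code: the function names, the naive punctuation-based sentence splitter, and the `<b>` highlight tag are all assumptions. The one deliberate choice is that it stops scanning as soon as it has enough sentences instead of processing the whole document.

```python
import re

# Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def make_snippet(text, keywords, max_sentences=2, tag="b"):
    """Return up to max_sentences sentences containing a keyword, highlighted."""
    terms = [k for k in keywords if k]
    if not terms:
        return ""
    # One compiled pattern for all keywords; whole words, case-insensitive.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in terms) + r")\b",
        re.IGNORECASE,
    )
    picked = []
    for sentence in split_sentences(text):
        if pattern.search(sentence):
            picked.append(
                pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", sentence)
            )
            if len(picked) == max_sentences:
                break  # stop early; the rest of the document is not needed
    return " ... ".join(picked)
```

Python 3 regular expressions are Unicode-aware for `str` input, so `\b` and `re.IGNORECASE` also work for Cyrillic keywords. The sketch does not HTML-escape the document text, which a real result page would need to do.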

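**Крушевското Востание 1903** (The Kruševo Uprising 1903) 0,00518989562988
**Даме Груев** (Dame Gruev) 0,00394678115845
**Даме Груев и Гоце Делчев** (Dame Gruev and Goce Delčev) 0,0916090011597
**Државен празник Илинден** (National holiday Ilinden) 0,0072648525238
**Даме** (Dame) 0,00195503234863
**Александар Македонски** (Alexander of Macedon) 0,0423209667206
**Бранко Црвенковски и Никола Груевски** (Branko Crvenkovski and Nikola Gruevski) 0,0233609676361
**СДСМ и ВМРО-ДПМНЕ** (SDSM and VMRO-DPMNE) 0,0295231342316
**Македонија** (Macedonia) 0,0435738563538
**Никола Груевски и Македонија** (Nikola Gruevski and Macedonia) 0,0451180934906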

The search keywords are in my native language, and the collection contains 3679 documents. When I add a description built from the matching sentences, displaying the results becomes 10x-20x slower (around 2-3 seconds). The search is implemented in Python.

Any suggestions?

score 2 · Answer 1 · answered May 26 '12 at 09:56

I strongly suggest having a look at projects like Elasticsearch and Solr (both based on Lucene). Both support what you want to do (full-text search, result highlighting, ...) and much more.
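
As a rough illustration of what result highlighting looks like on the Elasticsearch side, the sketch below only builds the JSON body of a search request with a `highlight` section; it does not contact a server. The field name `text` and the helper name `build_highlight_query` are assumptions, while `match`, `highlight`, `pre_tags`, `post_tags`, `fragment_size` and `number_of_fragments` are standard parts of the Elasticsearch search API.

```python
import json


def build_highlight_query(query, field="text", fragment_size=150,
                          number_of_fragments=2, size=10):
    # Request body for a full-text match with highlighted fragments.
    # `field` is a hypothetical name for the indexed document text.
    return {
        "size": size,  # number of hits to return
        "query": {"match": {field: query}},
        "highlight": {
            "pre_tags": ["<b>"],
            "post_tags": ["</b>"],
            "fields": {
                field: {
                    "fragment_size": fragment_size,
                    "number_of_fragments": number_of_fragments,
                }
            },
        },
    }


if __name__ == "__main__":
    body = build_highlight_query("Даме Груев")
    # ensure_ascii=False keeps the Cyrillic query readable in the output.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Such a body would be sent to an index's `_search` endpoint; each hit then carries a `highlight` entry with the tagged fragments, so the application does not have to parse the document text itself.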

Tommaso Barbugli

I know about them, and I was looking into Lucene, but I was hoping someone would have an idea for my case. – badc0re May 26 '12 at 10:11
@badc0re if you add some more detail about the highlighting part I can have a look. – Tommaso Barbugli May 26 '12 at 11:54
It isn't just a single class, there are a lot of them. – badc0re May 27 '12 at 06:41
