I am building a full-text search engine and I have a performance problem when displaying results with a description. Retrieving the results for a query is fast, but it becomes slow when I fetch each document's text and highlight the part where the keyword occurs. The indexed documents are PDF, TXT, DOC, DOCX, HTML and so on. The search engine works like this:

  • I have a DB table where I store the document text
  • I have a DB table where I index the words of the text with their frequency
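
To make the layout concrete, the two tables could look roughly like the sketch below. It uses SQLite through Python's `sqlite3` module purely for illustration; the table and column names (`documents`, `term_index`, `body`, `frequency`) and the simple word tokenizer are assumptions, not the actual schema.

```python
import re
import sqlite3
from collections import Counter


def create_schema(conn):
    # Hypothetical schema: one table for the raw text, one for term frequencies.
    conn.executescript("""
        CREATE TABLE documents (
            id   INTEGER PRIMARY KEY,
            body TEXT NOT NULL
        );
        CREATE TABLE term_index (
            term      TEXT    NOT NULL,
            doc_id    INTEGER NOT NULL REFERENCES documents(id),
            frequency INTEGER NOT NULL,
            PRIMARY KEY (term, doc_id)
        );
    """)


def add_document(conn, body):
    # Store the text, then index every lowercased word with its count.
    cur = conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
    doc_id = cur.lastrowid
    counts = Counter(re.findall(r"\w+", body.lower()))
    conn.executemany(
        "INSERT INTO term_index (term, doc_id, frequency) VALUES (?, ?, ?)",
        [(term, doc_id, n) for term, n in counts.items()],
    )
    return doc_id


def search(conn, term):
    # Return (doc_id, frequency) pairs, most frequent first.
    return conn.execute(
        "SELECT doc_id, frequency FROM term_index "
        "WHERE term = ? ORDER BY frequency DESC",
        (term.lower(),),
    ).fetchall()
```

With this layout a keyword lookup only touches `term_index`; the full text in `documents` is needed only when a description has to be built.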

Is this a reasonable design at all? To build the description I have to search the index, fetch the document, parse the text, split it into sentences and keep the sentences that contain the keyword. The search times without descriptions are:

  • **Крушевското Востание 1903**: 0.00518989562988
  • **Даме Груев**: 0.00394678115845
  • **Даме Груев и Гоце Делчев**: 0.0916090011597
  • **Државен празник Илинден**: 0.0072648525238
  • **Даме**: 0.00195503234863
  • **Александар Македонски**: 0.0423209667206
  • **Бранко Црвенковски и Никола Груевски**: 0.0233609676361
  • **СДСМ и ВМРО-ДПМНЕ**: 0.0295231342316
  • **Македонија**: 0.0435738563538
  • **Никола Груевски и Македонија**: 0.0451180934906

The search keywords are in my native language, and the collection contains 3679 documents. When I add a description built from the matching sentences, displaying the results becomes 10x-20x slower (about 2-3 seconds). The search is implemented in Python.
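
The description step (take the document text, split it into sentences, keep the sentences that contain the keyword and highlight the match) can be sketched roughly as follows. This is a minimal illustration assuming Python 3, not the actual implementation: the function name `make_snippet`, the `<b>` markup, the `max_sentences` parameter and the naive sentence rule (a sentence ends at `.`, `!` or `?`) are all assumptions.

```python
import re

# Naive sentence rule (assumption): text up to and including ., ! or ?.
# Abbreviations such as "e.g." will be split incorrectly.
SENTENCE = re.compile(r"[^.!?]+[.!?]*")


def make_snippet(text, keywords, max_sentences=2):
    """Return up to max_sentences sentences containing any keyword,
    with each match wrapped in <b>...</b> (output is not HTML-escaped)."""
    keywords = [k for k in keywords if k]
    if not keywords:
        return ""
    # Substring match, case-insensitive; str patterns are Unicode-aware.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    picked = []
    # finditer yields sentences lazily, so the scan stops once enough are found.
    for match in SENTENCE.finditer(text):
        sentence = match.group(0).strip()
        if pattern.search(sentence):
            picked.append(pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", sentence))
            if len(picked) == max_sentences:
                break
    return " ... ".join(picked)
```

Because the loop stops after `max_sentences` hits, a document whose first paragraphs already contain the keyword is not scanned to the end. Documents where the keyword appears late still require walking most of the text.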

Any suggestions?

badc0re

1 Answer

I really suggest you have a look at projects like Elasticsearch and Solr (both based on Lucene). They both support what you want to do (full-text search, result highlighting, ...) and much more.
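
As a rough idea of what highlighting looks like in Elasticsearch, the sketch below builds a search request body that combines a `match` query with a `highlight` section. The helper name `build_highlight_query` and the field name `text` are assumptions for illustration; the body would be sent to the `_search` endpoint of whatever index holds the documents.

```python
import json


def build_highlight_query(keywords, field="text", size=10):
    # "field" is a hypothetical name for the field holding the document text.
    return {
        "size": size,
        # Full-text match on the analyzed field.
        "query": {"match": {field: keywords}},
        # Ask for highlighted fragments from the same field, default settings.
        "highlight": {"fields": {field: {}}},
    }


body = build_highlight_query("Dame Gruev")
print(json.dumps(body, indent=2))
```

Each hit in the response then carries a `highlight` object with fragments of the field in which the matched terms are wrapped in `<em>` tags by default, so the application does not have to parse the documents itself.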

Tommaso Barbugli