I have a huge HTML file containing text, tables and images (with alt text). I have a full-text search function for this file only, but at the moment it relies on strict string comparison. I want to improve it so that it returns the top 5 paragraphs (`<p></p>`), tables or images, ranked by relevance to a query.
A few problems I have now:
Example 1 (misspelling):
Query: "sta**kc**overflow"
Text: "....this is stackoverflow...."
Example 2 (strict comparison):
Query: "full text searching"
Text: "...full searching..."
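To make the goal concrete, here is a minimal sketch of the behaviour I am after, using only the standard library (`html.parser` and `difflib`). The names `BlockExtractor`, `score` and `search` are my own placeholders, and it assumes `<p>` and `<table>` always have explicit closing tags. It compares every query word against every word of every block, so I doubt it would scale to a huge file; it only illustrates the kind of ranking I want.

```python
import difflib
import re
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    """Collects the text of <p> and <table> elements and the alt text of <img> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # (kind, text) pairs in document order
        self._kind = None   # "p" or "table" while inside such an element
        self._depth = 0     # nesting depth of the currently open element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.blocks.append(("img", alt))
        elif self._kind is None and tag in ("p", "table"):
            self._kind, self._depth, self._buf = tag, 1, []
        elif tag == self._kind:
            self._depth += 1  # e.g. a table nested inside a table

    def handle_endtag(self, tag):
        if self._kind is not None and tag == self._kind:
            self._depth -= 1
            if self._depth == 0:
                # Join text pieces and collapse runs of whitespace.
                text = " ".join(" ".join(self._buf).split())
                if text:
                    self.blocks.append((self._kind, text))
                self._kind = None

    def handle_data(self, data):
        if self._kind is not None:
            self._buf.append(data)


def words(text):
    return re.findall(r"\w+", text.lower())


def score(query, text):
    """Average, over the query words, of the best fuzzy match against any word in text."""
    q_words, t_words = words(query), set(words(text))
    if not q_words or not t_words:
        return 0.0
    best = [
        max(difflib.SequenceMatcher(None, q, t).ratio() for t in t_words)
        for q in q_words
    ]
    return sum(best) / len(best)


def search(html, query, k=5):
    """Return the k best (score, kind, text) tuples, highest score first."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    ranked = sorted(
        ((score(query, text), kind, text) for kind, text in parser.blocks),
        key=lambda r: r[0],
        reverse=True,
    )
    return ranked[:k]
```

With this, a misspelled word still scores close to 1.0 against the correct word, and a query with one extra word only loses part of its score instead of failing outright.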
I have looked for existing Python libraries and found Elasticsearch and Whoosh, but it is hard to find an example in their documentation of full-text search over HTML. Do you have an example, or another library you could suggest?