I'm using the full text search feature from Postgres and for the most part it works fine.
I have a column in my database table called documentFts
that is basically the ts_vector
version of the body
field, which is a text column, and that's indexed with GIN index.
Here's my query:
select
count(*) OVER() AS full_count,
id,
url,
(("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) as "finalScore",
ts_headline(\'english_unaccent\', title, websearch_to_tsquery($4, $1)) as title,
ts_headline(\'english_unaccent\', body, websearch_to_tsquery($4, $1)) as body,
"possibleEncoding",
"responseYear"
from "Entries"
where
"language" = $3 and
"documentFts" @@ websearch_to_tsquery($4, $1)
order by (("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) desc limit 20 offset $2;
The dictionary is english_unaccent
because I created one based on english
that uses the unaccent
extension by using:
CREATE TEXT SEARCH CONFIGURATION english_unaccent (
COPY = english
);
ALTER TEXT SEARCH CONFIGURATION english_unaccent
ALTER MAPPING FOR hword, hword_part, word WITH unaccent,
english_stem;
I did the same for other languages.
And then I did this to my Entries db:
ALTER TABLE "Entries"
ADD COLUMN "documentFts" tsvector;
UPDATE
"Entries"
SET
"documentFts" = (setweight(to_tsvector('english_unaccent', coalesce(title)), 'A') || setweight(to_tsvector('english_unaccent', coalesce(body)), 'C'))
WHERE
"language" = 'english';
I have a column in my table with the language of the entry, hence the "language" = 'english'
.
So, the problem I'm having is that for words like animal
, anime
or animation
, they all go into the vector as anim
, which means that if I search for any of those words I get results with all of those variations.
That returns a HUGE dataset that causes the query to be quite slow compared to searches that return fewer items. And also, if I search for Anime
, my first results contain Animal
, Animated
and the first result that has the word Anime
is the 12th one.
Shouldn't animation
be transformed to animat
in the vector and animal
just be animal
as the other variations for it are animals
or animalia
?
I've been searching for a solution to this without much luck, is there any way I can improve this, I'm happy to install extensions, reindex the column or whatever.