I am using functions such as to_tsquery and to_tsvector in PostgreSQL to perform full-text search, but PostgreSQL supports only a limited number of languages by default. I need my database to support Georgian so that I can run searches in that language.
Where can I find or download a text search configuration for Georgian, and how do I apply it to my PostgreSQL instance?
Is there a guide that explains how to create this kind of configuration from language dictionaries?
EDIT:
OK, here is what I have so far.
It was pointed out in the comments that I should look for Snowball or Ispell dictionaries.
I found a dictionary for the Georgian language by simply googling "georgian" "hunspell", but now I have a problem creating a text search configuration that uses this dictionary.
I created a dictionary in PostgreSQL using
CREATE TEXT SEARCH DICTIONARY georgian_hunspell (
    TEMPLATE = ispell,
    DictFile = ka_GE,
    AffFile = ka_GE
);
and tested it using
select ts_lexize('georgian_hunspell', 'ვაშლი');
which works fine. However, a configuration built on this dictionary does not work. I tried this:
CREATE TEXT SEARCH CONFIGURATION georgian_hunspell_configuration (parser = default);
and then this:
ALTER TEXT SEARCH CONFIGURATION georgian_hunspell_configuration
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH georgian_hunspell;
but to no avail. I tried testing it using this:
select * from ts_debug('georgian_hunspell_configuration', 'ვაშლი');
The result is
 alias |  description  | token | dictionaries | dictionary | lexemes
-------+---------------+-------+--------------+------------+---------
 blank | Space symbols | ვაშლი | {}           |            |
(1 row)
Does it say "Space symbols" because the parser could not extract any tokens from the input? If so, why? Is it because this language uses a different alphabet? Should I write my own parser?
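One diagnostic that may help narrow this down, offered as a sketch and not a confirmed cause: the PostgreSQL documentation notes that the default parser's notion of a "letter" is determined by the database's lc_ctype setting, so a database created with the C or POSIX locale might classify non-ASCII text as blank. The queries below only inspect state. They assume the built-in parser named default and reuse the test word from above.

```sql
-- Show the collation and character-classification locale of the current database.
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- Run the default parser directly, bypassing the configuration and the dictionary,
-- and join to ts_token_type to see which token class the word is assigned.
SELECT t.alias, t.description, p.token
FROM ts_parse('default', 'ვაშლი') AS p
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid;
```

If datctype turns out to be C or POSIX and the second query still reports blank, that would suggest the locale is involved and not the dictionary or the mapping, though this is an assumption that would need to be verified on the actual instance.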
How can I make this work?