
Is it possible to store the synonyms for Elasticsearch in the index? Or is it possible to get the synonym list from a database like CouchDB? I'd like to add synonyms dynamically to Elasticsearch via the REST API.

Medrod

4 Answers


There are two approaches to working with synonyms:

  • expanding them at indexing time,
  • expanding them at query time.
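To make the two approaches concrete, here is a toy Python sketch (not Elasticsearch code): a tiny in-memory inverted index where a hypothetical `SYNONYMS` map is applied either when documents are indexed or when the query is analyzed. With single-word synonyms both approaches return the same documents; what differs is where the work happens, and what has to be redone when the list changes.

```python
from collections import defaultdict

# Hypothetical synonym groups: each term maps to every term in its group.
SYNONYMS = {
    "car": {"car", "automobile"},
    "automobile": {"car", "automobile"},
}


def analyze(text, expand=False):
    """Lowercase, split on whitespace and optionally expand synonyms."""
    terms = []
    for token in text.lower().split():
        if expand:
            terms.extend(sorted(SYNONYMS.get(token, {token})))
        else:
            terms.append(token)
    return terms


def build_index(docs, expand_at_index_time):
    """Build a term -> set(doc_id) inverted index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, expand=expand_at_index_time):
            index[term].add(doc_id)
    return index


def search(index, query, expand_at_query_time):
    """Return ids of documents matching any query term (OR semantics)."""
    hits = set()
    for term in analyze(query, expand=expand_at_query_time):
        hits |= index.get(term, set())
    return hits


docs = {1: "red car for sale", 2: "vintage automobile", 3: "bicycle repair"}

# Index-time expansion: synonyms are baked into the index, the query is plain.
index_time = build_index(docs, expand_at_index_time=True)
# Query-time expansion: the index is plain, the query is expanded.
query_time = build_index(docs, expand_at_index_time=False)

print(search(index_time, "automobile", expand_at_query_time=False))  # {1, 2}
print(search(query_time, "automobile", expand_at_query_time=True))   # {1, 2}
```

A plain query against the plain index would only find document 2; the synonyms have to be applied on one side or the other.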

Expanding synonyms at query time is not recommended because it raises issues with:

  • scoring, since synonyms have different document frequencies,
  • multi-token synonyms, since the query parser splits on whitespace.

More details on this are available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory (on the Solr wiki, but relevant to Elasticsearch too).
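The multi-token problem is easy to reproduce. In the hypothetical Python sketch below, one rule maps the phrase "new york" to "nyc". Looking up synonyms token by token after a whitespace split never sees the phrase, whereas a greedy longest-match scan over the tokens, done before they are treated individually, does. This only illustrates the issue; it is not how Lucene's query parser or synonym filters are implemented.

```python
# Hypothetical multi-token synonym rules, keyed by tuples of tokens.
PHRASE_SYNONYMS = {
    ("new", "york"): ["nyc"],
    ("nyc",): ["new york"],
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_SYNONYMS)


def expand_per_token(query):
    """Naive: split on whitespace first, then look up each token alone."""
    expanded = []
    for token in query.lower().split():
        expanded.append(token)
        expanded.extend(PHRASE_SYNONYMS.get((token,), []))
    return expanded


def expand_phrases(query):
    """Greedy longest match: try the longest rule starting at each token."""
    tokens = query.lower().split()
    expanded, i = [], 0
    while i < len(tokens):
        for length in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in PHRASE_SYNONYMS:
                expanded.append(" ".join(key))
                expanded.extend(PHRASE_SYNONYMS[key])
                i += length
                break
        else:
            # No rule starts at this token: keep it unchanged.
            expanded.append(tokens[i])
            i += 1
    return expanded


print(expand_per_token("hotels new york"))  # ['hotels', 'new', 'york']
print(expand_phrases("hotels new york"))    # ['hotels', 'new york', 'nyc']
```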

So the recommended approach is to expand synonyms at indexing time. In your case, if the synonym list is managed dynamically, you would have to re-index every document that contains a term whose synonym list has been updated, so that scoring remains consistent between documents analyzed before and after the update. I'm not saying that it is impossible, but it requires some work and will probably cause performance issues for synonyms that have a high frequency in your index.
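With index-time expansion, the documents to re-index after a synonym change are those containing any term whose synonym set changed. A minimal Python sketch of that bookkeeping, assuming you maintain your own hypothetical `postings` map from source terms to document ids (Elasticsearch does not hand you this list directly; in practice you would find the documents with a search on the affected terms):

```python
def changed_terms(old_synonyms, new_synonyms):
    """Terms whose synonym set was added, removed or modified."""
    terms = set(old_synonyms) | set(new_synonyms)
    return {t for t in terms if old_synonyms.get(t) != new_synonyms.get(t)}


def docs_to_reindex(postings, old_synonyms, new_synonyms):
    """Union of the documents containing any changed term.

    `postings` maps a term (as written in the source text, before
    synonym expansion) to the set of document ids containing it.
    """
    affected = set()
    for term in changed_terms(old_synonyms, new_synonyms):
        affected |= postings.get(term, set())
    return affected


postings = {"car": {1, 4}, "automobile": {2}, "bike": {3}, "bicycle": {5}}
old = {"car": {"automobile"}, "automobile": {"car"}}
new = {"car": {"automobile"}, "automobile": {"car"},
       "bike": {"bicycle"}, "bicycle": {"bike"}}

print(sorted(docs_to_reindex(postings, old, new)))  # [3, 5]
```

The size of `affected` is what makes high-frequency terms expensive: changing the synonyms of a very common word touches a large share of the index.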

jpountz
  • Thanks, that is a helpful hint if I can bring changes in the synonym list into the index. But is it possible to hold the synonym list in a database or in a Lucene index? – Medrod Sep 02 '11 at 08:15
  • Not by configuration only: Elasticsearch code expects the synonym map to come either from its settings or from a text file. Here are the pieces of code that trigger the instantiation of the synonym filter: https://github.com/elasticsearch/elasticsearch/blob/master/modules/elasticsearch/src/main/java/org/elasticsearch/index/analysis/AnalysisModule.java#L390 and https://github.com/elasticsearch/elasticsearch/blob/master/modules/elasticsearch/src/main/java/org/elasticsearch/index/analysis/SynonymTokenFilterFactory.java#L51 – jpountz Sep 02 '11 at 09:24
  • You can check the Elasticsearch [synonym filter documentation](http://www.elasticsearch.org/guide/reference/index-modules/analysis/synonym-tokenfilter.html), which has been updated recently. It contains examples of both file-based and config-nested synonyms, as well as the supported synonym formats. – Lukas Vlcek Nov 11 '11 at 18:18
  • This is a really helpful tip, I've swapped to only expanding based on synonyms at index time now. – Luke Cousins May 30 '14 at 15:01
  • @LukasVlcek your link is 404, I think the correct one is https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html – Christophe Blin Nov 03 '16 at 09:39
  • 1. Choosing between query-time and index-time expansion is not that obvious, and there is no right or wrong answer. 2. Query-time expansion does not have an impact on scoring (on the contrary, index-time expansion does: https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html). 3. You can work with multi-token synonyms at query time by making your query a bit less efficient and checking for synonyms (you can even do this in constant time) before splitting. – Michael Sep 26 '17 at 11:56
  • You can no longer say that expanding synonyms at index time is recommended. The opposite is true, see [the Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html). You could use the deprecated Synonym Token Filter at index time, but then you don't get multi-word synonyms. – Suzana Nov 15 '21 at 20:15
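The synonym filter documentation mentioned in these comments describes the Solr synonym format, which Elasticsearch accepts both inline in the index settings and in a synonyms file. A rough Python sketch of parsing its two rule types, comma-separated equivalent terms and `=>` explicit mappings, into a dictionary. It is a simplified illustration (it ignores escaping and assumes the default `expand` behaviour), not Elasticsearch's parser.

```python
def parse_solr_synonyms(lines):
    """Parse Solr-format rules into {term: sorted list of replacement terms}.

    - "a, b, c"       -> each term maps to every term in the group
    - "a, b => c, d"  -> a and b are replaced by c and d
    Blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            sources = targets = [t.strip() for t in line.split(",") if t.strip()]
        for source in sources:
            # Rules for the same term accumulate instead of overwriting.
            merged = set(mapping.get(source, [])) | set(targets)
            mapping[source] = sorted(merged)
    return mapping


rules = [
    "# comment lines are skipped",
    "ipod, i-pod, i pod",
    "sea biscuit, sea biscit => seabiscuit",
]
for term, targets in sorted(parse_solr_synonyms(rules).items()):
    print(f"{term!r} -> {targets}")
```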

There are a few new solutions now, beyond those proposed in other answers a few years ago. The two main approaches, both implemented as plugins:

  1. The file-watcher-synonym filter is a plugin that can periodically reload synonyms every given number of seconds, as defined by the user.
  2. The refresh-token-plugin allows real-time updates of the index. However, this plugin apparently has some problems, which stem from the fact that Elasticsearch is unable to distinguish analyzers used only at search time from those used at index time.
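The polling idea behind the file-watcher approach can be sketched in a few lines of Python: check the synonyms file's modification time every N seconds and re-read it only when it has changed. The class below is hypothetical and stands in for whatever the plugin does internally; it only re-reads the rules, whereas the plugin also has to rebuild the analyzer.

```python
import os
import tempfile
import time


class SynonymFileWatcher:
    """Poll a synonyms file and re-read it when its modification time changes."""

    def __init__(self, path, interval_seconds=60.0):
        self.path = path
        self.interval = interval_seconds
        self._mtime_ns = None
        self.rules = []
        self.reload_if_changed()

    def reload_if_changed(self):
        """Re-read the file if it changed since the last check; return True if so."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        with open(self.path, encoding="utf-8") as f:
            lines = (line.strip() for line in f)
            self.rules = [line for line in lines
                          if line and not line.startswith("#")]
        self._mtime_ns = mtime_ns
        return True

    def run_forever(self):
        """Blocking poll loop; in a real service this would run in a thread."""
        while True:
            time.sleep(self.interval)
            if self.reload_if_changed():
                print(f"reloaded {len(self.rules)} synonym rules")


# Demo with a temporary file instead of run_forever().
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synonyms.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("car, automobile\n")
    watcher = SynonymFileWatcher(path, interval_seconds=5)
    print(watcher.rules)                # ['car, automobile']
    print(watcher.reload_if_changed())  # False: nothing changed yet
    with open(path, "a", encoding="utf-8") as f:
        f.write("bike, bicycle\n")
    # Force a distinct mtime so the demo does not depend on timestamp resolution.
    os.utime(path, ns=(10**18, 10**18))
    print(watcher.reload_if_changed())  # True
    print(watcher.rules)                # ['car, automobile', 'bike, bicycle']
```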

A good discussion on this subject can be found on GitHub: https://github.com/brusic/refresh-token-filters

Datageek
  • For anyone reading this, there are now "managed" synonym filters, which query synonyms over HTTP. You still need to reindex when you change synonyms if you are adding them at index time. – Ben DeMott Jun 05 '17 at 21:49
  • As of 20-Jan-2019, the File-Watcher plugin is very outdated, and the developer has been inactive for a long time. – Typewar Jan 19 '19 at 23:13

It isn't too painful in Elasticsearch to update the synonym list. It can be done by closing and reopening the index. You could have it driven from anywhere, but you need some of your own infrastructure. It'd work like this:

  • You want an alias pointing at your current index
  • Sync down a new synonym file to your servers
  • Create a new index with a custom analyzer that uses the new synonym file
  • Rebuild the content from the current index into the new index
  • Repoint the alias from the current index to the new index
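The steps after syncing the file can be written down as the REST calls you would send. This Python sketch only builds (method, path, body) tuples, so it runs without a cluster. The index, alias, analyzer and field names are hypothetical, the mapping uses the typeless format of recent Elasticsearch versions, and the `_reindex` API used for the rebuild step was only added in later Elasticsearch versions; at the time of this answer you had to re-feed documents yourself.

```python
import json


def swap_synonyms_requests(alias, old_index, new_index, synonyms_path):
    """Return (method, path, body) tuples for an alias-based synonym update."""
    new_index_body = {
        "settings": {
            "analysis": {
                "filter": {
                    "my_synonyms": {  # hypothetical filter name
                        "type": "synonym",
                        "synonyms_path": synonyms_path,
                    }
                },
                "analyzer": {
                    "my_synonym_analyzer": {  # hypothetical analyzer name
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Hypothetical field analyzed with the new synonyms.
                "body": {"type": "text", "analyzer": "my_synonym_analyzer"}
            }
        },
    }
    return [
        # Create the new index whose analyzer reads the new synonym file.
        ("PUT", f"/{new_index}", new_index_body),
        # Rebuild: copy every document from the current index into the new one.
        ("POST", "/_reindex",
         {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # Repoint the alias; actions in one _aliases call are applied atomically.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
    ]


for method, path, body in swap_synonyms_requests(
        "products", "products_v1", "products_v2", "synonyms_v2.txt"):
    print(method, path, json.dumps(body))
```

Because searches always go through the alias, clients never see a half-built index; the old index can be deleted once the alias has moved.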
ppearcy

In 2021, just expand synonyms at query time using a dedicated search analyzer, and use the reload search analyzers API:

```
POST /my-index/_reload_search_analyzers
```

The synonym graph token filter (defined under `settings.analysis.filter`) must have `updateable` set to `true`:

```
"my-synonyms": {
  "type": "synonym_graph",
  "synonyms_path": "my-synonyms.txt",
  "updateable": true
}
```
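Put together, a minimal sketch of an index that uses a plain analyzer at index time and expands synonyms only in its search analyzer (updateable filters are only allowed in search analyzers), plus the reload call. It is written in Python with `urllib.request` so the requests can be inspected without a cluster; the host, field and analyzer names are hypothetical, the mapping uses the typeless format of Elasticsearch 7.x and later, and the reload API requires a 7.x or later cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical cluster address
INDEX = "my-index"

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my-synonyms": {
                    "type": "synonym_graph",
                    "synonyms_path": "my-synonyms.txt",  # relative to the config dir
                    "updateable": True,  # only allowed in search analyzers
                }
            },
            "analyzer": {
                "my-search-analyzer": {  # hypothetical name
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my-synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {  # hypothetical field
                "type": "text",
                "analyzer": "standard",  # no synonyms at index time
                "search_analyzer": "my-search-analyzer",
            }
        }
    },
}


def make_request(method, path, body=None):
    """Build (but do not send) a JSON request against the cluster."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES_URL + path, data=data, method=method,
        headers={"Content-Type": "application/json"})


create_req = make_request("PUT", f"/{INDEX}", index_body)
# After updating my-synonyms.txt on every node:
reload_req = make_request("POST", f"/{INDEX}/_reload_search_analyzers")

for req in (create_req, reload_req):
    print(req.get_method(), req.full_url)
# To actually send a request: urllib.request.urlopen(req)
```

Since documents are indexed without synonyms, reloading only changes how new queries are analyzed; nothing has to be reindexed.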

Besides, you should probably expand synonyms at query time anyway. Why?

  1. Chances are that you have too much data to reindex every night or so.
  2. Elasticsearch does not allow the Synonym Graph Filter in an index analyzer, only the deprecated Synonym Filter, which does not handle multi-word synonyms correctly.
Suzana