
I would like to handle apostrophes in Lucene. For example, I have the sentence "L'arbre est vert", and I would like to know how I can write a query on the word "arbre" (without the apostrophe) that matches it. With the StandardAnalyzer I have to query "L'arbre" to get a hit.

Note that there is a related question here: Lucene Indexing to ignore apostrophes. But as I am quite new to Lucene, I would like an example (a code snippet that works in Lucene 5.3).

El pupi

2 Answers


It looks like you need something with more robust analysis of the French language. I would consider using FrenchAnalyzer. StandardAnalyzer is designed to provide a passable language-agnostic analysis. If you want more intelligent linguistic analysis of a particular language, you should look to the analyzer for that language.

For "L'arbre est vert", StandardAnalyzer tokenizes it into:

  • l'arbre
  • est
  • vert

whereas FrenchAnalyzer strips the elided article, drops the stop word "est", and stems the remaining words, giving you:

  • arbr
  • vert
femtoRgon
  • The problem is that I have to index multiple languages, not only French, so this solution doesn't fit my use case. For instance, in English we can have "a woman's hat", and I would like the query "woman" to match it. – El pupi Jan 31 '16 at 12:48

As pointed out by @femtoRgon, you need to address this with more appropriate analysis. You can either choose the analyzer for a field based on the language of each document/query, or adopt a more advanced strategy such as language-specific indices or fields.

Have a look at Multilingual Search using Lucene for an overview of the possible strategies.
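
To make the two examples from this thread concrete, here is a minimal plain-Java sketch of the token normalization involved. It is not Lucene code: Lucene ships `ElisionFilter` and `EnglishPossessiveFilter`, which you would normally put in a custom analysis chain, and this sketch only mimics their effect on whitespace-separated tokens. The class name, the method names, and the short list of elided articles are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: mimics what elision and possessive filters do to a token.
class ApostropheSketch {
    // Hypothetical, incomplete list of French elided articles (assumption).
    private static final Set<String> ELIDED = new HashSet<>(Arrays.asList(
            "l", "d", "j", "m", "t", "s", "n", "c", "qu"));

    // Lowercases a token, then strips an English possessive suffix ('s)
    // or a French elided prefix (l', d', ...), whichever applies.
    static String normalize(String token) {
        // Treat the typographic apostrophe like the ASCII one.
        String t = token.toLowerCase(Locale.ROOT).replace('\u2019', '\'');
        if (t.endsWith("'s")) {
            return t.substring(0, t.length() - 2);
        }
        int apos = t.indexOf('\'');
        if (apos > 0 && ELIDED.contains(t.substring(0, apos))) {
            return t.substring(apos + 1);
        }
        return t;
    }

    // Naive whitespace tokenization followed by normalization.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String t = normalize(raw);
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("L'arbre est vert")); // [arbre, est, vert]
        System.out.println(tokens("a woman's hat"));    // [a, woman, hat]
    }
}
```

Applying the same normalization at index time and at query time is what lets a query for "arbre" or "woman" match. Running both rules over every language is a simplification; per-language fields or indices, as described above, avoid stripping prefixes in languages where they are not articles.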

Daniel Schneiter