Can you help me?

I am implementing Hibernate Search to retrieve results for a global search on a localized website (Portuguese and English content).

To do this, I have followed the steps indicated on the Hibernate Search docs: http://docs.jboss.org/hibernate/search/4.5/reference/en-US/html_single/#d0e4141

Along with the specific configuration in the entity itself, I have implemented a "LanguageDiscriminator" class, following the instructions in this doc.

Because I am not getting the results I expected (e.g. my entity has the text "Capuchinho" stored, but a search for "capucho" returns no hits), I decided to debug the execution to find out whether the analyzers I configured are being used at all.

When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method of the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain to me why?

I am posting the key parts of my code below. Thanks a lot for any feedback!

This is one of the entities I want to index:

@Entity
@Table(name="NEWS_HEADER")
@Indexed
@AnalyzerDefs({
    @AnalyzerDef(name = "en",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                            params = {@Parameter(name = "language", value = "English")}
            )
        }
    ),
    @AnalyzerDef(name = "pt",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                            params = {@Parameter(name = "language", value = "Portuguese")}
            )
        }
    )
})
public class NewsHeader implements Serializable {

static final long serialVersionUID = 20140301L;

private int         id;
private String          articleHeader;
private String          language;
private Set<NewsParagraph>  paragraphs = new HashSet<NewsParagraph>();

/**
 * @return the id
 */
@Id
@Column(name="ID")
@GeneratedValue(strategy=GenerationType.AUTO)
@DocumentId
public int getId() {
    return id;
}
/**
 * @param id the id to set
 */
public void setId(int id) {
    this.id = id;
}
/**
 * @return the articleHeader
 */
@Column(name="ARTICLE_HEADER")
@Field(index=Index.YES, store=Store.NO)
public String getArticleHeader() {
    return articleHeader;
}
/**
 * @param articleHeader the articleHeader to set
 */
public void setArticleHeader(String articleHeader) {
    this.articleHeader = articleHeader;
}
/**
 * @return the language
 */
@Column(name="LANGUAGE")
@Field
@AnalyzerDiscriminator(impl=LanguageDiscriminator.class)
public String getLanguage() {
    return language;
}
...
}

This is my LanguageDiscriminator class

public class LanguageDiscriminator implements Discriminator {

@Override
public String getAnalyzerDefinitionName(Object value, Object entity, String field) {

    String result = null;

    if (value != null) {
        result = (String) value;
    }
    return result;
}

}

This is my search method present in my SearchDAO

public List<NewsHeader> searchParagraph(String patternStr) {

    Session session = null;

    Transaction tx;

    List<NewsHeader> result = null;

    try {
        session = sessionFactory.getCurrentSession();
        FullTextSession fullTextSession = Search.getFullTextSession(session);
        tx = fullTextSession.beginTransaction();

        // Create native Lucene query using the query DSL
        QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(NewsHeader.class).get();

        org.apache.lucene.search.Query luceneSearchQuery = queryBuilder
            .keyword()
            .onFields("articleHeader", "paragraphs.content")
            .matching(patternStr)
            .createQuery();

        // Wrap Lucene query in a org.hibernate.Query
        org.hibernate.Query hibernateQuery = 
            fullTextSession.createFullTextQuery(luceneSearchQuery, NewsHeader.class, NewsParagraph.class);

        // Execute search
        result = hibernateQuery.list();

        // Commit the transaction opened above so it is not left dangling
        tx.commit();

    } catch (Exception xcp) {
        logger.error(xcp);
    } finally {

        if ((session != null) && (session.isOpen())) {
            session.close();
        }
    }
    return result;
}
Hardy

1 Answer

When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method of the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain to me why?

The selection of the analyzer depends on the state of a given entity, in your case NewsHeader. During indexing you are dealing with entity instances. When querying, you don't have entities to start with; you are searching for them. Which analyzer would you expect Hibernate Search to select for your query?
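
To make that concrete, here is a toy model in plain Java. It is not Hibernate Search code: `Doc`, `ANALYZERS`, `indexTerm` and `queryTerm` are hypothetical names, and the "analyzers" only lower-case the text and tag it with their own name. The point is just that the index side can read the language from the entity, while the query side has no entity and must be told which analyzer to use.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Toy model (not Hibernate Search code) of per-entity analyzer selection. */
final class DiscriminatorSketch {

    /** Hypothetical stand-in for an indexed entity such as NewsHeader. */
    static final class Doc {
        final String language;
        final String text;

        Doc(String language, String text) {
            this.language = language;
            this.text = text;
        }
    }

    // Hypothetical "analyzers": each one lower-cases the text and prefixes it
    // with its own name, so the output shows which analyzer produced it.
    static final Map<String, Function<String, String>> ANALYZERS = new HashMap<>();
    static {
        ANALYZERS.put("en", t -> "en:" + t.toLowerCase(Locale.ROOT));
        ANALYZERS.put("pt", t -> "pt:" + t.toLowerCase(Locale.ROOT));
    }

    /** Index time: an entity instance exists, so its language picks the analyzer. */
    static String indexTerm(Doc doc) {
        return ANALYZERS.get(doc.language).apply(doc.text);
    }

    /** Query time: there is no entity yet, so the caller has to name the analyzer. */
    static String queryTerm(String analyzerName, String userInput) {
        return ANALYZERS.get(analyzerName).apply(userInput);
    }

    public static void main(String[] args) {
        Doc doc = new Doc("pt", "Capuchinho");
        String indexed = indexTerm(doc);
        System.out.println(indexed.equals(queryTerm("pt", "capuchinho"))); // true
        System.out.println(indexed.equals(queryTerm("en", "capuchinho"))); // false
    }
}
```

In this sketch a query only matches when it is analyzed the same way as the stored text, which is why the language has to come from somewhere other than the entity at search time.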

That said, I think there is a shortcoming in the DSL. It does not allow you to explicitly specify the analyzer for a class. There is ignoreAnalyzer, but that's not what you want. I guess you could create a feature request in the Search issue tracker: https://hibernate.atlassian.net/browse/HSEARCH.

In the meantime you can build the query using the native Lucene query API. However, you will need to know which language you are targeting with your query (for example via the preferred language of the logged-in user). This will depend on your use case. It might be that you are looking at the wrong feature to start with.
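
As a rough sketch of the "know which language you are targeting" part, the helper below maps a user's locale to one of the two analyzer definition names from the question ("en" and "pt"). The class name `QueryAnalyzerChooser`, the method `analyzerNameFor` and the English fallback are assumptions made for illustration; where the locale comes from (user profile, request header, and so on) depends on your application.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: picks an analyzer definition name for a query. */
final class QueryAnalyzerChooser {

    // Names of the @AnalyzerDef definitions declared on NewsHeader in the question.
    private static final Set<String> DEFINED = new HashSet<>(Arrays.asList("en", "pt"));

    // Assumption: fall back to English when the locale is missing or unsupported.
    private static final String FALLBACK = "en";

    private QueryAnalyzerChooser() {
    }

    static String analyzerNameFor(Locale userLocale) {
        if (userLocale == null) {
            return FALLBACK;
        }
        // getLanguage() drops the region, so pt-PT and pt-BR both yield "pt".
        String language = userLocale.getLanguage();
        return DEFINED.contains(language) ? language : FALLBACK;
    }
}
```

The returned name could then be used to look up the matching analyzer (as far as I know, the SearchFactory offers getAnalyzer(String) for this) and to analyze the search term before building the Lucene query by hand.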

Hardy
  • Thanks Hardy. It sounds a bit odd to me that we can choose which analyzers to use with Hibernate Search, but only for indexing and not for executing the search itself. I will try what you suggest though, and try to execute the search via the Lucene API. – Eduardo Mendes Mar 06 '14 at 13:51
  • @EduardoMendes I have exactly the same problem. Have you solved it so far, besides using the Lucene API? Have you filed a feature request? TIA! – Emi Aug 18 '14 at 18:25