django / haystack / solr simple config - partial field matching issue

Question

I have a simple config of haystack/solr on my django app:

from the models.py of this app:

class device(models.Model):
    ...
    hostname = models.CharField(max_length=45, help_text="The hostname for this device")
    ...

from the search_sites.py of this app:

class devIndex(indexes.SearchIndex):
    '''Haystack class to allow for indexing device objects in TOMS'''
    text = indexes.CharField(document=True, use_template=True)

from templates/search/indexes/systems_management/device_text.txt for this app (names all jibe):

...
{{ object.hostname }}
...

The Problem:

A system is named static1.foo.com:

If I search for "static", I get results for all static servers ("static" is in their description fields).

If I search for "static1", I get 0 results.

If I search for "static1.foo.com", I get results, including this server.

My question is: why is haystack/solr not matching the "static1" query?

score 0 · Answer 1 · answered Feb 05 '10 at 21:43

It's likely an analysis problem. I'd guess you are using the StandardTokenizer in your schema.xml file for this field.

The standard tokenizer treats a host name as a single token (ref: http://www.lucidimagination.com/search/document/CDRG_ch05_5.5.1), so you can only match it with the full host name.

If you want to search by pieces, you'll need to use a different tokenizer. The default text field in the Solr example uses the WhitespaceTokenizer and the WordDelimiterFilter, which will split the host name into its parts. That would allow you to find the device with a query of 'static1'.
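To make the difference concrete, here is a rough pure-Python sketch of the two behaviours. It is not Solr's actual analysis code: whole_token_analyzer, word_delimiter_analyzer and matches are hypothetical helpers that only imitate "host name kept as one token" versus "host name split on punctuation and letter/digit boundaries".

```python
import re


def whole_token_analyzer(text):
    """Stand-in for an analyzer that keeps each host name as a single token."""
    return [tok.lower() for tok in text.split()]


def word_delimiter_analyzer(text):
    """Stand-in for whitespace tokenizing followed by word-delimiter splitting."""
    tokens = []
    for chunk in text.split():
        # Break on punctuation and on letter/digit boundaries.
        tokens.extend(p.lower() for p in re.findall(r"[A-Za-z]+|[0-9]+", chunk))
    return tokens


def matches(query, document, analyzer):
    """True if every analyzed query token appears among the document's tokens."""
    doc_tokens = set(analyzer(document))
    return all(tok in doc_tokens for tok in analyzer(query))


if __name__ == "__main__":
    host = "static1.foo.com"
    for query in ("static1", "static1.foo.com"):
        print(query,
              matches(query, host, whole_token_analyzer),
              matches(query, host, word_delimiter_analyzer))
```

With the single-token stand-in, only the full host name matches, which is the behaviour described in the question. With the splitting stand-in, "static1" matches as well.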

Thanks for the input. I see where that is spelled out in the schema.xml from the config. I edited my schema.xml to read as follows: I then rebuilt my indexes, but I'm still getting no results on a search for a partial hostname: ... static1 ... I can't figure out what (else) I'm missing. — jduncan, Feb 08 '10 at 18:57

score 0 · Answer 2 · answered Dec 22 '11 at 09:22

Solr has many possible configurations. For your use case, you may want to use an edge n-gram field type in your schema.xml. Here is an example:

<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

Use this example and tweak it a little until it returns the desired results.
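For intuition about what the edge n-gram filter adds at index time, here is a small Python sketch of front-anchored n-gram generation using the same minGramSize=4 and maxGramSize=15 as the example. edge_ngrams is a hypothetical helper, not part of Solr or Haystack, and it assumes the filter receives the token "static1"; the tokens actually produced by the earlier filters in the chain may differ.

```python
def edge_ngrams(token, min_size=4, max_size=15):
    """Return the front-anchored prefixes of token, from min_size to max_size characters."""
    upper = min(len(token), max_size)
    # Tokens shorter than min_size yield an empty list in this sketch.
    return [token[:n] for n in range(min_size, upper + 1)]


if __name__ == "__main__":
    print(edge_ngrams("static1"))  # ['stat', 'stati', 'static', 'static1']
    print(edge_ngrams("foo"))      # []
```

Because the n-grams are generated only in the index analyzer, a query for "static1" is left whole and can match one of the indexed prefixes. Lowering minGramSize would let shorter partial queries match, at the cost of a larger index.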
