I'm using Apache Solr with Solarium client library for PHP.

The problem is a special character, the dash (-).

When I have a dash in my search query, I don't get any matches.

I tried to solve this with Solarium_Query_Helper::escapeTerm(), which escapes the dash with a backslash (\), but I still don't get any matches.

What is the solution for this problem?

I was thinking about escaping all fields when indexing, but that doesn't sound like a good idea.

Here is the part of my schema.xml:

 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
...
<fields>
...
  <field name="myfield" type="text_general" indexed="true" stored="true" />
</fields>
...
<defaultSearchField>text</defaultSearchField>
<copyField source="myfield" dest="text" />
umpirsky

1 Answer

There are some special characters that you need to escape in a query:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

You can escape them with a backslash (\). I'm not a Solarium expert, but the function you're using seems to do what needs to be done. There is probably another reason why you don't get the expected matches back.

javanna
  • Yes, I'm looking for that reason and solution for it. Thanks. – umpirsky Oct 22 '12 at 10:31
  • I guess you need to provide more information then. With the information provided it's hard to give you a solution. If your query is correctly escaped, it all depends on how your documents are indexed. – javanna Oct 22 '12 at 11:09
  • I didn't do anything special. I indexed the documents as they are, without escaping. If a document field contains a dash, I indexed it as is (e.g. `foo-bar`). – umpirsky Oct 22 '12 at 11:56
  • Can you post your schema.xml fieldType definition for the field that you're querying? – javanna Oct 22 '12 at 13:26
  • Had a look at it. The StandardTokenizerFactory could do some magic, but you're applying it at both query and index time, so it shouldn't be a problem. Have you tried looking at the Solr analysis page to check how your field gets indexed and queried? – javanna Oct 22 '12 at 14:26
  • Yes, here is the result: http://i.imgur.com/Hzobo.png It looks like the value is divided on the dash for some reason. I get no matches with a query like `*sasa\-b*`, and that is my problem. – umpirsky Oct 23 '12 at 08:45
  • If you don't want to tokenize on dashes you need to change the tokenizer, for example using WhitespaceTokenizer. – javanna Oct 23 '12 at 12:47
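
As a rough sketch of the tokenizer change suggested in the last comment: the field type below is a copy of the `text_general` definition from the question with `solr.StandardTokenizerFactory` swapped for `solr.WhitespaceTokenizerFactory` in both analyzers, so a value like `foo-bar` stays a single token. The name `text_ws` is made up for illustration, and the filters are assumed to stay as posted in the question.

```xml
<!-- Hypothetical field type name; everything except the tokenizer is copied from the question -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Splits on whitespace only, so dashes stay inside the token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same tokenizer at query time so query terms line up with indexed terms -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the field at the new type -->
<field name="myfield" type="text_ws" indexed="true" stored="true" />
```

Two things to keep in mind with this approach: existing documents have to be reindexed after the schema change, and the whitespace tokenizer also leaves other punctuation attached to tokens (for example a trailing comma), which may or may not be what you want. The analysis page should show the difference.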