
I have a rather simple Solr schema that holds three fields:

id, text and tags

In schema.xml I set the following:

<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="tags" dest="text"/>

However, when I search for a word that appears only as a tag, the document is not found.

My question here is: does copyField happen before any analyzer runs (index and query) as described here or just before the query analyzer?


EDIT

The analyzer definition:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />              
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />              
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

and the field-type definitions (they are pretty much as the default configs):

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

and last the field definitions:

<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
    <field name="tags" type="text" indexed="false" stored="false" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="tags" dest="text"/>
rae1
harpax
  • Be careful with defaultSearchField: "It is preferable to not use or rely on this setting; instead the request handler or query LocalParams for a search should specify the default field(s) to search on. This setting here can be omitted and it is being considered for deprecation." From the documentation: https://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field – Erowlin Mar 18 '13 at 09:40

3 Answers


The copyField is applied when a document is indexed, so it happens before the index analyzer runs. It is really as if you had put the same input text in two different fields. After that, it all depends on the analyzers you defined for each field.
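
As a rough illustration of that point, here is a minimal schema sketch. The `tags_exact` field is hypothetical and not part of the question's schema, and `tags` is shown as indexed and stored only for the sake of the example. The idea is that each destination receives the raw input sent for `tags` and then applies its own field type's analysis.

```xml
<!-- Sketch only: tags_exact is a hypothetical field added for illustration. -->
<field name="tags" type="text" indexed="true" stored="true" />
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />
<field name="tags_exact" type="string" indexed="true" stored="false" />

<!-- Both destinations receive the original, unanalyzed value sent for "tags". -->
<!-- "text" then runs it through the text analyzer chain; -->
<!-- "tags_exact" uses the string type, so the value is indexed verbatim. -->
<copyField source="tags" dest="text" />
<copyField source="tags" dest="tags_exact" />
```

With a setup like this, a stemmed or lowercased query would be expected to match in `text`, while `tags_exact` would only match the exact original value.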

Pascal Dimassimo
  • Thanks for your answer. I did not define any analyzer for the tags field because it wasn't necessary to score the tags differently from the normal text. I just copy the contents of tags to text and let the indexer run over that field -> I must be doing something wrong, as it doesn't work – harpax Jan 04 '11 at 16:59
  • Please post your field definitions. – Pascal Dimassimo Jan 04 '11 at 17:37
  • OK, first things first: when you browse your content in Solr, do you see data in your "text" field? – Pascal Dimassimo Jan 05 '11 at 15:57
  • you mean the `Schema Browser`? Yes there are ~2500 docs in the text field. The text query is working as expected as mentioned above - just the tags are not in there – harpax Jan 05 '11 at 17:17
  • I mean doing a "/select?q=*:*" query. Do you have data in both the "tags" and "text" fields? Also, check your indexing process. Are you sure you are adding data to the "tags" field? If you are doing a copyField from "tags" to "text", you have to add data to "tags". – Pascal Dimassimo Jan 05 '11 at 19:19
  • Oh my... your last comment did the job. I accidentally changed the path for the Solr index call, so new entries were not added. *blush* – harpax Jan 06 '11 at 10:30
  • Can I safely say that a copyField should always have multiValued=True? – zengr May 09 '13 at 18:25
  • no, you could copy a single-value field into another single-value field – Pascal Dimassimo May 09 '13 at 18:57
  • @PascalDimassimo thanks. I am still struggling with getting copyField to work with Solr 4. I will ask a different question maybe. http://lucene.472066.n3.nabble.com/Not-able-to-see-newly-added-copyField-in-the-response-indexing-is-80-complete-td4060543.html – zengr May 10 '13 at 20:57
  • @PascalDimassimo Is it possible to return the crawl result as JSON in apache-nutch 2.2.1? – jackyesind Jul 10 '13 at 05:31

If you search q=tags:xyz, then xyz will not be found, because you have set that field not to be indexed.

If you do a default search, yes, it should search the copyField destination. However, according to the Solr wiki:

"Any number of <copyField> declarations can be included in your schema, to instruct Solr that you want it to duplicate any data it sees in the "source" field of documents that are added to the index"

I think that having not added 'tags' to index would also cause the copyfield of 'tags' to not be indexed.

Joyce

I haven't tried using the copyField to append additional text to an existing field. I suppose Solr could concatenate it, or add it as a second value.

But here are a couple of ideas to try:

  1. Experiment with a document where the text field is blank, perhaps not even mentioned as a field in the document structure. Does it make a difference to whether tags make it into the main text if text starts out totally blank or not?

  2. Declare a second field, call it text2. And then ALSO copy tags into text2 via a second copyField directive. This text2 field won't have anything else in it, presumably not even mentioned in your fields, so for sure it should get the content.

In both cases you'd check the results with the schema browser, as before. I'd be very curious to hear what you find out!
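
The second idea might look roughly like the fragment below. This is only a sketch: `text2` is the hypothetical diagnostic field named above, and marking it `stored="true"` is an added assumption so that its values would also be visible in query results, not just in the schema browser.

```xml
<!-- Hypothetical diagnostic field: nothing else writes to it, -->
<!-- so any content found here must have arrived via the copyField. -->
<!-- stored="true" is an assumption made here so the copied values can be inspected. -->
<field name="text2" type="text" indexed="true" stored="true" multiValued="true" />

<!-- Second copy of tags, alongside the existing copy into "text". -->
<copyField source="tags" dest="text2" />
```

After changing the schema, the documents would need to be reindexed before `text2` shows any content.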

Mark Bennett