
I have a Solr index with about 2.5M items in it and I am trying to use an ExternalFileField to boost relevancy. Unfortunately, it's VERY slow when I try to do this, despite it being a beefy machine and Solr having lots of memory available.

In the external file I have contents like:

747501=3.8294805903e-07
747500=3.8294805903e-07
1718770=4.03292174724e-07
1534562=3.8294805903e-07
1956010=3.8294805903e-07
747509=3.8294805903e-07
747508=3.8294805903e-07
1718772=3.8294805903e-07
1391385=3.8294805903e-07
2089652=3.8294805903e-07
1948271=3.8294805903e-07
108368=3.84404072186e-06

Each line is a document ID and its corresponding boosting factor.
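For anyone wanting to check a file in this `id=value` format before pointing Solr at it, here is a minimal sketch in Python. The helper names (`parse_line`, `is_sorted_by_key`) are made up for illustration and are not part of Solr. It assumes one `key=float` pair per line, and it assumes you know whether your key field orders as a string or as a number; pass `key=int` for the latter.

```python
def parse_line(line):
    """Split one 'id=value' line into (id as str, value as float)."""
    doc_id, _, value = line.strip().partition("=")
    return doc_id, float(value)


def is_sorted_by_key(lines, key=str):
    """Return True if the ids appear in non-decreasing order.

    `key` converts the raw id before comparison: str gives lexicographic
    order, int gives numeric order. Which one matches your index is an
    assumption you need to verify against your schema.
    """
    keys = [key(parse_line(line)[0]) for line in lines if line.strip()]
    return all(a <= b for a, b in zip(keys, keys[1:]))


sample = [
    "747501=3.8294805903e-07",
    "747500=3.8294805903e-07",
    "1718770=4.03292174724e-07",
]
print(is_sorted_by_key(sample))           # lexicographic check
print(is_sorted_by_key(sample, key=int))  # numeric check
```

With the sample lines from the question, both checks report that the file is not sorted, since `747501` comes before `747500`.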

In my query I'm using edismax, and I am using the boost parameter, setting it to pagerank. The entire query is here.
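Since the linked query is not reproduced here, the following is only a rough sketch of what such a request could look like, built in Python. The host, core name (`collection1`) and search terms are placeholders, not the real values; the only parts taken from the description above are `defType=edismax` and `boost=pagerank`.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query,
                        base_url="http://localhost:8983/solr/collection1/select"):
    """Build a select URL that uses edismax with a multiplicative boost.

    base_url is a hypothetical placeholder; adjust it to your own core.
    """
    params = {
        "q": user_query,
        "defType": "edismax",   # use the extended dismax parser
        "boost": "pagerank",    # multiply scores by the external field value
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


print(build_boosted_query("some search terms"))
```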

In my schema I have:

<!-- External File Field Type-->
<fieldType name="pagerank"
           keyField="id"
           stored="false"
           indexed="true"
           omitNorms="false"
           class="solr.ExternalFileField"
           valType="float"/>

and

   <field name="pagerank"
          type="pagerank"
          indexed="true"
          stored="true"
          omitNorms="false"/>

But the performance is just plain bad. Am I missing a setting or something?

cheffe
mlissner
  • According to the [javadoc](http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/schema/ExternalFileField.html) - `The external file may be sorted or unsorted by the key field, but it will be substantially slower (untested) if it isn't sorted.` And as I see, ids in your file are unsorted. Can you sort it and test if it helps? – rchukh Nov 06 '13 at 00:10
  • Yep, that seemed to do it, thanks. – mlissner Nov 27 '13 at 20:00
  • How about accepting this then? – Peter Aron Zentai Aug 25 '16 at 00:48
  • Yeah, sure. Somebody put it in an answer two years after it was in a comment. – mlissner Aug 25 '16 at 00:50

1 Answer


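According to the [javadoc](http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/schema/ExternalFileField.html):

"The external file may be sorted or unsorted by the key field, but it will be substantially slower (untested) if it isn't sorted."

As far as I can see, the IDs in your file are unsorted. Can you sort the file by ID and test whether that helps?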
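If it helps, here is a minimal Python sketch for rewriting the file in key order. The function name and the file paths are made up for illustration. The ordering is an assumption: lexicographic order is the default here on the guess that `id` is a string field, and `numeric_keys=True` switches to numeric order. Check which one matches your key field before relying on it.

```python
def sort_external_file(src_path, dst_path, numeric_keys=False):
    """Read 'id=value' lines from src_path and write them sorted to dst_path.

    Returns the sorted lines. Blank lines are dropped.
    """
    with open(src_path, encoding="utf-8") as src:
        entries = [line.strip() for line in src if line.strip()]

    def sort_key(entry):
        doc_id = entry.partition("=")[0]
        # Assumption: string ids sort lexicographically, numeric ids by value.
        return int(doc_id) if numeric_keys else doc_id

    entries.sort(key=sort_key)

    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(entries) + "\n")
    return entries


# Hypothetical usage; the paths are placeholders:
# sort_external_file("pagerank_unsorted.txt", "external_pagerank")
```

The whole file is loaded into memory, which should be fine for a few million short lines. For much larger files an external sort would be the safer choice.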

cheffe