
I wrote a MapReduce job to generate a Solr index for my data. I generate the index in the reducer, but it is really slow. The code below is what runs inside the reducer. Is there anything wrong with my program, or is there any way to speed up index generation?

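import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

private SolrClient solr;
private UpdateResponse response;
private SolrInputDocument document;

@Override
public void reduce(Text inputKey, Iterable<Text> values, Context context) throws IOException, InterruptedException {

    //process the values...
    document = new SolrInputDocument();
    document.addField("id", hid+"#"+refid);
    // ..... further document.addField(name, value) calls for the remaining fields
    try {
        response = solr.add(document);
        solr.commit(); // commit after each document
    } catch (SolrServerException e) {
        throw new IOException(e);
    }
}

@Override
public void setup(Context context) {
    if(solrServerMode.equals("Cloud")){
        solr = new CloudSolrClient(solrServerPath);
        ((CloudSolrClient) solr).setDefaultCollection("gettingstarted");
    }
    else if(solrServerMode.equals("Local")){
        solr = new HttpSolrClient(solrServerPath);
    }
}

@Override
public void cleanup(Context context) throws IOException {
    solr.close();
}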

Edit 1: There is one suspicious detail that may explain why it is so slow. As the picture shows, I have updated only 46,205 documents, but the version number is very, very high.

Cheng Chen

1 Answer


Perform fewer or only one commit

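You perform a commit after each document. A commit is expensive, because Solr has to flush the pending changes to disk and, by default, open a new searcher, so committing once per document slows the indexing process down considerably. It probably also explains the very high version you noticed, since the index version changes with every commit. If your documents do not need to be visible to searches during the indexing process, I would suggest rewriting the code as follows.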

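@Override
public void reduce(Text inputKey, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    // ..... build the document as before
    try {
        response = solr.add(document); // send the document, but do not commit yet
    } catch (SolrServerException e) {
        throw new IOException(e);
    }
}

@Override
public void cleanup(Context context) throws IOException {
    try {
        solr.commit(); // a single commit after this reducer has added all of its documents
    } catch (SolrServerException e) {
        throw new IOException(e);
    } finally {
        solr.close();
    }
}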

Keep in mind that this commits only at the end. Until then, you will not be able to find the documents with a search.
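The same principle (do the expensive step less often) can be pushed one step further by also batching the add calls, so that each request carries many documents instead of one. This is not part of the rewrite above; it is a minimal, Solr-free sketch under stated assumptions. BatchingIndexer and BatchingDemo are hypothetical names, and the demo only counts calls. In a real reducer, the flush callback would wrap solr.add(Collection<SolrInputDocument>) and the final action solr.commit(); both throw checked exceptions, which a Consumer or Runnable cannot propagate, so they would need wrapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of SolrJ): buffers documents, hands them to a
// flush callback in batches, and runs one final action (the commit) on close.
class BatchingIndexer<D> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<D>> flush;   // e.g. docs -> solr.add(docs), exceptions wrapped
    private final Runnable finalCommit;      // e.g. () -> solr.commit(), exceptions wrapped
    private final List<D> buffer = new ArrayList<>();

    BatchingIndexer(int batchSize, Consumer<List<D>> flush, Runnable finalCommit) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.flush = flush;
        this.finalCommit = finalCommit;
    }

    void add(D doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flushBuffer();
        }
    }

    private void flushBuffer() {
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer)); // pass a copy, then reuse the buffer
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flushBuffer();      // send whatever is left over
        finalCommit.run();  // exactly one commit at the very end
    }
}

class BatchingDemo {
    // Counts flushes and commits instead of talking to Solr, so the sketch runs anywhere.
    static int[] run(int totalDocs, int batchSize) {
        int[] calls = new int[2]; // [0] = flushes, [1] = commits
        try (BatchingIndexer<String> indexer = new BatchingIndexer<>(
                batchSize, docs -> calls[0]++, () -> calls[1]++)) {
            for (int i = 0; i < totalDocs; i++) {
                indexer.add("doc-" + i);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        int[] calls = run(46_205, 1_000);
        System.out.println("flushes=" + calls[0] + ", commits=" + calls[1]);
    }
}
```

For the 46,205 documents from the question and an arbitrary batch size of 1,000, the demo reports 47 flushes (46 full batches plus one with the remaining 205 documents) and a single commit, instead of 46,205 adds and 46,205 commits.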

Tweak autoCommit settings

Another factor that comes into play is the <autoCommit> section in your solrconfig.xml, which you may tweak. It triggers a commit automatically once a certain number of uncommitted documents is pending (maxDocs) or once a certain amount of time, in milliseconds, has passed since the oldest uncommitted update (maxTime). Increasing these values would additionally speed up indexing.

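<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit once this many documents are pending -->
    <maxDocs>10000</maxDocs>
    <!-- ... or this many milliseconds after the oldest uncommitted update -->
    <maxTime>1000</maxTime>
    <!-- hard commit without opening a new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>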
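For a rough sense of scale, the effect of the maxDocs threshold alone can be worked out with integer division. The helper below is hypothetical (AutoCommitMath is not a Solr class) and assumes a steady stream of adds while ignoring maxTime and any explicit commits; it only counts how often the number of pending documents reaches maxDocs.

```java
// Hypothetical helper: how many commits would the maxDocs trigger alone cause?
class AutoCommitMath {
    // Ignores maxTime and explicit commits; counts only the points at which
    // the number of pending (uncommitted) documents reaches maxDocs.
    static long maxDocsCommits(long totalDocs, long maxDocs) {
        if (maxDocs <= 0) throw new IllegalArgumentException("maxDocs must be positive");
        return totalDocs / maxDocs;
    }

    public static void main(String[] args) {
        long docs = 46_205; // document count mentioned in the question
        System.out.println("commit per document: " + docs + " commits");
        System.out.println("maxDocs=10000:       " + maxDocsCommits(docs, 10_000) + " commits");
    }
}
```

With the question's 46,205 documents, maxDocs set to 10,000 would trigger 4 commits; the remaining 6,205 documents would be picked up by maxTime or by the explicit commit in cleanup. With maxTime at 1000 ms, the time trigger would fire roughly once per second of continuous indexing, so raising maxTime matters as much as raising maxDocs.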
cheffe