Creating and deleting Solr collections no longer works. When I launch a creation (via curl), I get the following response after 30 seconds:

Error 500 - Could not fully create collection: <collection_name>

EDIT: I ran into the same issue a second time: Solr couldn't fully reboot, or was hanging.

HDP: 2.6.2
Solr(Cloud): 5.5.5
ZK: 3.4.6

tdebroc
  • Is this SolrCloud? I ask because you are using ZooKeeper and SolrCloud is a very different beast from Solr. – kellyfj Nov 08 '18 at 22:14
  • 1
    Hey kellyfj, yes indeed its solrCloud, I have updated the description. Actually, I had resolved this (and added the answer too) – tdebroc Nov 10 '18 at 10:02

1 Answer

I struggled with this problem for many days!

It turned out that the overseer queue in ZooKeeper had grown too large. These two commands:

zkCli.sh -server zkhost:2181 ls /solr/overseer/queue
zkCli.sh -server zkhost:2181 ls /solr/overseer/queue-work

each returned several hundred thousand entries, and the count kept growing.
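With queues this large, listing every child just to judge the size is slow. A minimal sketch of an alternative, assuming the `stat` command of zkCli.sh prints a `numChildren = N` line (as it does in ZooKeeper 3.4.x) and may write it to stderr. The helper name `num_children` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: read `zkCli.sh ... stat <path>` output on stdin
# and print only the value of its "numChildren = N" line.
num_children() {
  awk -F' = ' '/^numChildren/ { print $2 }'
}

# Usage against a live ensemble (host and paths as in the commands above);
# 2>&1 is there because zkCli.sh may print the stat fields on stderr:
#   zkCli.sh -server zkhost:2181 stat /solr/overseer/queue 2>&1 | num_children
#   zkCli.sh -server zkhost:2181 stat /solr/overseer/queue-work 2>&1 | num_children
```

A count that keeps climbing between two runs suggests that the overseer is not draining the queue.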

Process to recover:
1. Stop the Solr nodes.
2. Remove the overseer queues and recreate them:
zkCli.sh -server zkhost:2181 rmr /solr/overseer/queue
zkCli.sh -server zkhost:2181 create /solr/overseer/queue null
zkCli.sh -server zkhost:2181 rmr /solr/overseer/queue-work
zkCli.sh -server zkhost:2181 create /solr/overseer/queue-work null
3. Start the Solr nodes.

The code shows why: https://github.com/apache/lucene-solr/blob/dbed8bafe6ee167361599deaa4f1b5fdbb0b1c32/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java#L170 The code tries to create the nodes for the Solr collection, then polls ZooKeeper for 30 seconds to check whether the nodes have been created. If they have not, it fails with "Could not fully create collection:".
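The create-then-poll pattern described above can be illustrated with a small shell sketch. This is not Solr's code, only an illustration of the same idea: retry a check until it succeeds or a deadline passes. The names `wait_until` and `collection_is_ready` are hypothetical.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND about once per second until it succeeds (returns 0)
# or TIMEOUT_SECONDS have elapsed (returns 1).
wait_until() {
  timeout_s=$1
  shift
  deadline=$(( $(date +%s) + timeout_s ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if "$@"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage, with a hypothetical check command `collection_is_ready`:
#   wait_until 30 collection_is_ready my_collection \
#     || echo "Could not fully create collection: my_collection"
```

If the overseer never processes the request because its queue is backed up, the check never succeeds and the wait ends in the timeout branch, which matches the 30-second delay before the error.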

tdebroc
  • 1
    We've got a similar issue when asking Solr to reload a collection. `null:org.apache.solr.common.SolrException: reload the collection time out:180s at org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue` In our case the `/overseer/collection-queue-work/` was containing a large number of items. Clearing it seems to solve the issue. – Gaël J Mar 30 '21 at 07:08