
I'm encountering an issue while trying to upload documents to Solr via the endpoint /update/extract.

I run Solr 8.5.2 and ZooKeeper 3.5.8 in Docker and could previously index data via

...
solr.add(solr_documents)

My Setup:

The file system (the django folder is not relevant to the problem)

enter image description here

The files in solr

enter image description here

The file in solr-config

enter image description here

I use this docker-compose.yaml (the django image isn't relevant to the problem)

version: "1.0"
services:
  solr:
    build:
      context: solr/.
      dockerfile: Dockerfile
    container_name: aips-solr
    hostname: aips-solr
    ports:
      - 8983:8983
    environment:
      - ZK_HOST=aips-zk:2181
      - SOLR_HOST=aips-solr
    networks:
      - zk-solr
      - solr-django
    restart: unless-stopped
    depends_on:
      - zookeeper
    volumes:
      - ./solr/solr-config:/opt/solr/server/solr/configsets/_default/conf

  zookeeper:
    image: zookeeper:3.5.8
    container_name: aips-zk
    hostname: aips-zk
    ports:
      - 2181:2128
    networks:
      - zk-solr
      - solr-django
    restart: unless-stopped

  django:
    build:
      context: django/.
      dockerfile: Dockerfile
    container_name: django
    hostname: django
    ports:
      - 4000:4000
    depends_on:
      - solr
    volumes:
      - ./django/app:/app
    networks:
      - solr-django

networks:
  zk-solr:
  solr-django:

The Dockerfile contains:

FROM solr:8.5.2

USER root

ADD run_solr_w_ltr.sh ./run_solr_w_ltr.sh
RUN chown solr:solr run_solr_w_ltr.sh
RUN chmod u+x run_solr_w_ltr.sh


RUN chown -R solr:solr /opt/solr/

USER solr

ENTRYPOINT "./run_solr_w_ltr.sh" 

The run_solr_w_ltr.sh script contains (it copies the learning-to-rank plugin into Solr)

#!/bin/sh 
mkdir -p /var/solr/data/lib/
cp dist/solr-ltr-*.jar /var/solr/data/lib/
ls /var/solr/data/lib

solr-foreground -Dsolr.ltr.enabled=true

The launch_solr.sh script builds the image with

#!/bin/sh

docker build . -t aips-solr

Solr runs successfully and the admin center can be accessed via http://localhost:8983/solr/#/

I followed the instructions at https://solr.apache.org/guide/8_5/uploading-data-with-solr-cell-using-apache-tika.html

I created a file called solrconfig.xml in the subfolder solr

enter image description here

The content is:

<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract" 
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
   </lst>
</requestHandler>

I checked that the solr folder exists and contains the files.

I created a new index in the Solr admin center

enter image description here

I should be using the config from the directory

/opt/solr/server/solr/configsets/_default/conf

right?

I set the volume via

volumes:
      - ./solr/solr-config:/opt/solr/server/solr/configsets/_default/conf

Therefore the config in use should be the one from solrconfig.xml

<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract" 
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
   </lst>
</requestHandler>

right?

The settings under Parser-Specific Properties are optional, if I understand correctly.

If I call the endpoint /update/extract of the collection via the admin center

enter image description here

I get

enter image description here

If I use Postman

enter image description here

with the POST command and the uri: http://localhost:8983/solr/test10/update/extract

and the key Values:

Key           Value
extractOnly   true
wt            json
stream.file   Zertifikate.pdf
stream.body   xaAgikF464R9gR7Jz7ACA0... (base64 string)

I also get

enter image description here

I get the same result if I use an adjusted curl command like the one in the docs

curl "http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc6&defaultField=text&commit=true" --data-binary @example/exampledocs/sample.html -H 'Content-type:text/html'
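For comparison, the same kind of request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the setup above: the helper names build_extract_url and post_file are hypothetical, the collection name test10 is the one from the Postman attempt, and the parameters mirror the curl example from the docs. It will only succeed once the extraction handler is actually loaded for the collection.

```python
import urllib.parse
import urllib.request


def build_extract_url(base_url, collection, params):
    """Return the /update/extract URL for a collection with encoded query parameters."""
    query = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}/{collection}/update/extract?{query}"


def post_file(url, path, content_type):
    """POST the raw file bytes (like curl --data-binary) and return the response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx answers, for example when
    the handler is not registered for the collection.
    """
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    url = build_extract_url(
        "http://localhost:8983/solr",
        "test10",  # assumption: collection name taken from the Postman attempt
        {"literal.id": "doc6", "defaultField": "text", "commit": "true"},
    )
    print(url)
    # Needs a running Solr and an existing file, so it is left commented out:
    # print(post_file(url, "example/exampledocs/sample.html", "text/html"))
```

Sending the file as the request body with a Content-Type header avoids the stream.file and stream.body parameters, which depend on remote streaming being enabled on the server.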

What I tried so far

I changed the path of the solr folder to a relative path

solrconfig.xml

<lib dir="../../../../../solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../../../solr/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract" 
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.content">content</str>
   </lst>
</requestHandler>

I checked that the folder solr contains the .jars

I checked that I can access the collection

enter image description here

I checked that the user solr has the right permissions

My setup must be wrong but I can't find any other clues on how to find and solve the error.

Any help or advice would be greatly appreciated.

Based on MatsLindh's comment, I have made the following further changes.

According to the admin interface you're running Solr in cloud mode - that means that you have to explicitly upload your config set to the running zookeeper instance. See solr.apache.org/guide/solr/latest/deployment-guide/… - you might want to run it as a single instance instead of using the built-in cluster support if you want to just have a single node and supply the configuration on the file system instead. By MatsLindh

I uploaded the config with the following steps

  1. I started Docker with
docker-compose up
  2. I uploaded the config from a second PowerShell window with the command

docker-compose exec solr solr zk upconfig -n newconfig -d /opt/solr/server/solr/configsets/_default/conf -z zookeeper:2181

This will upload the configuration of the folder. Afterwards the file solrconfig.xml had to be adapted as follows:

<config>
   <luceneMatchVersion>8.5.2</luceneMatchVersion>
   <lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
   <lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />

   <requestHandler name="/update/extract" 
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
         <str name="lowernames">true</str>
         <str name="fmap.content">content</str>
      </lst>
   </requestHandler>
</config>
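Why the <config> wrapper matters can be illustrated with a small check: an XML document needs exactly one root element, so the earlier fragment with several top-level elements cannot be parsed on its own. The following is a rough sketch under that assumption. The function name check_solrconfig is hypothetical, FRAGMENT is a shortened version of the handler definition above, and the check only covers well-formedness and the presence of the handler, not everything Solr itself validates.

```python
import xml.etree.ElementTree as ET

# Shortened version of the fragment from the question (no <config> root element).
FRAGMENT = """
<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\\.jar" />
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" />
"""


def check_solrconfig(xml_text):
    """Return (ok, message) for a rough sanity check of a solrconfig.xml string."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # Several top-level elements end up here: XML allows only one root.
        return False, f"not well-formed XML: {error}"
    if root.tag != "config":
        return False, f"root element is <{root.tag}>, expected <config>"
    handlers = [handler.get("name") for handler in root.iter("requestHandler")]
    if "/update/extract" not in handlers:
        return False, "no /update/extract request handler defined"
    return True, "ok"


if __name__ == "__main__":
    print(check_solrconfig(FRAGMENT))                       # fails: no single root
    print(check_solrconfig(f"<config>{FRAGMENT}</config>"))  # passes the rough check
```

Running a check like this against the file before uploading it to ZooKeeper can catch a malformed config earlier than a failed core creation does.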

A schema.xml also needed to be created. I used the schema:

<?xml version="1.0" encoding="UTF-8" ?>
<schema>
    <fieldType name="text_general" class="solr.TextField" 
     positionIncrementGap="100"> 
        <analyzer type="index"> 
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" 
            words="stopwords.txt" />
           <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" 
            words="stopwords.txt" />
            <filter class="solr.SynonymFilterFactory" 
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

    <fields>
        <field name="title" type="text_general" indexed="true" 
        stored="true"/>
        <field name="content" type="text_general" indexed="true" 
        stored="true"/>
    </fields>
</schema>

Because of the schema, the two text files synonyms.txt and stopwords.txt had to be created. After the changes my folder structure looks like this:

enter image description here

After all the changes I get the following error if I try to create a new collection with the config set:

enter image description here

Possibly unhandled rejection: {"data":{"responseHeader":{"status":400,"QTime":620},"failure":{"aips-solr:8983_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://aips-solr:8983/solr: Error CREATEing SolrCore 'test_upload_3_shard1_replica_n1': Unable to create core [test_upload_3_shard1_replica_n1] Caused by: null"},"Operation create caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Underlying core creation failed while creating collection: test_upload_3","exception":{"msg":"Underlying core creation failed while creating collection: test_upload_3","rspCode":400},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"Underlying core creation failed while creating collection: test_upload_3","code":400}},"status":400,"config":{"method":"GET","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","url":"admin/collections","params":{"wt":"json","_":1687760309417,"action":"CREATE","name":"test_upload_3","router.name":"compositeId","numShards":1,"collection.configName":"newconfig","replicationFactor":1,"maxShardsPerNode":1,"autoAddReplicas":"false"},"headers":{"Accept":"application/json, text/plain, /","X-Requested-With":"XMLHttpRequest"},"timeout":10000},"statusText":"Bad Request","xhrStatus":"complete","resource":{}}

I think it has to do with a network or firewall issue. The guess is based on this Stack Overflow post: Failed to create collection

I will check it this evening on another PC.
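To separate a config problem from a network problem, one option is to ask Solr directly whether the config set reached ZooKeeper and to repeat the CREATE call outside the admin UI. The sketch below is an assumption-laden illustration, not a verified fix: the helper names are hypothetical, the CREATE parameters are copied from the error message above, and the "configSets" key in the LIST response is what I would expect from the Configsets API rather than something confirmed on this setup.

```python
import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # admin URL used earlier in the question


def admin_url(path, params):
    """Build a Solr admin API URL with encoded query parameters."""
    return f"{SOLR_URL}/admin/{path}?{urllib.parse.urlencode(params)}"


def list_configsets_url():
    """URL of the Configsets API call that lists config sets known to the cluster."""
    return admin_url("configs", {"action": "LIST", "wt": "json"})


def create_collection_url(name, config_name):
    """URL of the Collections API CREATE call, using the values from the error above."""
    return admin_url("collections", {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": 1,
        "collection.configName": config_name,
        "wt": "json",
    })


def fetch_json(url):
    """GET a URL and decode the JSON body; raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    print(list_configsets_url())
    print(create_collection_url("test_upload_3", "newconfig"))
    # Needs the running containers, so it is left commented out:
    # names = fetch_json(list_configsets_url()).get("configSets", [])  # key name assumed
    # print("newconfig uploaded:", "newconfig" in names)
```

If newconfig shows up in the list, the upload itself worked and the Solr server log (for example via docker-compose logs solr) is the next place to look for the real cause behind "Caused by: null".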

  • According to the admin interface you're running Solr in cloud mode - that means that you have to explicitly upload your config set to the running zookeeper instance. See https://solr.apache.org/guide/solr/latest/deployment-guide/solr-in-docker.html#creating-collections - you might want to run it as a single instance instead of using the built-in cluster support if you want to just have a single node and supply the configuration on the file system instead. – MatsLindh Jun 25 '23 at 21:16
  • Thank you very much for the advice. :) The config did indeed have to be uploaded. Where did you find the info? Does it come from a deeper understanding of the article https://solr.apache.org/guide/solr/latest/deployment-guide/docker-networking.html ? – Lukas Trenz Jun 26 '23 at 06:58
  • It's mentioned in passing in the last sentence in the section I linked in my comment, but also because of knowledge that when you're using Solr in cloud mode, configuration is not read directly from disk (since it needs to be shared among nodes). – MatsLindh Jun 26 '23 at 08:19
  • The actual server log should have more details about the underlying reason why the core creation failed - since you're referencing libraries with a local path I might guess that it doesn't want to / can't find the libraries. You might want to follow https://solr.apache.org/guide/solr/latest/configuration-guide/libs.html#lib-directories to try one of the other methods for adding libraries in that case. – MatsLindh Jun 26 '23 at 09:19
