
I am using atomic updates to modify metadata in a Solr document collection. To do so, I use an external .json file listing all document IDs in the collection and the metadata to change, and apply the updates with the "set" command. But I found that whenever the external file is larger than approximately 8200 bytes / 220 lines, I get this error message:

"org.apache.solr.common.SolrException: Cannot parse provided JSON: Unexpected EOF: char=(EOF),position=8191 BEFORE=''"

This doesn't seem to be related to the actual content of the file (a missing bracket or similar), as I have reproduced it with different databases. Moreover, if I split the external file into smaller pieces of less than 8000 bytes each, the updates work perfectly. Does anyone have an idea where this could come from?

The curl command to update the collection is as follows:

curl 'http://localhost:8983/solr/these/update/json?commit=true' -d @test5.json
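
For reference, test5.json lists the documents and the fields to change using the standard atomic-update syntax, roughly like this (the field names here are just placeholders for my real metadata fields):

    [
      {"id": "doc_001", "metadata_field_s": {"set": "some value"}},
      {"id": "doc_002", "metadata_field_s": {"set": "another value"}}
    ]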

The main Solr configuration file is included below. I can provide the full json update file if needed, as well as any other details.

Thanks in advance for your help,

Barthélémy

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!-- 
 This is a DEMO configuration highlighting elements
 specifically needed to get this example running
 such as libraries and request handler specifics.

 It uses defaults or does not define most of production-level settings
 such as various caches or auto-commit policies.

 See Solr Reference Guide and other examples for
 more details on a well configured solrconfig.xml
 https://cwiki.apache.org/confluence/display/solr/The+Well-Configured+Solr+Instance
-->

<config>
  <!-- Controls what version of Lucene various components of Solr
   adhere to.  Generally, you want to use the latest version to
   get all bug fixes and improvements. It is highly recommended
   that you fully re-index after changing this setting as it can
   affect both how text is indexed and queried.
  -->
  <luceneMatchVersion>6.6.0</luceneMatchVersion>

  <!-- Load Data Import Handler and Apache Tika (extraction) libraries -->
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/contrib/langid/lib" regex=".*\.jar"/>
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-.*\.jar"/>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">tika-data-config.xml</str>
    </lst>
  </requestHandler>


  <updateRequestProcessorChain name="langid" default="true" onError = "skip">
     <processor  class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"
       onError = "continue">
       <str name="langid.fl">text</str>
       <str name="langid.langField">language_s</str>
       <str name="langid.threshold">0.8</str>
       <str name="langid.fallback">en</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" onError = "skip"/>
     <processor class="solr.RunUpdateProcessorFactory" onError = "skip"/>
   </updateRequestProcessorChain>

<!-- The default high-performance update handler -->
  <updateHandler class="solr.DirectUpdateHandler2">

    <!-- Enables a transaction log, used for real-time get, durability, and
         solr cloud replica recovery.  The log can grow as big as
         uncommitted changes to the index, so use of a hard autoCommit
         is recommended (see below).
         "dir" - the target directory for transaction logs, defaults to the
                solr data directory.   -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>

  </updateHandler>

</config>
2 Answers


I don't know if this will help anybody else who runs into this, but I hit the same issue.

My initial command looked like this:

curl http://localhost:8983/solr/your_solr_core/update?commit=true --data-binary @test5.json -H "Content-type:application/json"

Changing it to this solved the problem:

curl http://localhost:8983/solr/your_solr_core/update?commit=true -H "Content-Type: application/json" -T "test5.json" -X POST

Apparently it has something to do with curl loading the whole file into memory with the first command, which causes issues, whereas the second command uses minimal memory.
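
As a minimal sketch of yet another option, assuming a local Solr install and the same core name as in the commands above, the bundled bin/post tool can send the same file and picks the JSON content type from the file extension:

    # hypothetical example; run from the Solr installation directory
    bin/post -c your_solr_core test5.json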

Da5id

Try editing server/etc/jetty.xml and tweaking requestHeaderSize:

    <Set name="requestHeaderSize">
      <Property name="solr.jetty.request.header.size" default="8192" />
    </Set>

to something larger than your file limit.
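
Since that element reads its value from the solr.jetty.request.header.size property, one possible alternative (an untested sketch) is to override the property when starting Solr instead of editing jetty.xml:

    # hypothetical example; 65536 is an arbitrary value larger than the update file
    bin/solr start -Dsolr.jetty.request.header.size=65536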

Persimmonium
  • No improvement, but I think we are close to the problem. I found a post about curl upload limitations for security reasons: https://stackoverflow.com/questions/31941213/is-there-any-size-limit-to-post-a-file-using-curl. There might be other parameters to configure the same way? – Barth Aug 24 '17 at 09:58
  • Another reference on this issue: https://maxchadwick.xyz/blog/http-request-header-size-limits – Barth Aug 24 '17 at 10:16
  • Sure, if you are hitting some limit before the request reaches Solr (via Jetty), you have to fix that first. It might be a bit tricky; on the Solr side the parameter has changed name at some point, etc. – Persimmonium Aug 24 '17 at 11:49
  • Well, Apache is known to have an 8 MB request upload limitation, but I don't know how to increase it. Is Jetty the HTTP server answering the curl command, or are there other intermediary services that could impose such a limitation? – Barth Aug 24 '17 at 14:08