
The following Python code adds a document, but without the JSON contents:

import pysolr

solr_instance = pysolr.Solr('http://192.168.45.153:8983/solr/test', timeout=60)
json_filename = '/path/to/file/test.json'
argws = {
    'commit': 'true',
    'extractOnly': False,
    'Content-Type': 'application/json',
}
with open(json_filename, 'rb') as f:
    solr_instance.extract(f, **argws)
    solr_instance.commit()

Using curl from the command line works as expected:

$ curl 'http://192.168.45.153:8983/solr/test/update?commit=true' \
     --data-binary @/path/to/file/test.json \
     -H 'Content-Type: application/json'
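For comparison, the curl call above is just a raw HTTP POST of the file's bytes to the core's /update handler with a JSON content type. A minimal standard-library sketch of the same request, assuming the same core URL and a file that already holds a JSON array of documents (the function names build_update_request and post_json_file are hypothetical):

```python
import urllib.request


def build_update_request(core_url, payload, commit=True):
    """Build a POST to <core_url>/update carrying raw JSON bytes, like the curl call."""
    url = core_url.rstrip('/') + '/update'
    if commit:
        url += '?commit=true'
    return urllib.request.Request(
        url,
        data=payload,  # bytes sent unchanged, like curl --data-binary
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def post_json_file(core_url, path, timeout=60):
    """Send the file's bytes as-is and return Solr's raw response body."""
    with open(path, 'rb') as f:
        req = build_update_request(core_url, f.read())
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

With the setup from the question this would be called as post_json_file('http://192.168.45.153:8983/solr/test', '/path/to/file/test.json').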

The file has the following content:

$ cat /cygdrive/w/mist/test.json
[{"x": "a","y": "b"}]

I'm using pysolr 3.6.0 and Solr 6.5.0.

wolfrevo

1 Answer


The extract() method sends the file to Solr's ExtractingRequestHandler (Solr Cell), which is meant for extracting text and metadata from rich documents such as PDFs. It does not index the JSON objects in the file as Solr documents with their own fields.

You can use the regular .add method to submit the decoded JSON to Solr:

import json

with open(json_filename) as f:
    solr_instance.add(json.load(f))  # json.load expects a file object, not a filename

This should work.
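A slightly more defensive version of the same idea, as a sketch: it assumes `solr` is a pysolr.Solr instance (any object with an add(list_of_dicts) method works the same way) and wraps a file containing a single JSON object into a one-element list, since add() expects a list of documents. The helper names load_solr_docs and index_json_file are hypothetical.

```python
import json


def load_solr_docs(path):
    """Read a JSON file and return a list of document dicts for Solr's add()."""
    with open(path, encoding='utf-8') as f:
        data = json.load(f)  # json.load takes a file object, not a filename
    if isinstance(data, dict):
        data = [data]  # a single document becomes a one-element list
    if not isinstance(data, list) or not all(isinstance(d, dict) for d in data):
        raise ValueError('expected a JSON object or an array of JSON objects')
    return data


def index_json_file(solr, path):
    """Send the decoded documents through the regular update handler via add()."""
    docs = load_solr_docs(path)
    solr.add(docs)
    return len(docs)
```

With the question's setup this would be called as index_json_file(solr_instance, json_filename).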

MatsLindh
  • This works, but now I get an error: `Document contains at least one immense term in field="x" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.` (The document in my example is only a small one.) – wolfrevo Apr 20 '17 at 10:37
  • Please open a new question for new problems, but it's probably caused by a `string` field holding a value longer than 32766 bytes (a string field is indexed as a single term). – MatsLindh Apr 21 '17 at 10:52
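To find which values would trigger the "immense term" error before sending them, one can compare the UTF-8 byte length of each string value against the 32766-byte limit quoted in the error message. A sketch that assumes the affected fields are indexed as a single term (e.g. `string` fields) and only inspects top-level string values and lists of strings; the function name find_immense_values is hypothetical.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the error message (UTF-8 bytes)


def find_immense_values(docs, limit=MAX_TERM_BYTES):
    """Return (doc_index, field, byte_length) for every string value whose
    UTF-8 encoding is longer than `limit` bytes.

    Multivalued (list) fields are checked value by value; non-string values
    are ignored.
    """
    hits = []
    for i, doc in enumerate(docs):
        for field, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                if isinstance(v, str):
                    size = len(v.encode('utf-8'))
                    if size > limit:
                        hits.append((i, field, size))
    return hits
```

Counting bytes rather than characters matters here: a value of 20000 "é" characters is only 20000 characters long but 40000 bytes in UTF-8, so it exceeds the limit.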