How to load a JSON file into Solr using `pysolr`?

Question

The following Python code adds a document, but without the JSON contents:

import pysolr

solr_instance = pysolr.Solr('http://192.168.45.153:8983/solr/test', timeout=60)
json_filename = '/path/to/file/test.json'
argws = {
    'commit': 'true',
    'extractOnly': False,
    'Content-Type': 'application/json',
}
with open(json_filename, 'rb') as f:
    solr_instance.extract(f, **argws)
    solr_instance.commit()

Using curl from the command line works as expected:

$ curl 'http://192.168.45.153:8983/solr/test/update?commit=true' \
     --data-binary @/path/to/file/test.json \
     -H 'Content-Type: application/json'

The file has the following content:

$ cat /cygdrive/w/mist/test.json
-->    [{"x": "a","y": "b"}]

I'm using pysolr 3.6.0 and Solr 6.5.0.

score 1 · Accepted Answer · answered Apr 20 '17 at 10:07

The extract() method sends the file to Solr's ExtractingRequestHandler, which is meant for extracting content from rich documents (such as PDFs), not for indexing JSON documents.

You can use the regular .add() method to submit the decoded JSON to Solr. Note that json.load() expects an open file object, not a filename:

import json

with open(json_filename) as f:
    solr_instance.add(json.load(f))

This should work.
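As a slightly fuller sketch of the same idea: the helper names `load_json_docs` and `index_json_file` below are made up for illustration, and `solr` is assumed to be any client object with an `add()` method that accepts a list of dicts, as `pysolr.Solr` does. The file may contain either a single JSON object or a list of objects, so the sketch normalizes both to a list.

```python
import json


def load_json_docs(path):
    """Read a JSON file and return its contents as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # The file may hold one object or a list of objects; normalize to a list.
    return data if isinstance(data, list) else [data]


def index_json_file(solr, path):
    """Decode the JSON file and pass the documents to the client's add()."""
    docs = load_json_docs(path)
    solr.add(docs)
    return len(docs)
```

With the objects from the question this would be called as `index_json_file(solr_instance, json_filename)`. Depending on the pysolr version and how the client was constructed, a separate `solr_instance.commit()` may still be needed afterwards.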

MatsLindh

This works. But I get an error `Document contains at least one immense term in field="x" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.` (I showed a small document in my example) – wolfrevo Apr 20 '17 at 10:37
Open a new question for new questions, but it's probably related to having a `string` field with a length above 32766 (since a string field is indexed as a single term). – MatsLindh Apr 21 '17 at 10:52
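To check whether the documents really contain values above that limit before sending them, something like the following could be run on the decoded JSON. This is only a rough sketch: `find_immense_fields` is a hypothetical helper, and the limit of 32766 bytes is taken from the error message quoted above. It only inspects top-level string values and knows nothing about which fields the Solr schema actually defines as `string`.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Solr error message


def find_immense_fields(docs, limit=MAX_TERM_BYTES):
    """Return (doc_index, field, byte_length) for string values over the limit."""
    oversized = []
    for i, doc in enumerate(docs):
        for field, value in doc.items():
            if isinstance(value, str):
                # Solr's limit applies to the UTF-8 encoded length, not len(value).
                size = len(value.encode("utf-8"))
                if size > limit:
                    oversized.append((i, field, size))
    return oversized
```

Any field reported here would need to be shortened, or mapped to a tokenized text field type instead of `string`, before the document can be indexed.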
