I am trying to add documents to Solr (5.3.2) with pysolr. I generate a simple JSON object containing a large block of text and some metadata (date, author, ...), then I try to add it to Solr. My issue is that beyond a certain size, Solr fails to index the document and returns the following error:
Solr responded with an error (HTTP 400): [Reason: Exception writing document id e2699f18-ab5f-47f6-a450-60db5621879c to the index; possible analysis error.]
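For context, the indexing code is essentially the following (a simplified sketch; the core URL, the source file, and the metadata values shown here are placeholders, not my real setup):

import uuid
import pysolr

# Placeholder core URL; replace with the actual Solr core.
solr = pysolr.Solr('http://localhost:8983/solr/mycore', timeout=10)

# The large text comes from a file in my real code; this is just illustrative.
with open('document.txt', encoding='utf-8') as f:
    content = f.read()

default_obj = {
    'id': str(uuid.uuid4()),
    'author': 'some author',            # example metadata value
    'date': '2016-01-01T00:00:00Z',     # example metadata value
    'content': content,                 # the large text field that triggers the error
}

solr.add([default_obj], commit=True)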
There really seems to be a hardcoded limit somewhere on the field length, but I can't find it.
By playing around in Python, I found that:
default_obj['content'] = content[:13260]
works fine, while
default_obj['content'] = content[:13261]
fails with the error above.
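I found that boundary by bisecting the prefix length and retrying the add, along these lines (illustrative sketch; solr, content and default_obj are the objects from the snippet above):

# Find the largest prefix length that Solr still accepts.
low, high = 0, len(content)
while low < high:
    mid = (low + high + 1) // 2
    default_obj['content'] = content[:mid]
    try:
        solr.add([default_obj], commit=True)
        low = mid           # this prefix length indexed fine
    except pysolr.SolrError:
        high = mid - 1      # Solr rejected this length

print(low)  # prints 13260 in my case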
The content field is defined in my schema.xml as a normal type="text_general" field.
Edit: Here are the schema.xml definitions:
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
I have tried adding the content manually through Solr's web admin interface, but I get the exact same problem.