Solr not recognizing camelcased field names over update/extract?

Question

I've been working with SolrJ for months without any problems, using a schema whose field names combine underscores and camelcasing:

<field name="museum_eventActor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventType" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museum_eventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>

We recently decided to index some PDF content, so I started testing with curl:

curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventActor=test&fmap.content=text&commit=true"
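The request above can also be assembled programmatically, which makes it easier to vary the literal.* fields while testing. This is only a sketch: the helper name build_extract_url is made up for illustration, and the host, core path, and file path are simply copied from the curl command.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, pdf_path, doc_id, literals):
    """Build an update/extract URL (hypothetical helper, not part of Solr or SolrJ)."""
    params = [
        ("stream.file", pdf_path),
        ("stream.contentType", "application/pdf"),
        ("literal.id", doc_id),
    ]
    # Each extra schema field is passed as literal.<fieldname>=<value>.
    params += [("literal." + name, value) for name, value in literals.items()]
    params += [("fmap.content", "text"), ("commit", "true")]
    # safe="/" keeps slashes readable; other reserved characters are escaped.
    return base_url + "?" + urlencode(params, safe="/")


url = build_extract_url(
    "http://localhost:8090/solr-museum/archival/update/extract",
    "/home/user/Downloads/transcript.pdf",
    "C1-1-5",
    {"museum_eventActor": "test"},
)
print(url)
```

The printed URL can be passed to curl in quotes, exactly as in the command above.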

But I noticed that although Solr acknowledged my fields, none of them showed up in my index. The Solr log says:

792560 [http-8090-1] INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [archivalRecord] webapp=/solr-museum path=/update/extract params={fmap.content=text&commit=true&literal.museum_eventActor=&literal.id=C1-1-5&stream.contentType=application/pdf&stream.file=/home/user/Downloads/transcript.pdf} {add=[C1-1-5 (1467000805262360576)],commit=} 0 698

and the index looks like:

<doc>
    <str name="id">C1-1-5</str>
    <long name="_version_">1467000805262360576</long>
    <arr name="content">
        <str>1467000805262360576</str>
    </arr>
</doc>

After a day of playing around and searching online, I found this SO question, which made me wonder about camelcasing: Solr - Missing Required Field

So I modified my schema to look something like this:

<field name="museum_eventactor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museumeventtype" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museumeventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>

And fired over this request:

curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventactor=test&literal.museumeventtype=test&literal.museumeventPlace=test&literal.museum_eventDate=test&fmap.content=text&commit=true"

And sure enough, the fields with camelcasing aren't being recognized:

<doc>
    <arr name="museum_eventactor">
        <str>test</str>
    </arr>
    <str name="id">C1-1-5</str>
    <arr name="museumeventtype">
        <str>test</str>
    </arr>
    <long name="_version_">1467001178833289216</long>
</doc>

I've searched through a lot of the Solr documentation. It points out repeatedly that there are very few restrictions on field names if you're willing to accept the consequences, but I have never encountered a scenario where camelcasing isn't a valid naming scheme, especially in Java. I'm stumped as to why this is happening. Does anyone have any ideas that might explain this behavior?
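One possibility worth ruling out: Solr's extracting request handler supports a lowernames parameter that lowercases field names (and replaces non-alphanumeric characters with underscores) before they are matched against the schema. Whether it is enabled for this core's /update/extract handler in solrconfig.xml is an assumption, not something confirmed here. The sketch below is not Solr's code; it only approximates that mapping in Python, and it assumes that fields missing from the schema are silently dropped. Under those assumptions it reproduces the pattern seen in the second test.

```python
def lowerize(name):
    """Approximate the lowernames mapping: lowercase, non-alphanumerics become '_'."""
    return "".join(ch.lower() if ch.isalnum() else "_" for ch in name)


# Field names from the modified schema above, plus the id field.
SCHEMA_FIELDS = {
    "id",
    "museum_eventactor",
    "museumeventtype",
    "museumeventPlace",
    "museum_eventDate",
}


def surviving_literals(literals, schema_fields=SCHEMA_FIELDS):
    """Return the literals whose mapped name exists in the schema.

    Assumption: unknown fields are ignored rather than raising an error.
    """
    kept = {}
    for name, value in literals.items():
        mapped = lowerize(name)
        if mapped in schema_fields:
            kept[mapped] = value
    return kept


request_literals = {
    "id": "C1-1-5",
    "museum_eventactor": "test",
    "museumeventtype": "test",
    "museumeventPlace": "test",
    "museum_eventDate": "test",
}
result = surviving_literals(request_literals)
print(sorted(result))  # ['id', 'museum_eventactor', 'museumeventtype']
```

If this is the cause, the two camelcased names are looked up as museumeventplace and museum_eventdate, which the schema does not define. Checking the handler's defaults for lowernames, or sending lowernames=false with the request, would confirm or rule it out.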
