I've been working with SolrJ for months without any problems, using a schema whose field names combine underscores and camelcasing, like this:
<field name="museum_eventActor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventType" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museum_eventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
We recently decided to index some PDF content, so I started testing with curl:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventActor=test&fmap.content=text&commit=true"
But I noticed that although Solr was acknowledging my fields in the request, none of them showed up in my index. The Solr log says:
792560 [http-8090-1] INFO org.apache.solr.update.processor.LogUpdateProcessor – [archivalRecord] webapp=/solr-museum path=/update/extract params={fmap.content=text&commit=true&literal.museum_eventActor=&literal.id=C1-1-5&stream.contentType=application/pdf&stream.file=/home/user/Downloads/transcript.pdf} {add=[C1-1-5 (1467000805262360576)],commit=} 0 698
and the index looks like:
<doc>
<str name="id">C1-1-5</str>
<long name="_version_">1467000805262360576</long>
<arr name="content">
<str>1467000805262360576</str>
</arr>
</doc>
After a day of playing around and searching online, I found this SO question, which made me wonder about camelcasing: Solr - Missing Required Field
So I modified my schema to look something like this:
<field name="museum_eventactor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museumeventtype" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museumeventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
And fired over this request:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventactor=test&literal.museumeventtype=test&literal.museumeventPlace=test&literal.museum_eventDate=test&fmap.content=text&commit=true"
And sure enough, only the all-lowercase fields were indexed; the fields with camelcasing aren't being recognized:
<doc>
<arr name="museum_eventactor">
<str>test</str>
</arr>
<str name="id">C1-1-5</str>
<arr name="museumeventtype">
<str>test</str>
</arr>
<long name="_version_">1467001178833289216</long>
</doc>
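As a rough sanity check of the camelcasing suspicion, here is a minimal Python sketch. It assumes, and this is only a guess on my part, that something in the extract path lowercases literal field names before matching them against the schema and silently drops names that no longer match. The helper name `surviving_fields` is hypothetical and not part of Solr or SolrJ.

```python
# Field names declared in the modified schema (plus the unique key "id").
SCHEMA_FIELDS = {
    "id",
    "museum_eventactor",
    "museumeventtype",
    "museumeventPlace",
    "museum_eventDate",
}

# Literal field names from the second curl request (the part after "literal.").
REQUEST_LITERALS = [
    "id",
    "museum_eventactor",
    "museumeventtype",
    "museumeventPlace",
    "museum_eventDate",
]


def surviving_fields(literals, schema_fields):
    """Return the literals that still match a schema field after lowercasing.

    ASSUMPTION: field names are lowercased before the schema lookup, and
    names with no matching schema field are silently dropped.
    """
    return [name for name in literals if name.lower() in schema_fields]


if __name__ == "__main__":
    print(surviving_fields(REQUEST_LITERALS, SCHEMA_FIELDS))
```

Under that assumption the sketch keeps exactly `id`, `museum_eventactor` and `museumeventtype`, which matches the document above. Whether Solr really does this (for example through the extract handler's `lowernames` setting in solrconfig.xml) is something I haven't confirmed.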
I've searched through a lot of the Solr documentation, and although it points out repeatedly that there are very few restrictions on field names if you're willing to accept the consequences, I have never encountered a scenario where camelcasing isn't a valid naming scheme, especially in Java. I'm stumped as to why this is happening. Does anyone have any ideas that might explain this behavior?