I've been working with SolrJ for months without any problems, using a schema whose field names combine underscores and camelCase, like this:

<field name="museum_eventActor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventType" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museum_eventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>

We recently decided to index some PDF content, so I started testing with curl:

curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventActor=test&fmap.content=text&commit=true"
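For readability, the same request can be assembled from named parameters instead of one long query string. This is only a minimal sketch in Python: the helper name `build_extract_url` and its arguments are hypothetical, and it only builds the URL without sending anything to Solr.

```python
from urllib.parse import urlencode

def build_extract_url(base_url, file_path, doc_id, literals, content_field="text"):
    """Build an /update/extract URL; `literals` maps schema field names to values.

    Hypothetical helper for illustration only; it does not contact Solr.
    """
    params = [
        ("stream.file", file_path),
        ("stream.contentType", "application/pdf"),
        ("literal.id", doc_id),
    ]
    # Each literal.<field> parameter keeps the field name exactly as given,
    # including its case.
    params += [("literal." + name, value) for name, value in literals.items()]
    params += [("fmap.content", content_field), ("commit", "true")]
    # Keep "/" unescaped so the path and content type read like the curl call.
    return base_url + "?" + urlencode(params, safe="/")

url = build_extract_url(
    "http://localhost:8090/solr-museum/archival/update/extract",
    "/home/user/Downloads/transcript.pdf",
    "C1-1-5",
    {"museum_eventActor": "test"},
)
print(url)
```

Passing the literals as a dictionary makes it easier to see exactly which field names, with which casing, are being sent.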

But I noticed that although Solr was acknowledging my fields in the request parameters, none of them were showing up in my index. The Solr log says:

792560 [http-8090-1] INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [archivalRecord] webapp=/solr-museum path=/update/extract params={fmap.content=text&commit=true&literal.museum_eventActor=&literal.id=C1-1-5&stream.contentType=application/pdf&stream.file=/home/user/Downloads/transcript.pdf} {add=[C1-1-5 (1467000805262360576)],commit=} 0 698

and the index looks like:

<doc>
    <str name="id">C1-1-5</str>
    <long name="_version_">1467000805262360576</long>
    <arr name="content">
        <str>1467000805262360576</str>
    </arr>
</doc>

After a day of experimenting and searching online, I found this SO question, which made me wonder about camelCase: Solr - Missing Required Field

So I modified my schema to look something like this:

<field name="museum_eventactor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museumeventtype" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museumeventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>

And fired over this request:

curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventactor=test&literal.museumeventtype=test&literal.museumeventPlace=test&literal.museum_eventDate=test&fmap.content=text&commit=true"

And sure enough, the fields with camelCase names aren't being recognized:

<doc>
    <arr name="museum_eventactor">
        <str>test</str>
    </arr>
    <str name="id">C1-1-5</str>
    <arr name="museumeventtype">
        <str>test</str>
    </arr>
    <long name="_version_">1467001178833289216</long>
</doc>
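To make the comparison explicit, here is a rough Python sketch that checks which of the submitted field names have no exact, case-sensitive match in the stored document. The function name `missing_fields` is hypothetical; the XML is the document shown above and the field list is taken from the request.

```python
import xml.etree.ElementTree as ET

def missing_fields(sent_fields, doc_xml):
    """Return the sent field names with no exact (case-sensitive) match in <doc>."""
    doc = ET.fromstring(doc_xml)
    # Each direct child of <doc> carries the stored field name in its "name" attribute.
    stored = {child.get("name") for child in doc}
    return [name for name in sent_fields if name not in stored]

doc_xml = """<doc>
    <arr name="museum_eventactor">
        <str>test</str>
    </arr>
    <str name="id">C1-1-5</str>
    <arr name="museumeventtype">
        <str>test</str>
    </arr>
    <long name="_version_">1467001178833289216</long>
</doc>"""

# Field names sent as literal.* parameters in the request above.
sent = ["id", "museum_eventactor", "museumeventtype", "museumeventPlace", "museum_eventDate"]
dropped = missing_fields(sent, doc_xml)
print(dropped)  # ['museumeventPlace', 'museum_eventDate']
```

Only the two names that contain an uppercase letter are missing, whether or not they also contain an underscore.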

I've searched through a lot of the Solr documentation. It points out repeatedly that there are very few restrictions on field names if you're willing to accept the consequences, but I have never encountered a scenario where camelCase isn't a valid naming scheme, especially in Java. I'm stumped as to why this is happening. Does anyone have any ideas that might explain this behavior?

rsmuckles