Solr indexing using the CSV update UI: unique key is treated as a multivalued field and errors out

Question

I am trying to load a CSV file into a Solr 6.5 collection using the Solr Admin UI. Here are the steps I followed and the error I got:

Created a data-driven managed-schema config set in ZooKeeper and changed the unique key from the default id to "MyId" (a string field):

<uniqueKey>MyId</uniqueKey>
        ...
<field name="MyId" type="string" indexed="true" stored="true" required="true" multiValued="false" />

Created a collection and associated it with the config set mentioned above (using the new Admin UI).
Loaded the CSV file using the Admin UI (collections --> collection name drop-down --> Documents). I added &rowid=MyId to the request handler parameters. My CSV file has a MyId field in it. During the load I get this error:

Document contains multiple values for uniqueKey field: MyId=[82552329, 1] at org.apache.solr.update.AddUpdateCommand.getHashableId(AddUpdateCommand.java:168)
Without changing the unique key, and just using the default id field (with an auto-generated UUID), the CSV loads fine. But I need the unique key to be MyId.

I would like to know why my key field is reported as multi-valued. My CSV does not contain any multi-valued data; it is just simple comma-separated numeric and string fields. Please suggest what could have gone wrong.

Note: I have also made the change described in "Solr Schemaless Mode creating fields as MultiValued in the schema". It does not help, because the problem is in the input data.

EDIT: Adding full exception trace

https://pastebin.com/raw/juRj7ZUi

What do the logs say? Please show me the full log4j log. Something tells Solr to split your string into an array. — Oyeme, Apr 13 '17 at 08:43
Added an exception trace. You are right, something, probably within the CSV update handler, is splitting it. But I am wondering what it could be, since I have not added any CSV split params in my request handler. — Ganesh, Apr 13 '17 at 15:05
You could specify more params, see https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers, and also try using the console instead of the UI; then you could pass more params. — Oyeme, Apr 13 '17 at 15:32
I have added a [pastebin of the Solr log with debug enabled](https://pastebin.com/raw/juRj7ZUi) to the question above. I also tried split=false through the UI, and I can see it in the log, but it does not help; Solr still creates multivalued data from the CSV. — Ganesh, Apr 13 '17 at 18:35
Another interesting thing: if I set the unique key to the default `id` field and let the system generate the UUID using `solr.UUIDUpdateProcessorFactory`, it works fine. Looks like a bug when we use our own unique key? — Ganesh, Apr 14 '17 at 15:27

score 1 · Accepted Answer · answered Apr 14 '17 at 20:04

I found a clue in the documentation for the CSV update parameters: the issue is caused by the parameter I was passing (&rowid=MyId). The documentation states that this parameter adds the line number as a value of the named field. That explains why my key (MyId) became multi-valued ([my actual key, line number]). When I removed the parameter, however, I got an error saying that id was not populated, because my schema still has a required id field. So I added &literal.id=1, and now everything works fine. Thanks for helping out.
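As a rough illustration of the fix, here is a minimal Python sketch that builds the update URL without rowid and with literal.id=1. The host, port, and collection name (`mycollection`) are placeholders, not values from the question, and the helper `build_csv_update_url` is hypothetical. Since MyId is the unique key, the constant id value should not cause documents to overwrite each other, but that is an assumption about this particular schema.

```python
from urllib.parse import urlencode


def build_csv_update_url(base_url, collection, extra_params=None):
    """Build a Solr /update URL for a CSV upload (hypothetical helper)."""
    # commit=true makes the documents visible right after the upload.
    params = {"commit": "true"}
    params.update(extra_params or {})
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Placeholder host and collection name; adjust to your own setup.
# Note: no rowid parameter. literal.id only fills the required id field
# with a constant; MyId comes from the CSV column itself.
url = build_csv_update_url(
    "http://localhost:8983/solr",
    "mycollection",
    {"literal.id": "1"},
)
print(url)
```

The CSV file itself would then be sent as the request body to this URL with a CSV content type, for example from the command line instead of the Admin UI, as suggested in the comments.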
