
I am trying to load a CSV file into a Solr 6.5 collection using the Solr Admin UI. Here are the steps I took and the error I got:

  1. Created a data-driven managed schema config set in ZooKeeper and changed the unique key to "MyId" (a string field) instead of the default id:

        <uniqueKey>MyId</uniqueKey>
        ...
        <field name="MyId" type="string" indexed="true" stored="true" required="true" multiValued="false" />

  2. Created a collection and associated it with the config set above (using the new Admin UI).

  3. Loaded the CSV file using the Admin UI (Collections --> collection name drop-down --> Documents), adding the request handler parameter &rowid=MyId. My CSV file contains the MyId field. During the load I get this error:

        Document contains multiple values for uniqueKey field: MyId=[82552329, 1] at org.apache.solr.update.AddUpdateCommand.getHashableId(AddUpdateCommand.java:168)

  4. Without changing the unique key, i.e. keeping the default id field (with an auto-generated UUID), the CSV loads fine. But I need the unique key to be MyId.
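The Documents-screen upload described above can also be reproduced from a script, which makes it easier to see exactly which parameters reach Solr. This is a minimal sketch, assuming Solr runs at `localhost:8983`, a hypothetical collection named `mycollection`, and that the collection's `/update` handler accepts CSV sent with the `application/csv` content type (as in the Solr reference guide examples). The sample CSV row is illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions (hypothetical): local Solr and a collection called "mycollection".
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def build_csv_upload(csv_text: str, params: dict) -> Request:
    """Build (but do not send) a POST of CSV data to the collection's /update handler."""
    url = f"{SOLR_BASE}/{COLLECTION}/update?{urlencode(params)}"
    return Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


csv_text = "MyId,name\n82552329,example\n"
# Same extra parameter as entered in the Admin UI, plus a commit.
req = build_csv_upload(csv_text, {"rowid": "MyId", "commit": "true"})
print(req.full_url)  # http://localhost:8983/solr/mycollection/update?rowid=MyId&commit=true
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it lets you print and compare the exact query string before anything is indexed.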

I would like to know why my key field is reported as multi-valued. My CSV does not contain any multi-valued data, only simple comma-separated numeric and string fields. What could have gone wrong?

Note: I have also applied the change from "Solr Schemaless Mode creating fields as MultiValued in the schema", but it does not help, as the problem is in the input data.

EDIT: Added the full exception trace:

https://pastebin.com/raw/juRj7ZUi

Question edited by vmaldosan, asked by Ganesh.
  • What do the logs say? Please show me the full log4j log. Something tells Solr to split your string into an array. – Oyeme Apr 13 '17 at 08:43
  • I added the exception trace. You are right, something (probably within the CSV update handler) is splitting it, but I am wondering what it could be, since I have not added any CSV split params to my request handler. – Ganesh Apr 13 '17 at 15:05
  • You could specify more params (see https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers) and also try using the console instead of the UI; then you could pass more params. – Oyeme Apr 13 '17 at 15:32
  • I have added the [pastebin Solr log with debug enabled](https://pastebin.com/raw/juRj7ZUi) URL to the question above. I also tried split=false through the UI and can see it in the log, but it does not help: it still creates multi-valued data from the CSV. – Ganesh Apr 13 '17 at 18:35
  • Another interesting thing: if I keep the unique key as the default `id` field and let the system generate the UUID using `solr.UUIDUpdateProcessorFactory`, it works fine. Could this be a bug that appears only when we use our own unique key? – Ganesh Apr 14 '17 at 15:27

1 Answer


I found a clue in the documentation of the CSV update parameters: the issue was caused by the parameter I was passing (&rowid=MyId). According to the documentation, this parameter adds the line number as the value of the named field, which explains why my key (MyId) became multi-valued ([my actual key, line number]).

However, when I removed this parameter, I got an error saying that id was not populated, meaning Solr still expected an id field (my schema still has a required id field). So I added &literal.id=1, and now everything works fine. Thanks for helping out.
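The mechanism described in this answer can be illustrated with a simplified Python model. It is not Solr's CSV loader; it only assumes, per the answer and the CSV parameter documentation, that `rowid=<field>` adds the row number as an extra value of `<field>` and that `literal.<field>=<value>` adds a constant value to every document. Data rows are numbered from 1 here to match the `1` in the error message; the `name` column and sample row are hypothetical.

```python
import csv
import io


def load_csv(csv_text, rowid=None, literals=None):
    """Simplified model (not Solr's code): turn each CSV row into a document.

    Each field maps to a list of values, since a Solr document can hold
    several values for the same field.
    """
    docs = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        doc = {name: [value] for name, value in row.items()}
        if rowid:
            # rowid=<field>: the row number becomes one more value of <field>
            doc.setdefault(rowid, []).append(str(line_no))
        for name, value in (literals or {}).items():
            # literal.<field>=<value>: the same constant is added to every document
            doc.setdefault(name, []).append(value)
        docs.append(doc)
    return docs


def check_unique_key(doc, unique_key="MyId"):
    """Reject documents whose uniqueKey is missing or has more than one value."""
    values = doc.get(unique_key, [])
    if not values:
        raise ValueError(f"Document is missing uniqueKey field: {unique_key}")
    if len(values) > 1:
        raise ValueError(f"Document contains multiple values for uniqueKey field: {unique_key}={values}")
    return values[0]


csv_text = "MyId,name\n82552329,example\n"

# rowid=MyId: the row number is appended to the key column from the CSV.
bad = load_csv(csv_text, rowid="MyId")
print(bad[0]["MyId"])  # ['82552329', '1']

# No rowid, plus literal.id=1 to satisfy the schema's required id field.
good = load_csv(csv_text, literals={"id": "1"})
print(check_unique_key(good[0]))  # 82552329
```

In this model, every document receives `id` = `1`, which is harmless only because `MyId`, not `id`, is the uniqueKey.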

Answered by Ganesh.