
I am using SolrJ 1.4, and it does not index documents in UTF-16 encoding properly. My guess is that during conversion to Unicode it replaces the problematic UTF-16 surrogate pairs with the Unicode replacement character U+FFFD. Can anyone guide me on how to configure SolrJ 1.4 to index and search UTF-16 documents as well as UTF-8 ones?

secmask
user911084

1 Answer


The Solr index is in UTF-8 (see "Why don't International Characters Work?"). To index or search content in other encodings, perform the conversion in the client software that talks to Solr, before the text is sent.
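
As a rough illustration of doing that conversion on the client side, the sketch below decodes raw UTF-16 bytes into a Java `String` and fails loudly on malformed input instead of silently substituting U+FFFD. The class and method names (`Utf16Decoding`, `decodeUtf16Strict`) are made up for this example, and it assumes you have the document's bytes in memory; the resulting `String` is what you would then hand to SolrJ (for instance as a field value on a `SolrInputDocument`).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of SolrJ.
final class Utf16Decoding {

    // Decodes UTF-16 bytes (a leading BOM selects the byte order; without
    // one, Java's "UTF-16" charset assumes big-endian). Malformed input,
    // such as an unpaired surrogate, raises an exception rather than
    // being replaced with U+FFFD.
    static String decodeUtf16Strict(byte[] raw) {
        CharsetDecoder decoder = Charset.forName("UTF-16")
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid UTF-16", e);
        }
    }

    public static void main(String[] args) {
        // Sample text containing a character outside the BMP (a surrogate pair).
        String original = "na\u00EFve \uD83D\uDE00";
        byte[] utf16 = original.getBytes(Charset.forName("UTF-16"));
        String decoded = decodeUtf16Strict(utf16);
        System.out.println(decoded.equals(original));
    }
}
```

If this method throws for your documents, the source bytes themselves are damaged (or are not really UTF-16), and no Solr or SolrJ setting would recover them.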

Johan Sjöberg
  • Is conversion from UTF-16 to UTF-8 always 100% successful? Is there a foolproof method or API available? – user911084 Aug 25 '11 at 15:16
  • Foolproof, I don't know, but [CharsetEncoder](http://download.oracle.com/javase/6/docs/api/java/nio/charset/CharsetEncoder.html) is typically handy. – Johan Sjöberg Aug 25 '11 at 15:31
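
To make the `CharsetEncoder` suggestion concrete, here is a minimal sketch of encoding a `String` to UTF-8 with error reporting turned on. Well-formed UTF-16 text always converts to UTF-8 without loss; the only failures come from input that is already broken, such as an unpaired surrogate. The names `Utf8Encoding` and `encodeUtf8Strict` are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of SolrJ.
final class Utf8Encoding {

    // Encodes text as UTF-8, throwing if the String contains an unpaired
    // surrogate instead of letting it be replaced silently.
    static byte[] encodeUtf8Strict(String text) {
        CharsetEncoder encoder = Charset.forName("UTF-8")
                .newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Text contains an unpaired surrogate", e);
        }
    }

    public static void main(String[] args) {
        // A valid surrogate pair encodes to a four-byte UTF-8 sequence.
        System.out.println(encodeUtf8Strict("\uD83D\uDE00").length);
    }
}
```

Running text through a check like this before indexing is one way to find out which documents contain broken surrogates in the first place.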