I am using SolrJ 1.4, and it does not properly index documents in UTF-16 encoding. My guess is that when it tries to convert them to Unicode, it replaces the problematic UTF-16 surrogate pairs with the Unicode replacement character U+FFFD. Can anyone guide me on how to configure SolrJ 1.4 to index and search UTF-16 documents as well as UTF-8?
1 Answer
The Solr index is in UTF-8 (see "Why don't International Characters Work"). To index or search content in another encoding, convert it to UTF-8 in the software that interfaces with Solr.
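As a rough illustration of doing that translation on the client side, here is a minimal sketch using only the JDK's `java.nio.charset` classes. The class name `Utf16ToUtf8` and its two helper methods are hypothetical, and the sketch assumes the documents arrive as raw UTF-16 bytes. It decodes strictly, so malformed input such as an unpaired surrogate raises an error instead of being silently replaced with U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of SolrJ.
class Utf16ToUtf8 {

    // Decode raw UTF-16 bytes into a String. A BOM, if present, selects the
    // byte order; without one the JDK's "UTF-16" charset assumes big-endian.
    static String decodeUtf16(byte[] utf16Bytes) {
        try {
            return Charset.forName("UTF-16").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of emitting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(utf16Bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid UTF-16", e);
        }
    }

    // Encode a String as UTF-8 bytes, again failing on invalid data
    // (for example a String that contains an unpaired surrogate).
    static byte[] encodeUtf8(String text) {
        try {
            ByteBuffer out = Charset.forName("UTF-8").newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Text cannot be encoded as UTF-8", e);
        }
    }
}
```

Since a Java `String` is already UTF-16 internally, the step that usually matters is decoding the source bytes with the correct charset. The resulting `String` can then be handed to SolrJ as a field value, while `encodeUtf8` is only needed if the UTF-8 bytes themselves are required.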

Johan Sjöberg
- Conversion from UTF-16 to UTF-8 is always 100% successful. Is there any foolproof method or API available? – user911084 Aug 25 '11 at 15:16
- Foolproof I don't know, but [CharsetEncoder](http://download.oracle.com/javase/6/docs/api/java/nio/charset/CharsetEncoder.html) is typically handy. – Johan Sjöberg Aug 25 '11 at 15:31