
I am trying to store Romanian special characters (diacritics) in a Solr schema field such as:

<field name="description" type="text_general" indexed="true" stored="true" required="false"/>

The Romanian characters are ă, î, â, ș and ț, and Solr replaces them with ?.

I have done everything a basic setup requires, and I run Solr on Tomcat 6.

My Solr version is 4.7.1.

Gabriel

1 Answer


Make sure you submit the data to Solr in the proper encoding.

Also consider specifying the charset in the content type, e.g. Content-Type: text/plain; charset=UTF-8
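To see why the encoding of the submitted bytes matters, here is a minimal sketch using only the JDK. The class `EncodingRoundTrip` and its helpers `roundTrip` and `contentType` are hypothetical names made up for illustration; they are not part of Solr. The sketch encodes the five characters from the question and decodes them again, which roughly imitates a client writing bytes and a server reading them. With UTF-8 the text survives; with a charset that cannot represent the characters (US-ASCII here, chosen only as an example), Java substitutes `?`, which matches the symptom described.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // The five Romanian diacritics from the question (a-breve, i-circumflex,
    // a-circumflex, s-comma, t-comma), written as Unicode escapes so the
    // example does not depend on the encoding of this source file.
    static final String DIACRITICS = "\u0103\u00ee\u00e2\u0219\u021b";

    // Encode the text with the given charset and decode it again, imitating
    // a client that sends bytes and a server that reads them back.
    static String roundTrip(String text, Charset charset) {
        byte[] wire = text.getBytes(charset); // unmappable characters become '?'
        return new String(wire, charset);
    }

    // Hypothetical helper: builds a Content-Type value that names the charset.
    static String contentType(Charset charset) {
        return "text/plain; charset=" + charset.name();
    }

    public static void main(String[] args) {
        // prints "text/plain; charset=UTF-8"
        System.out.println(contentType(StandardCharsets.UTF_8));
        // prints "true": UTF-8 can represent all five characters
        System.out.println(DIACRITICS.equals(roundTrip(DIACRITICS, StandardCharsets.UTF_8)));
        // prints "?????": US-ASCII cannot represent any of them
        System.out.println(roundTrip(DIACRITICS, StandardCharsets.US_ASCII));
    }
}
```

If the same replacement happens anywhere between your client and Solr (client library, servlet container, or request parsing), the `?` characters are already in the data by the time it is indexed.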

Also check how the data is parsed on the Solr side by debugging this method:

org.apache.solr.servlet.SolrRequestParsers.parseParamsAndFillStreams(HttpServletRequest, ArrayList<ContentStream>)

See these lines:

final String cs = ContentStreamBase.getCharsetFromContentType(req.getContentType());
final Charset charset = (cs == null) ? IOUtils.CHARSET_UTF_8 : Charset.forName(cs);

Solr should resolve the charset to UTF-8 here.
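The fallback in those two lines can be tried outside Solr with a small stand-in. This is only a sketch under assumptions: `CharsetResolution`, `charsetName` and `resolve` are hypothetical names, and the header parsing is a simplified approximation written for this example, not Solr's actual `getCharsetFromContentType` code. It shows the same decision: use the charset named in the Content-Type if there is one, otherwise fall back to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetResolution {
    // Hypothetical stand-in for the charset lookup: returns the value that
    // follows "charset=" in a Content-Type header, or null if none is given.
    static String charsetName(String contentType) {
        if (contentType == null) {
            return null;
        }
        int idx = contentType.toLowerCase(Locale.ROOT).indexOf("charset=");
        if (idx < 0) {
            return null;
        }
        String value = contentType.substring(idx + "charset=".length());
        int end = value.indexOf(';'); // ignore any parameters that follow
        if (end >= 0) {
            value = value.substring(0, end);
        }
        value = value.trim();
        return value.isEmpty() ? null : value;
    }

    // Same shape as the quoted lines: default to UTF-8 when no charset is named.
    static Charset resolve(String contentType) {
        String cs = charsetName(contentType);
        return (cs == null) ? StandardCharsets.UTF_8 : Charset.forName(cs);
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/plain; charset=UTF-8"));      // prints "UTF-8"
        System.out.println(resolve("text/plain"));                     // prints "UTF-8"
        System.out.println(resolve("text/plain; charset=ISO-8859-1")); // prints "ISO-8859-1"
    }
}
```

If the debugger shows a charset other than UTF-8 at this point, the request's Content-Type is naming that charset explicitly, and the client is the place to fix it.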

dpetruha