Is there a way to tell Jackson to use UTF-8 encoding when using ObjectMapper to serialize and deserialize Objects?

Kalle Richter
Patricio

1 Answer

Jackson automatically detects the encoding of its source when it is given bytes (an `InputStream`, `byte[]` or `File`): per the JSON specification, the only valid encodings are UTF-8, UTF-16 and UTF-32. No other encodings (like Latin-1) can be used. Because of this, auto-detection is easy and is done by the parser; for the same reason, no encoding declaration is accepted. So, if the input is UTF-8, it will be detected as such.
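To illustrate why detection is easy, here is a minimal sketch of the zero-byte heuristic described in RFC 4627, section 3. It is not Jackson's actual implementation: the class `JsonEncodingSniffer` and the enum `Detected` are hypothetical names, byte-order marks are ignored, and the heuristic assumes the document starts with ASCII characters (as `{` or `[` do).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; Jackson's real detection is more thorough.
class JsonEncodingSniffer {

    enum Detected { UTF8, UTF16_BE, UTF16_LE, UTF32_BE, UTF32_LE }

    /** Guesses the encoding from the pattern of zero bytes in the first octets. */
    static Detected detect(byte[] json) {
        if (json.length >= 4) {
            boolean z0 = json[0] == 0, z1 = json[1] == 0;
            boolean z2 = json[2] == 0, z3 = json[3] == 0;
            if (z0 && z1 && z2 && !z3) return Detected.UTF32_BE;  // 00 00 00 xx
            if (!z0 && z1 && z2 && z3) return Detected.UTF32_LE;  // xx 00 00 00
            if (z0 && !z1 && z2 && !z3) return Detected.UTF16_BE; // 00 xx 00 xx
            if (!z0 && z1 && !z2 && z3) return Detected.UTF16_LE; // xx 00 xx 00
        } else if (json.length >= 2) {
            if (json[0] == 0 && json[1] != 0) return Detected.UTF16_BE;
            if (json[0] != 0 && json[1] == 0) return Detected.UTF16_LE;
        }
        // No zero bytes up front: treat the input as UTF-8.
        return Detected.UTF8;
    }

    public static void main(String[] args) {
        byte[] utf16 = "{\"a\":1}".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(detect(utf16)); // prints UTF16_LE
    }
}
```

Note that Latin-1 input is never "detected" under this scheme. It contains no zero bytes, so it is treated as UTF-8, and any non-ASCII byte in it would then typically be rejected as invalid UTF-8 during decoding.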

For output, UTF-8 is the default. If you want another encoding, you can create a JsonGenerator yourself (using a factory method that takes a JsonEncoding) and pass it to ObjectMapper.
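A minimal sketch of both output paths, assuming Jackson 2.x (`jackson-databind`) is on the classpath. The class name `JacksonOutputEncoding` and the sample map are made up for illustration; the Jackson calls are `ObjectMapper.writeValueAsBytes`, `JsonFactory.createGenerator(OutputStream, JsonEncoding)` and `ObjectMapper.writeValue(JsonGenerator, Object)`.

```java
import com.fasterxml.jackson.core.JsonEncoding;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

// Hypothetical demo class; only the Jackson calls themselves come from the library.
class JacksonOutputEncoding {

    static final ObjectMapper MAPPER = new ObjectMapper();

    /** Default path: serializing to bytes (or to an OutputStream) yields UTF-8. */
    static byte[] writeUtf8(Object value) throws IOException {
        return MAPPER.writeValueAsBytes(value);
    }

    /** Explicit path: the JsonEncoding given to the generator selects the encoding. */
    static byte[] writeUtf16Be(Object value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (JsonGenerator gen = MAPPER.getFactory().createGenerator(out, JsonEncoding.UTF16_BE)) {
            MAPPER.writeValue(gen, value);
        } // closing the generator flushes any buffered output
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> value = Collections.singletonMap("name", "Zo\u00eb");
        byte[] utf16 = writeUtf16Be(value);
        // Reading from bytes lets the parser auto-detect the encoding.
        Map<?, ?> back = MAPPER.readValue(utf16, Map.class);
        System.out.println(back.get("name"));
    }
}
```

Deserialization needs no matching setting here: because the input is a byte array, the parser works out that it is UTF-16 on its own.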

Alternatively, in both cases you can of course construct a java.io.Reader / java.io.Writer yourself and make it use whatever encoding you want.
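The Reader / Writer route can be sketched with the JDK alone. In this illustration the class `ExplicitCharsetIo` and its methods are hypothetical, and a plain JSON string stands in for the mapper; with Jackson you would hand the same `Writer` to `ObjectMapper.writeValue(Writer, Object)` and the same `Reader` to `ObjectMapper.readValue(Reader, Class)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing that the charset is chosen by the Reader/Writer, not by Jackson.
class ExplicitCharsetIo {

    /** Encodes text through a Writer whose charset is stated explicitly. */
    static byte[] encode(String json, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(json); // with Jackson: mapper.writeValue(writer, value)
        }
        return out.toByteArray();
    }

    /** Decodes bytes through a Reader whose charset is stated explicitly. */
    static String decode(byte[] bytes, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // with Jackson: mapper.readValue(reader, SomeType.class)
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"name\":\"Zo\u00eb\"}";
        byte[] latin1 = encode(json, StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1).equals(json)); // prints true
    }
}
```

Always passing the charset explicitly matters: a `Writer` or `Reader` created without one falls back to the JDK's default encoding, which is outside Jackson's control.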

StaxMan
  • I'm not sure how UTF-8 can be the default, when I've spent hours trying to get JSON to be encoded in UTF-8 instead of UTF-16. – cbmanica Mar 14 '14 at 22:31
  • @cbmanica Trust me, UTF-8 is the absolute default for Jackson when you give it a `java.io.OutputStream`. But there are other defaults: the JDK has its own default encoding if you choose to construct a `Writer` instance yourself, or if some other lib/framework does it. These are outside of Jackson. – StaxMan Mar 17 '14 at 19:56
  • @cbmanica Could you, please, share your code that helped you? Seems like I have the very same issue. – Tregoreg May 01 '15 at 03:05
  • @StaxMan How does Jackson auto-detect the encoding? How can it know that Latin-1 encoded input is not UTF-8? Because of invalid characters? – alacambra Oct 31 '16 at 13:34
  • @alacambra Because Latin1 is not valid JSON encoding, as per JSON specification. – StaxMan Oct 31 '16 at 22:06
  • Some example code would be an improvement. I tried `JsonGenerator jsonGeneratorWithUtf16 = new JsonFactory().createGenerator(new File("C:/outputJson.json"), JsonEncoding.UTF16_BE); new ObjectMapper().writeValue(jsonGeneratorWithUtf16, objectToBeSerialized);` but when I run `file -i C:/outputJson.json` it shows `charset=binary`. – Max May 19 '17 at 19:38
  • @Max As long as result is UTF-16 Big-endian Unicode, interpretation by `file` is something Jackson can do nothing about; perhaps it only detects ASCII/UTF-7/Latin-1, and not UTF-16 encodings. Your usage looks fine. So I am not quite sure what your ask here is. – StaxMan May 20 '17 at 00:12
  • Jackson encodes XML as well, and XML on its own has no charset encoding default. It can be specified in the XML file itself (which leads to a chicken-and-egg problem; how do you ever read that without knowing the encoding?) - but the point is, 'JSON is UTF_8 so it is a silly question' is not an answer. – rzwitserloot Feb 17 '22 at 19:12
  • @rzwitserloot On reading there is no chicken-and-egg-problem: XML specification actually specifies how handling needs to be done (read it, it's well described), and the underlying parser like Woodstox implements that (XML declaration is all in ASCII). But I don't understand where XML came here -- no one asked about it as far as I can see, until you decided to bring it up. Or what your point more generally is. – StaxMan Feb 19 '22 at 04:15
  • @StaxMan The question does not mention, in any way, whether you're using the JSON 'pipeline' of the jackson project or the XML one. The answers, on the other hand, all assume JSON. I had to look in the source code to see that, yes, even on the XML side, these methods 'force' UTF-8. – rzwitserloot Feb 20 '22 at 15:54
  • @rzwitserloot The question has "json" tag; the vast majority of Jackson usage is with JSON; and everyone else clearly understood the implicit default. You are now simply trying to argue for sake of argument. If you have questions on XML usage yourself, ask a new question and don't try to insert "Actually..." style commentary where it makes no sense. – StaxMan Feb 20 '22 at 17:28
  • Calling it input and output is a little confusing. Which of it is serialization and de-serialization? – Arun Gowda Dec 28 '22 at 05:37
  • Confusing in what way? Serialization produces output, deserialization consumes input. Or is there some way to have different mapping? – StaxMan Jan 04 '23 at 01:31