
I have used Jerkson for Scala to serialize my list of objects to a JSON file. I'm able to decompose the objects into JSON and write them to a file. Now, when I want to read the file back into my program for further processing, I get the error below. FYI, the file is 500 MB and in the future might grow up to 1 GB.

I saw a few forums that suggested increasing -XX:MaxPermSize=256M. I'm not sure this will solve my problem, and even if it does for now, what is the guarantee that it won't surface again later when my JSON file grows to 1 GB? Is there a better alternative? Thanks!

Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
    at java.lang.String.intern(Native Method)
    at org.codehaus.jackson.util.InternCache.intern(InternCache.java:41)
    at org.codehaus.jackson.sym.CharsToNameCanonicalizer.findSymbol(CharsToNameCanonicalizer.java:506)
    at org.codehaus.jackson.impl.ReaderBasedParser._parseFieldName(ReaderBasedParser.java:997)
    at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:418)
    at com.codahale.jerkson.deser.ImmutableMapDeserializer.deserialize(ImmutableMapDeserializer.scala:32)
    at com.codahale.jerkson.deser.ImmutableMapDeserializer.deserialize(ImmutableMapDeserializer.scala:11)
    at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1315)
    at com.codahale.jerkson.Parser$class.parse(Parser.scala:83)
    at com.codahale.jerkson.Json$.parse(Json.scala:6)
    at com.codahale.jerkson.Parser$class.parse(Parser.scala:14)
    at com.codahale.jerkson.Json$.parse(Json.scala:6)
Learner

2 Answers


From the stack trace we can see that Jackson interns the Strings that are parsed as the names of fields in your document. When a String is interned, it is put in PermGen, which is the part of JVM memory that you are running out of. I reckon this is because your document has many, many different field names - perhaps generated with some naming scheme? Whatever the case, increasing your MaxPermSize might help some, or at least delay the problem, but it won't solve it completely.

Disabling String interning in Jackson, on the other hand, should solve it completely. The Jackson FAQ has more information about what configuration options to tweak: http://wiki.fasterxml.com/JacksonFAQ#Problems_with_String_intern.28.29ing
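
For what it's worth, a minimal sketch of that change, assuming Jackson 1.x (the org.codehaus.jackson packages in your stack trace) and the INTERN_FIELD_NAMES parser feature described in that FAQ:

    import org.codehaus.jackson.JsonParser
    import org.codehaus.jackson.map.ObjectMapper

    val mapper = new ObjectMapper()

    // Stop the parser from calling String.intern() on every field name;
    // those interned Strings are what is filling PermGen in the trace above.
    mapper.getJsonFactory.configure(JsonParser.Feature.INTERN_FIELD_NAMES, false)

    // More drastic, if needed: skip the field-name symbol table entirely.
    // mapper.getJsonFactory.configure(JsonParser.Feature.CANONICALIZE_FIELD_NAMES, false)

Jerkson builds its ObjectMapper internally, so if it doesn't give you a hook to apply this configuration, parsing with a plain Jackson mapper configured as above is the fallback.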

Chris Vest
  • I thought about that too, but I have no idea if Jerkson works the same way. Worth a shot, I guess, although I still think the design could use some tweaking. – Vidya Nov 11 '13 at 21:00
  • You can see from the stack trace that Jerkson uses Jackson under the covers, so I blindly assume that it will somehow let you inject the desired configuration into Jackson. – Chris Vest Nov 11 '13 at 21:02
  • Thank you Chris :) As a matter of fact, I saw this option while looking for solutions, but I'm new to Jerkson/Jackson, so I couldn't quite understand the terminology being discussed. If you're familiar with it, can you please give an overview of the code changes I should make? TIA. – Learner Nov 11 '13 at 21:06

Adding memory will only treat the symptom rather than cure the disease. I would say this Jerkson memory issue is a blessing in disguise that exposes a fundamental design flaw.

As for how to cure the disease, I can't say for sure since I know nothing about your application or its use cases, but I am pretty sure you don't need 1 GB of information in memory at once. Consider streaming reads of your JSON file into a database or cache and then fetching only what you need to solve a particular problem.

Vague, I know, but I can't offer specifics without more details. The bottom line is streaming and persisting.
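
To make "streaming" concrete, here is a rough sketch using Jackson's streaming API (which Jerkson already pulls in, per the stack trace above) to walk a huge top-level JSON object one entry at a time instead of materializing the whole Map. The file name and the store function are placeholders - in practice store would write to whatever database or cache you choose.

    import java.io.File
    import org.codehaus.jackson.{JsonFactory, JsonParser, JsonToken}

    // Placeholder: persist one dictionary entry instead of keeping it in memory.
    def store(key: String, value: String): Unit = println(key + " -> " + value)

    val factory = new JsonFactory()
    // Also avoids the PermGen interning problem described in the other answer.
    factory.configure(JsonParser.Feature.INTERN_FIELD_NAMES, false)

    val parser = factory.createJsonParser(new File("dictionary.json"))
    try {
      require(parser.nextToken() == JsonToken.START_OBJECT, "expected a top-level JSON object")
      while (parser.nextToken() != JsonToken.END_OBJECT) {
        val key = parser.getCurrentName // the field name just read
        parser.nextToken()              // advance to the value
        store(key, parser.getText)      // assumes String values; adapt for nested ones
      }
    } finally {
      parser.close()
    }

Memory use stays flat no matter how large the file grows, because only one entry is ever held at a time.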

Vidya
  • Thanks for the idea. We are not using persistent storage anywhere in our application stack - mostly file I/O, as we have lots of TB of data; this JSON file is a dictionary of some of the fields in our data. Is there any other alternative? Sorry to push back on your idea. – Learner Nov 11 '13 at 20:51
  • Hopefully someone more clever than I can chime in, but we need some place to store the data that isn't memory. I suppose you could look into storing the information in smaller files grouped by some key and using Scala's lazy loading capabilities as much as possible. – Vidya Nov 11 '13 at 20:59
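
A rough sketch of that idea, with hypothetical names throughout: pre-split the big dictionary into small shard files (here by the first character of each key), then parse each shard lazily on first access and memoize it:

    import scala.collection.mutable
    import scala.io.Source
    import com.codahale.jerkson.Json

    // Hypothetical layout: the 500 MB dictionary pre-split into small files
    // like "dict/a.json", one per leading character of the key.
    val shards = mutable.Map.empty[Char, Map[String, String]]

    def lookup(key: String): Option[String] = {
      val shard = shards.getOrElseUpdate(key.head, {
        val text = Source.fromFile("dict/" + key.head + ".json").mkString
        Json.parse[Map[String, String]](text) // each shard is small enough to parse eagerly
      })
      shard.get(key)
    }

Only the shards you actually touch are ever loaded, so memory stays proportional to your working set rather than the full dictionary.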