8

We have a Java/JRuby webapp running under Tomcat, and I have been analyzing the number of objects and the memory used by the app at runtime. I noticed that after startup the class "org.jruby.RubyString" had 1,118,000 instances of the string "". The heap memory used by empty strings alone is 65 MB, which seems ridiculous to me because it is 15% of the memory used by the webapp. The empty string is only one example of many string values with this problem; I worked out that if I could intern all the JRuby strings, I would save about 130 MB.

I know that in Java, when a string literal is created, the runtime checks whether the value already exists in the string pool and reuses it if it does. Is there an option in JRuby that enables the same optimization? If so, how do I enable it?

Example in JRuby:

v1 = "a"
v2 = "a"
puts v1.object_id # => 3352
puts v2.object_id # => 3354

Example in Java:

String v1 = "a";
String v2 = "a";

System.out.println(v1.hashCode()); // => 97
System.out.println(v2.hashCode()); // => 97
Chiwai Chan
  • If you replace all the Strings with Symbols you'll get that behavior, but I don't know of an option to make it intern Strings automatically. – Mark Reed May 25 '12 at 04:56
  • Not an ideal solution because a lot of these strings are created from 3rd party gems and plugins. – Chiwai Chan May 25 '12 at 05:12
  • Can you publish one of the pieces of code that produces these empty strings? – peter May 25 '12 at 10:06
  • Java hash codes are not proof of two objects being the same (not that you are wrong about string literals being automatically interned in Java). `String a = new String("ab"); String b = new String("ab");` will create two objects, not one, but both will have the same hash code. – Theo Jun 02 '12 at 21:12

4 Answers

5

I understand the motivation behind this, but there's really no such "magic" switch in JRuby ...

Coming from a Java background it feels tempting to save on strings, but you can't expect strings to behave the same way in JRuby as they do in Java. First of all, they're a completely different kind of object. I would go as far as to say that a Ruby String is more like a Java StringBuilder.

It's certainly a waste to have so many "" instances lying around, but if that code, as you mention, is third-party code, there's not much you can do about it unless you feel like monkey patching a lot. I would try to identify the places most of the instances come from and refactor those, but remember there are some "tricky" parts to saving strings, e.g. with Hash:

{ 'foo' => 'bar' }

You might guess this creates three objects (the Hash and two Strings), but you'd be wrong: it actually creates two copies of 'foo'. Since a String is mutable (unless frozen), the Hash dups the string and freezes the copy when it is used as a key (and there's a good reason for that).
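A minimal sketch of that key-copying behaviour, assuming a mutable (unfrozen) String is used as the key. The helper name `hash_key_copy_demo` is made up for illustration and is not part of any library:

```ruby
# Hypothetical helper: shows what Hash does with a mutable String key.
def hash_key_copy_demo
  key = String.new('foo')   # explicitly mutable, whatever the frozen-literal settings are
  hash = {}
  hash[key] = 'bar'         # Hash#[]= stores a frozen copy of an unfrozen String key

  stored = hash.keys.first
  {
    same_object: stored.equal?(key),  # is the stored key the very same object?
    stored_frozen: stored.frozen?,    # is the stored key frozen?
    original_frozen: key.frozen?      # was the caller's string left alone?
  }
end

p hash_key_copy_demo
```

The stored key should come back as a distinct, frozen object while the original string stays mutable, which is where the extra 'foo' instance comes from.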

Also keep in mind to refactor "intelligently": profile the bits you're changing to make sure you don't slow things down while trying to economize on allocated instances.

kares
  • Informative answer, but basically boils down to "jruby (and likely ruby) strings are inefficient, so live with it." While this may be the final word, it's not comforting. – Glenn May 31 '12 at 22:09
  • @Glenn Sorry, but since there's 3rd party code involved there's really no better answer, I guess. Some gems already acknowledge this and save on strings by freezing them and storing them in constants, e.g. https://github.com/puma/puma/blob/master/lib/puma/const.rb, while others, unfortunately, assume they can modify the `String` method parameters they receive. At some point JRuby (or even MRI) might apply a heuristic to reuse some `String` "constants", but I doubt there's a lot they can do; it's mostly left to us, the programmers ... – kares Jun 01 '12 at 06:57
  • The problem really boils down to Ruby _having_ to create new objects for literal strings (and hashes, and arrays), since these objects are mutable and there’s no way for the runtime to know that they won’t be mutated. If you turned on some magic switch and made strings immutable the libraries you’re using would likely break. – Theo Jun 02 '12 at 21:15
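The constant-freezing approach mentioned in these comments could look roughly like the sketch below. The module `StringConstants` and both helper methods are hypothetical names chosen for illustration; they are not taken from puma or any other gem:

```ruby
# Hypothetical module, loosely following the idea of a gem-level constants file.
module StringConstants
  EMPTY     = ''.freeze
  HTTP_HOST = 'HTTP_HOST'.freeze
end

# Allocates a fresh mutable String on every call.
def fresh_header_name
  String.new('HTTP_HOST')
end

# Returns the one shared, frozen instance on every call.
def shared_header_name
  StringConstants::HTTP_HOST
end

puts fresh_header_name.equal?(fresh_header_name)   # false: two separate objects
puts shared_header_name.equal?(shared_header_name) # true: one shared object

begin
  shared_header_name << '!'  # mutating a frozen String raises
rescue RuntimeError => e     # FrozenError is a RuntimeError subclass on newer Rubies
  puts "cannot mutate: #{e.class}"
end
```

The trade-off is the one described above: callers that expect to mutate the string they receive will now get an exception, so this only works for code you control.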
2
v1 = v2 = v3 = "a"

Will only create one object in Ruby, not three.

v1 = v2 = v3 = "a" # => "a"
v1.object_id # => 10530560
v2.object_id # => 10530560
v1 << "ll the same" # => "all the same"
v2 # => "all the same"

Before doing something as drastic as interning all the strings, I'd check with other Tomcat users whether this is the best way of dealing with this problem. I don't use Tomcat or JRuby, but I strongly suspect this isn't the best approach.

Edit: If every object that was built from an "a" was the same object, then modifying one of them would modify all of the other strings. That would be a side-effect nightmare.

Andrew Grimm
  • The example in my earlier comment was a lazy one. Try `a = "a"` and `b = "a"`, and the two objects will have different object_ids. – Chiwai Chan May 27 '12 at 23:32
1

The only way to intern a String in JRuby is to call to_sym or intern (they are aliases of each other), which turns it into a Symbol. As you mentioned, that doesn't quite help with third-party gems. There isn't, as far as I'm aware, any other way.

This is in line with MRI behaviour:

sebastien@greystones:~$ rvm ruby-1.9.3-p0
sebastien@greystones:~$ irb
1.9.3p0 :001 > a = "Hello World" 
 => "Hello World" 
1.9.3p0 :002 > b = "Hello World"
 => "Hello World" 
1.9.3p0 :003 > a.object_id
 => 20126420 
1.9.3p0 :004 > b.object_id
 => 19289920 
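To make the difference between String and Symbol identity concrete, here is a small sketch. The helper name `identity_comparison` is invented for this example:

```ruby
# Hypothetical helper comparing String identity with Symbol identity.
def identity_comparison(text)
  first  = String.new(text)  # two separately allocated Strings
  second = String.new(text)
  {
    strings_equal_content: first == second,                   # same characters?
    strings_same_object: first.equal?(second),                # same object?
    symbols_same_object: first.to_sym.equal?(second.intern)   # same Symbol?
  }
end

p identity_comparison('Hello World')
```

The two Strings compare equal by content but are separate objects, while both conversions yield the same Symbol. Note that the conversion happens after the Strings have already been allocated, so by itself it does not reduce allocations in code you don't control.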
Sébastien Le Callonnec
  • Even calling `#to_sym` doesn’t help since at that point the string object has already been created. It has to be a symbol to start with. – Theo Jun 02 '12 at 21:17
0

This is now the default behaviour in JRuby. From version 9.1 all frozen string literals (e.g. 'hello'.freeze) return the same instance, and the same goes for literal strings used as hash keys (e.g. stuff['thing']) and a few other cases. See JRuby issue #3491.

If you want to aggressively freeze all string literals you can run both JRuby (9.1+) and Ruby (2.3+) with --enable-frozen-string-literal, but prepare for things to break since most gems assume that strings are mutable.
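A quick way to check the frozen-literal behaviour on your own runtime is sketched below. The two helper methods are hypothetical; the only assumption is a runtime that deduplicates `'...'.freeze` literals, as described above for JRuby 9.1+:

```ruby
# Hypothetical helpers: each call evaluates its string literal again.
def plain_literal
  'hello'
end

def frozen_literal
  'hello'.freeze
end

# On a runtime that deduplicates frozen literals, both calls return one object.
puts "frozen literal shared: #{frozen_literal.equal?(frozen_literal)}"

# Without the frozen-string-literal option, a plain literal is normally a new object per call.
puts "plain literal shared:  #{plain_literal.equal?(plain_literal)}"
```

Running the same script with and without the frozen-string-literal option should show whether plain literals start being shared as well, which is a cheap experiment before turning the option on for a whole application.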

Theo