0

I'm taking the JSON returned from polling the Foursquare venue API, parsed into a Ruby hash:

{
    "id"=>"4e404742c65b4ec27606deb4",
    "name"=>"Sarah's Cheesecake & Cafe",
    "contact"=>{
        "phone"=>"4134436678",
        "formattedPhone"=>"(413) 443-6678"
    },
    "location"=>{
        "address"=>"180 Elm St",
        "lat"=>42.44345873,
        "lng"=>-73.23804678,
        "distance"=>1063,
        "postalCode"=>"01201",
        "city"=>"Pittsfield",
        "state"=>"MA"
    },
    "categories"=>[
        {
            "id"=>"4bf58dd8d48988d16d941735",
            "name"=>"Café",
            "pluralName"=>"Cafés",
            "shortName"=>"Café",
            "icon"=>{
                "prefix"=>"https://foursquare.com/img/categories/food/cafe_",
                "sizes"=>[
                    32,
                    44,
                    64,
                    88,
                    256
                ],
                "name"=>".png"
            },
            "primary"=>true
        }
    ],
    "verified"=>false,
    "stats"=>{
        "checkinsCount"=>7,
        "usersCount"=>5,
        "tipCount"=>0
    },
    "hereNow"=>{
        "count"=>0
    }
}

As you can tell, there are some non-ASCII characters in there, such as the é in Cafés, and that's breaking my Mongoid-based model in this JRuby on Rails app. When I try to create an instance with MyModel.create, here's what I get:

jruby-1.6.5 :012 > FoursquareVenue.create(hash)
Java::JavaLang::NullPointerException: 
    from org.jruby.exceptions.RaiseException.<init>(RaiseException.java:101)
    from org.jruby.Ruby.newRaiseException(Ruby.java:3348)
    from org.jruby.Ruby.newEncodingCompatibilityError(Ruby.java:3323)
    from org.jruby.RubyString.cat(RubyString.java:1285)
    from org.jruby.RubyString.cat19(RubyString.java:1221)
    from org.jruby.RubyHash$5.visit(RubyHash.java:727)
    from org.jruby.RubyHash.visitAll(RubyHash.java:594)
    from org.jruby.RubyHash.inspectHash(RubyHash.java:721)
    from org.jruby.RubyHash.inspect(RubyHash.java:745)
    from org.jruby.RubyHash$i$0$0$inspect.call(RubyHash$i$0$0$inspect.gen:65535)
    from org.jruby.RubyClass.finvoke(RubyClass.java:632)
    from org.jruby.javasupport.util.RuntimeHelpers.invoke(RuntimeHelpers.java:545)
    from org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:353)
    from org.jruby.RubyObject.inspect(RubyObject.java:408)
    from org.jruby.RubyArray.inspectAry(RubyArray.java:1483)
    from org.jruby.RubyArray.inspect(RubyArray.java:1509)
... 420 levels...
    from org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:75)
    from org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:190)
    from org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:179)
    from org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:312)
    from org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:169)
    from usr.local.rvm.rubies.jruby_minus_1_dot_6_dot_5.bin.jirb.__file__(/usr/local/rvm/rubies/jruby-1.6.5/bin/jirb:17)
    from usr.local.rvm.rubies.jruby_minus_1_dot_6_dot_5.bin.jirb.load(/usr/local/rvm/rubies/jruby-1.6.5/bin/jirb)
    from org.jruby.Ruby.runScript(Ruby.java:693)
    from org.jruby.Ruby.runScript(Ruby.java:686)
    from org.jruby.Ruby.runNormally(Ruby.java:593)
    from org.jruby.Ruby.runFromMain(Ruby.java:442)
    from org.jruby.Main.doRunFromMain(Main.java:321)
    from org.jruby.Main.internalRun(Main.java:241)
    from org.jruby.Main.run(Main.java:207)
    from org.jruby.Main.run(Main.java:191)
    from org.jruby.Main.main(Main.java:171)

If I strip out all the non-ASCII characters, everything works as expected and no exception is thrown. What's the proper way of handling this? Can I enable my Mongoid/MongoDB documents to work with UTF-8? Or, if that's not possible, do I need to "asciify" them somehow first?

randombits

4 Answers

1

Could be an encoding bug in JRuby's 1.9 mode. Does the same thing happen when you run it in 1.8 mode? Either way, a stacktrace should be filed as a bug at http://bugs.jruby.org. Thanks!
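Before filing, it may help to check which strings in the hash carry an unexpected encoding. The following is a minimal diagnostic sketch, not part of the original answer. It assumes Ruby 1.9 semantics (for example JRuby started with --1.9), where String#encoding exists; the helper name suspicious_strings and the sample data are hypothetical.

```ruby
# Walk a nested Hash/Array structure and collect every String value that is
# either not tagged as UTF-8 or whose bytes are invalid for its encoding.
# ASSUMPTION: Ruby 1.9+ semantics; String#encoding does not exist in 1.8 mode.
def suspicious_strings(value, path = "root", found = [])
  case value
  when Hash
    value.each do |key, child|
      suspicious_strings(child, "#{path}[#{key.inspect}]", found)
    end
  when Array
    value.each_with_index do |child, index|
      suspicious_strings(child, "#{path}[#{index}]", found)
    end
  when String
    unless value.encoding == Encoding::UTF_8 && value.valid_encoding?
      found << [path, value.encoding, value.valid_encoding?]
    end
  end
  found
end

# Hypothetical sample: the bytes of "Café" labelled as binary instead of UTF-8.
venue = {
  "name"       => "Sarah's Cheesecake & Cafe",
  "categories" => [{ "name" => "Caf\xC3\xA9".dup.force_encoding(Encoding::BINARY) }]
}

suspicious_strings(venue).each do |path, encoding, valid|
  puts "#{path}: #{encoding} (valid: #{valid})"
end
```

An empty result would suggest the strings themselves are fine and the problem lies elsewhere, which would be worth stating in the bug report.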

Nick Sieger
  • Hi Nick, I'm running everything in 1.8 mode still: jruby 1.6.5 (ruby-1.8.7-p330) (2011-10-25 9dcd388) (OpenJDK 64-Bit Server VM 1.6.0_20) [linux-amd64-java] – randombits Dec 13 '11 at 21:42
0

gem install bson_ext might help.

Source: MongoDB, Ruby and UTF-8

If you are using Ubuntu, you need some extra steps when installing SpiderMonkey and MongoDB:

Most pre-built Javascript SpiderMonkey libraries do not have UTF-8 support compiled in; MongoDB requires this.

Source: Building for Linux

zengr
  • Nope, this is reproducible on both my prod environment (RHEL) and staging (OSX Lion) – randombits Dec 13 '11 at 21:57
  • In both installation scenarios, you need to make sure UTF-8 support is enabled when installing MongoDB. – zengr Dec 13 '11 at 22:17
  • Actually I take that back. This is not reproducible on OSX anymore. Only on my RHEL box. Using the same exact gem set. – randombits Dec 13 '11 at 22:26
  • Then try to follow the instructions documented above, that might be the issue. – zengr Dec 13 '11 at 22:45
  • I am using a prebuilt binary supplied by 10gen. Don't really see a reason to compile my own. Is there a way to check if the precompiled binary has utf-8 support built into it? – randombits Dec 13 '11 at 23:20
  • That's the problem. The documentation clearly says: `Most pre-built Javascript SpiderMonkey libraries do not have UTF-8 support compiled in; MongoDB requires this`. So you need to build it manually by following the guidelines in the link above. – zengr Dec 13 '11 at 23:26
  • Unfortunately, I've recompiled SpiderMonkey with UTF-8 enabled and compiled MongoDB. No luck though, same encoding issue. – randombits Dec 14 '11 at 00:55
  • Still think there is something going on outside of MongoDB. Although it's only reproducible on my RHEL box and not on my OSX devel box, I can insert UTF-8 strings into the mongo client. Just not via irb. – randombits Dec 14 '11 at 01:22
0

MongoDB and mongoid handle utf-8 properly. I was doing the same thing with the Foursquare API not long ago via the Quimby wrapper.

As a result, I would suspect the bug is closely related to the use of JRuby.
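If the strings turn out to hold valid UTF-8 bytes that are merely tagged with another encoding, one possible workaround is to relabel them before handing the hash to the model. This is only a sketch under that assumption and under Ruby 1.9 semantics; the helper name deep_utf8 is hypothetical, and it has not been verified against JRuby 1.6.5.

```ruby
# Return a copy of a nested Hash/Array structure in which every String is
# tagged as UTF-8. ASSUMPTION: the bytes are already valid UTF-8 and only the
# encoding label is wrong, so this relabels the bytes and does not transcode.
def deep_utf8(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), out|
      out[deep_utf8(key)] = deep_utf8(child)
    end
  when Array
    value.map { |child| deep_utf8(child) }
  when String
    copy = value.dup.force_encoding(Encoding::UTF_8)
    # Fail loudly rather than store bytes that are not valid UTF-8.
    raise ArgumentError, "not valid UTF-8: #{value.inspect}" unless copy.valid_encoding?
    copy
  else
    value
  end
end

# Hypothetical sample: UTF-8 bytes labelled as binary.
raw   = { "shortName" => "Caf\xC3\xA9".dup.force_encoding(Encoding::BINARY) }
clean = deep_utf8(raw)
puts clean["shortName"].encoding
```

With that in place, the call from the question would become FoursquareVenue.create(deep_utf8(hash)). The original hash is left untouched, so the two can still be compared while debugging.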

Tyler Brock
0

Have you set up JRuby to use UTF8?

require 'jcode'
$KCODE = 'u'

Don Werve