I'm porting a Ruby gem written in C to Ruby with FFI.

When I run the tests using MRI Ruby there aren't any seg-faults. When running in jRuby, I get a seg-fault.

This is the code in the test that I think is responsible:

if type == Date or type == DateTime then
  assert_nil param.set_value(value.strftime("%F %T"));
else
  assert_nil param.set_value(value);
end
@api.sqlany_bind_param(stmt, 0, param)
puts "\n#{param.inspect}"

#return if String === value or Date === value or DateTime === value
assert_succeeded @api.sqlany_execute(stmt)

The segmentation fault happens when running sqlany_execute, but only when the object passed to set_value is of the class String.

sqlany_execute just uses FFI's attach_function method.

param.set_value is more complicated, so I'll focus on just the String-specific part. Here is the original C code:

case T_STRING:
    s_bind->value.length = malloc(sizeof(size_t));
    length = RSTRING_LEN(val);
    *s_bind->value.length = length;
    s_bind->value.buffer = malloc(length);
    memcpy(s_bind->value.buffer, RSTRING_PTR(val), length);
    s_bind->value.type = A_STRING;
    break;

https://github.com/in4systems/sqlanywhere/blob/db25e7c7a2d5c855ab3899eacbc7a86b91114f53/ext/sqlanywhere.c#L1461

In my port, this became:

when String
  self[:value][:length] = SQLAnywhere::LibC.malloc(FFI::Type::ULONG.size)
  length = value.bytesize
  self[:value][:length].write_int(length)
  self[:value][:buffer] = SQLAnywhere::LibC.malloc(length + 1)
  self[:value][:buffer_size] = length + 1

  ## Don't use put_string as that includes the terminating null
  # value.each_byte.each_with_index do |byte, index|
  # self[:value][:buffer].put_uchar(index, byte)
  # end
  self[:value][:buffer].put_string(0, value)
  self[:value][:type] = :string

https://github.com/in4systems/sqlanywhere/blob/e49099a4e6514169395523391f57d2333fbf7d78/lib/bind_param.rb#L31

My question is: what's causing jRuby to seg fault and what can I do about it?

Chris
  • What version of JRuby? Is this happening with JRuby master? Chances are this might be a JRuby FFI bug, possibly already fixed. – Sébastien Le Callonnec Jan 16 '13 at 16:15
  • $ ruby -v jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on OpenJDK 64-Bit Server VM 1.7.0_09-icedtea-mockbuild_2012_12_06_11_04-b00 [linux-amd64] I'll try upgrading to jRuby 1.7.2 – Chris Jan 16 '13 at 16:22
  • @SébastienLeCallonnec I've updated to jRuby-1.7.2 and my ffi to ffi-1.3.1-java.gem. The test still seg faults when using jRuby-1.7.2 – Chris Jan 16 '13 at 16:34
  • Be careful of the line where you are setting the length: self[:value][:length].write_int(length) should probably be self[:value][:length].write_long(length) –  Jan 16 '13 at 19:34
  • Is size_t the same size as unsigned long on your machine? – Frederick Cheung Jan 16 '13 at 21:44
  • @wmeissner I changed it to write_ulong and now it runs without seg faulting. If you put it as the answer, I'll accept it. That said, I don't understand why MRI Ruby would work but it would crash jRuby. – Chris Jan 17 '13 at 09:29
  • @FrederickCheung size_t is the same as unsigned long on my machine. It would be nice if RubyFFI had an abstraction for size_t. – Chris Jan 17 '13 at 09:32

1 Answer

This answer is possibly overly detailed, but I thought it would be good to go into a bit of depth for those who run across similar problems in the future.

It looks like this was your problem:

self[:value][:length].write_int(length)

when it should have been:

self[:value][:length].write_ulong(length)

On a 64-bit system, size_t is 8 bytes wide but write_int only writes 4 of them. The other 4 bytes of the memory self[:value][:length] points to (bytes 4..7 on a little-endian machine) could still contain garbage, since malloc does not clear the memory it returns. When the native code reads a size_t quantity at that address it reads all 8 bytes, so the result can be garbage, potentially indicating a buffer larger than 4 gigabytes.

For example, if the string length is really 15 bytes, the lowest 4 bits will be set and the upper 60 should all be zero.

bit   0   1   2   3   4      32       63
    +---+---+---+---+---+ ~ +---+ ~ +---+
    | 1 | 1 | 1 | 1 | 0 | ~ | 0 | ~ | 0 |
    +---+---+---+---+---+ ~ +---+ ~ +---+

If just one bit in the upper 32 bits is set, you get a value greater than 4 gigabytes:

bit   0   1   2   3   4      32       63
    +---+---+---+---+---+ ~ +---+ ~ +---+
    | 1 | 1 | 1 | 1 | 0 | ~ | 1 | ~ | 0 |
    +---+---+---+---+---+ ~ +---+ ~ +---+

which would be a length of 4294967311 bytes.
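
The arithmetic above can be reproduced in plain Ruby without FFI. This is only a sketch under stated assumptions: the 8-byte string stands in for the malloc'd block, the garbage bytes are made up for illustration, and a little-endian layout is assumed (as on x86_64).

```ruby
# Stand-in for 8 bytes of uninitialised malloc memory (hypothetical garbage).
# Byte 4 is 0x01, i.e. bit 32 of the 64-bit value is set.
garbage = [0xAA, 0xAA, 0xAA, 0xAA, 0x01, 0x00, 0x00, 0x00].pack("C*")

length = 15

# Equivalent of write_int: only the first 4 bytes are overwritten.
partial = garbage.dup
partial[0, 4] = [length].pack("l<")

# Native code reading a 64-bit size_t also sees the stale upper 4 bytes.
seen_as_size_t = partial.unpack1("Q<")

# Equivalent of write_ulong on a 64-bit system: all 8 bytes are overwritten.
full = garbage.dup
full[0, 8] = [length].pack("Q<")
correct_size_t = full.unpack1("Q<")

puts seen_as_size_t   # 4294967311
puts correct_size_t   # 15
```

The first value is exactly 2**32 + 15, matching the diagram; whether real malloc memory contains such garbage depends on what the process did beforehand.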

One way to fix it is to define a SizeT struct and use that for the length, e.g.

class SizeT < FFI::Struct
  layout :value, :size_t
end

self[:value][:length] = SQLAnywhere::LibC.malloc(SizeT.size)
length = value.bytesize
SizeT.new(self[:value][:length])[:value] = length

or you could monkey patch FFI::Pointer:

class FFI::Pointer
  # size_t is unsigned, so use the unsigned writer of the matching width
  if FFI.type_size(:size_t) == 4
    def write_size_t(val)
      write_uint(val)
    end
  else
    def write_size_t(val)
      write_ulong_long(val)
    end
  end
end

Why was it only segfaulting on JRuby and not on MRI? Maybe MRI was a 32-bit executable (printing the value of FFI.type_size(:size_t) will tell you).

  • Both MRI ruby and jRuby are 64 bit. Thanks for your answer and help. `$ ruby -v ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-linux] $ ruby -e 'require "ffi"; puts FFI.type_size(:size_t)' 8 $ rvm use jruby Using ~/.rvm/gems/jruby-1.7.1 $ ruby -v jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on OpenJDK 64-Bit Server VM 1.7.0_09-icedtea-mockbuild_2013_01_14_23_04-b00 [linux-amd64] $ ruby -e 'require "ffi"; puts FFI.type_size(:size_t)' 8 ` – Chris Jan 18 '13 at 15:29
  • It could also be you were just lucky on MRI that it didn't segfault. Memory returned from malloc() may have been clean on MRI because of a much lower level of memory churn during startup. –  Jan 19 '13 at 20:08