
So I'm working on some MongoDB protocol stuff. All integers are signed little-endian. Using Ruby's standard Array#pack method, I can convert from an integer to the binary string I want just fine:

positive_one = Array(1).pack('V')   #=> "\x01\x00\x00\x00"
negative_one = Array(-1).pack('V')  #=> "\xFF\xFF\xFF\xFF"

However, going the other way, the String#unpack method has the 'V' format documented as specifically returning unsigned integers:

positive_one.unpack('V').first #=> 1
negative_one.unpack('V').first #=> 4294967295

There's no formatter for signed little-endian byte order. I'm sure I could play games with bit-shifting, or write my own byte-mangling method that doesn't use array packing, but I'm wondering if anyone else has run into this and found a simple solution. Thanks very much.
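Update for readers on later Ruby versions: Ruby 1.9.3 added explicit endianness modifiers to pack/unpack, so a signed little-endian directive now exists and answers this question directly:

```ruby
# Ruby 1.9.3+ supports endianness modifiers on the signed directives:
# 'l<' = signed 32-bit little-endian, 'q<' = signed 64-bit little-endian.
[1000].pack('l<')                       # => "\xE8\x03\x00\x00"
"\xE8\x03\x00\x00".unpack('l<').first   # => 1000
"\x18\xFC\xFF\xFF".unpack('l<').first   # => -1000
```

On 1.9.2 and earlier (current at the time of this question), the workarounds below are still needed.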

SFEley

4 Answers


Edit: I misunderstood which direction you were converting originally (per the comment). But after thinking about it some, I believe the solution is still the same. Here is the updated method. It does the exact same thing, but the comments should explain the result:

def convertLEToNative( num )
    # Convert a given 4-byte integer from little-endian to the running
    # machine's native endianness.  The pack('V') operation takes the
    # given number and converts it to little-endian (on a little-endian
    # machine, no byte swap occurs).  On a big-endian machine, pack('V')
    # swaps the bytes, because that's what it takes to convert from big-
    # to little-endian.  Since the input is already little-endian, the
    # swap has the opposite effect (converting from little-endian to
    # big-endian), which is what we want.  In both cases, unpack('l')
    # then reads those bytes as a signed integer in the machine's
    # native endianness.
    Array(num).pack('V').unpack('l').first
end

Probably not the cleanest, but this will convert the byte string.

def convertLEBytesToNative( bytes )
    if ( [1].pack('V').unpack('l').first == 1 )
        # machine is already little endian
        bytes.unpack('l').first
    else
        # machine is big endian: swap the bytes first,
        # then read them as a native signed integer
        bytes.reverse.unpack('l').first
    end
end
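A byte-order-independent sketch of the same pack/unpack idea (the helper name is mine, not from the answer above): read the bytes as unsigned little-endian with 'V', then round-trip through native 'L'/'l' to reinterpret the same 32 bits as signed.

```ruby
# Hypothetical helper: decode a 4-byte little-endian signed integer
# regardless of the host machine's byte order.
def le32_to_signed(bytes)
  # 'V' always reads unsigned 32-bit little-endian; repacking the value
  # with native 'L' and re-reading with native 'l' reinterprets the
  # same 32 bits as a signed integer.
  bytes.unpack('V').pack('L').unpack('l').first
end

le32_to_signed("\xE8\x03\x00\x00")  # => 1000
le32_to_signed("\x18\xFC\xFF\xFF")  # => -1000
```

This avoids the runtime endianness check entirely, since 'V' already pins down the byte order of the input.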
Mark Wilkins
  • Not quite. I'm going to be receiving binary strings _containing_ signed integers in little-endian representation. I need to turn those binary strings into Ruby integers, and I need to do it consistently. – SFEley Mar 09 '11 at 18:42
  • I think there's still some confusion. I have a string of bytes coming from MongoDB, representing signed little-endian numbers. I want a method that will take a string and return the correct number. E.g., it should return 1000 if passed `"\xE8\x03\x00\x00"` and -1000 if passed `"\x18\xFC\xFF\xFF"`. Your method does something different. I can go the other way (numbers to binary strings) just fine. – SFEley Mar 09 '11 at 19:40
  • @SFEley: It wasn't clear to me from the OP that the input was to be the string. I added a method for that conversion. – Mark Wilkins Mar 09 '11 at 19:53

After unpacking with "V", you can apply the following conversion:

class Integer
  def to_signed_32bit
    if self & 0x8000_0000 == 0x8000_0000
      self - 0x1_0000_0000  
    else
      self
    end
  end
end

You'll need to change the magic constants 0x1_0000_0000 (which is 2**32) and 0x8000_0000 (2**31) if you're dealing with other sizes of integers.
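The same idea can also be written as a width-parameterized helper instead of editing the constants per size; a sketch (the function name is mine, not part of the answer):

```ruby
# Hypothetical generalization of the monkey-patch above: reinterpret
# an unsigned value of the given bit width as signed two's complement.
def to_signed(value, bits)
  value >= (1 << (bits - 1)) ? value - (1 << bits) : value
end

to_signed("\xFF\xFF".unpack('v').first, 16)          # => -1
to_signed("\x18\xFC\xFF\xFF".unpack('V').first, 32)  # => -1000
```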

Ken Bloom
  • That _almost_ works. You'd have to make it `if self >= 0x8000_0000` -- the `&` operator returns an integer, not a boolean. Otherwise, though, thanks! That does seem to be the simplest solution, and rewriting my method with it passed all the unit tests. – SFEley Mar 09 '11 at 20:45
  • @SFEley: I've edited it for my preferred way to rewrite that if condition, so that it's clear that I'm specifically testing for the sign bit. Also, I hope you fixed the bug where I forgot to include the `else` condition. – Ken Bloom Mar 09 '11 at 20:55

This question has a method for converting unsigned to signed that might be helpful. It also has a pointer to the BinData gem, which looks like it will do what you want.

BinData::Int16le.read("\000\f") # 3072

[edited to remove the not-quite-right s unpack directive]

Paul Rubel
  • Formatting weirdness fixed -- thanks for the catch. It's what I get for typing quickly instead of copying and pasting. >8-/ – SFEley Mar 08 '11 at 18:26
  • I don't think the 's' directive is the answer, even with the length modifiers. It treats the string in _native byte order_, which would be little endian on some processors and big endian on others. I need little endian all the time. – SFEley Mar 08 '11 at 18:29
  • Thanks! That bitmasking method from the other question does look cool -- and certainly more efficient than the other method I finally got to work. I knew about BinData, and I've considered putting it to use, but it seemed like overkill when I'm using the Mongo team's BSON gem for almost everything else. (I even considered rewriting the entire BSON specification in BinData, but the BSON gem with its C extensions is much faster than anything I think I could do in pure Ruby.) – SFEley Mar 09 '11 at 19:54

For the sake of posterity, here's the method I eventually came up with before spotting Paul Rubel's link to the "classical method". It's kludgy and string-based, so I'll probably scrap it, but it does work, and someone might find it interesting someday:

# Returns an integer from the given little-endian binary string.
# @param [String] str
# @return [Fixnum]
def self.bson_to_int(str)
  bits = str.reverse.unpack('B*').first   # Get the 0s and 1s
  if bits[0] == '0'   # We're a positive number; life is easy
    bits.to_i(2)
  else                # Get the twos complement
    comp, flip = "", false
    bits.reverse.each_char do |bit|
      comp << (flip ? bit.tr('10','01') : bit)
      flip = true if !flip && bit == '1'
    end
    ("-" + comp.reverse).to_i(2)
  end
end

UPDATE: Here's the simpler refactoring, using a generalized arbitrary-length form of Ken Bloom's answer:

# Returns an integer from the given arbitrary length little-endian binary string.
# @param [String] str
# @return [Fixnum]
def self.bson_to_int(str)
  arr, bits, num = str.unpack('V*'), 0, 0
  arr.each do |int|
    num += int << bits
    bits += 32
  end
  num >= 2**(bits-1) ? num - 2**bits : num  # Convert from unsigned to signed
end
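To make the refactored version easy to try on its own, here's a standalone copy (defined as a plain method rather than a module method, since the enclosing module isn't shown) with checks against the values from the comments above:

```ruby
# Standalone copy of the refactored bson_to_int, for illustration only.
# Reads any multiple-of-4-byte little-endian string as a signed integer.
def bson_to_int(str)
  arr, bits, num = str.unpack('V*'), 0, 0
  arr.each do |int|
    num += int << bits   # accumulate each 32-bit word at its offset
    bits += 32
  end
  num >= 2**(bits - 1) ? num - 2**bits : num  # unsigned -> signed
end

bson_to_int("\xE8\x03\x00\x00")                  # => 1000
bson_to_int("\x18\xFC\xFF\xFF")                  # => -1000
bson_to_int("\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF")  # => -1 (64-bit input)
```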
SFEley