0

I'm trying to use MessagePack to serialize integers in Erlang and Java.

In Java I'm able to pad an array holding one integer with as many 0s as I wish and MessagePack.read() still returns the correct value. But in Erlang msgpack:unpack/1 fails if there are any additional zeros.

For example, msgpack:unpack/1 passed <<10>> works as expected and returns {ok,10}. But adding additional zeros and passing <<10,0,0>> fails, returning {error,not_just_binary}. The comments in the API state that this error means a term was decoded but binary data remains.

Selali Adobor

2 Answers

1

The msgpack library is not meant to decode raw binaries, but binaries that were previously encoded with msgpack:pack.

The reason is that a binary has no structure by itself, so you must include some information in it to allow decoding. This is what a function like term_to_binary does, using the Erlang external term format:

1> B = term_to_binary({12,atom,[$a,$l,$i,$s,$t]}).
<<131,104,3,97,12,100,0,4,97,116,111,109,107,0,5,97,108,
  105,115,116>>
2> binary_to_term(B).
{12,atom,"alist"}

The msgpack library lets you use a different encoding method.

Coming to your issue: the difference between unpack and unpack_stream is that the first expects exactly one encoded term in the binary, while the second assumes that the trailing bytes contain further encoded terms.

When you call msgpack:unpack(<<10>>), the first byte is smaller than 128, and in that case the encoded value is the byte itself. A first byte above 127 is a type marker rather than a plain value, so more bytes may have to follow it; 200 (16#C8), for example, announces an ext 16 term, and on its own it gives an error:

4> msgpack:unpack(<<10>>).
{ok,10}
5> msgpack:unpack(<<200>>).
{error,incomplete}
6>
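
The rule described above can be sketched with plain Erlang bit syntax. This is only an illustration of how the leading byte is interpreted, not the library's actual code: the module name `first_byte` and the function `classify/1` are hypothetical, and only two markers (positive fixint and ext 16) are handled.

```erlang
-module(first_byte).
-export([classify/1]).

%% Hypothetical helper, not part of the msgpack library.
%% It inspects only the leading marker of a MessagePack binary.

%% 0xxxxxxx: positive fixint, the 7 low bits are the value itself.
classify(<<0:1, N:7, Rest/binary>>) ->
    {positive_fixint, N, Rest};
%% 16#C8: ext 16 marker. A 2-byte big-endian length, a type byte
%% and Len bytes of payload must follow.
classify(<<16#C8, Len:16/big-unsigned, Type:8/signed,
           Data:Len/binary, Rest/binary>>) ->
    {ext16, Type, Data, Rest};
%% The marker is there, but the bytes it announces are missing.
classify(<<16#C8, _/binary>>) ->
    incomplete;
%% Every other marker is left out of this sketch.
classify(<<_, _/binary>>) ->
    other_marker;
classify(<<>>) ->
    incomplete.
```

With this sketch, `first_byte:classify(<<10,0,0>>)` matches the first clause and leaves `<<0,0>>` over, while `<<200>>`, `<<200,0>>` and `<<200,0,0>>` are all too short for the ext 16 header.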

When you call msgpack:unpack_stream(<<10,0>>), the first term is decoded in exactly the same way, giving 10, and the rest of the binary is returned for further decoding:

8> {A,Rest} = msgpack:unpack_stream(<<10,0>>).
{10,<<0>>}
9> msgpack:unpack_stream(Rest).               
{0,<<>>}
10> msgpack:unpack_stream(<<200,0>>).            
{error,incomplete}
11> msgpack:unpack_stream(<<200,0,0>>).
{error,incomplete}
12> msgpack:unpack_stream(<<200,0,0,0>>).
{error,{badarg,{bad_ext,200}}}
13> 

The right way to use the library is to encode your message first:

13> Msg = msgpack:pack(<<10,0,0>>).
<<163,10,0,0>>
14> msgpack:unpack(Msg).                 
{ok,<<10,0,0>>}

or with the first example:

24> Msg1 = msgpack:pack(msgpack:term_to_binary({12,atom,[$a,$l,$i,$s,$t]})).     
<<183,199,20,131,131,104,3,97,12,100,0,4,97,116,111,109,
  107,0,5,97,108,105,115,116>>
25> {ok,Rep1} = msgpack:unpack(Msg1).                                       
{ok,<<199,20,131,131,104,3,97,12,100,0,4,97,116,111,109,
      107,0,5,97,108,105,115,116>>}
26> msgpack:binary_to_term(Rep1).
{12,atom,"alist"}
27> 

[edit]

Here is a proposal that adds padding together with an unpacker that detects it. It uses unpack_stream, because it is not possible to change the way an integer is encoded.

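%% Ext type 12 is used here as an arbitrary "padding" marker.
Packer = fun(_X, _Opt) -> {ok, {12,<<>>}} end,
Unpacker = fun(12, _) -> {ok, padding} end,
Opt = [{ext,{Packer,Unpacker}}],
%% Append an ext 8 term (16#C7, payload length, type 12) whose payload is Size zero bytes.
Pad = fun(B) -> Size = 10 - size(B), SB = Size*8, <<B/binary,16#C7,Size,12,0:SB>> end,
R = msgpack:pack(256897),
Var = Pad(R),
%% The first call returns the integer, the second one consumes the padding term.
{I,Rest} = msgpack:unpack_stream(Var,Opt),
{padding,<<>>} = msgpack:unpack_stream(Rest,Opt).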
Pascal
  • Most of your answer is based on the belief that I was trying to decode random binary ... which I wasn't. I'm fully aware that 0-127 are the only values that encode to 1 byte in MessagePack's encoding. I chose 10 as a convenient value for the question because it has no meaningful text encoding for erlang to print as a bitstring. – Selali Adobor Feb 08 '15 at 14:53
  • My question is whether there is a way to denote a fixed width (or padded) integer. Using `unpack_stream` works as I mentioned below, but it's not the same as a fixed width term, it's ignoring the padding that I manually add. I'd rather not add padding which could interfere with certain values. – Selali Adobor Feb 08 '15 at 14:55
  • Well, if you add some trailing value to your encoded message, you simply mess it up. As I said, the stream version is there to deal with a stream of terms, decoding the first one and giving the rest for further decoding; in this context the added 0s can be interpreted as extra terms equal to 0. Binaries work at the byte level, so it is not necessary to add padding bytes. Note that integers do not have a fixed length in Erlang: small integers are coded on 28 bits (32-bit arch) or 60 bits (64-bit arch) and silently use the bignum notation (3..N words) when they become bigger. – Pascal Feb 08 '15 at 15:14
  • If you want, the current version of msgpack allows you to use your own coding/decoding method. If you really need padding, you should look in that direction. – Pascal Feb 08 '15 at 15:17
  • Adding some trailing value doesn't mess it up automatically, the binary created with the padding has to change the term to a different _valid_ term (which isn't possible with my current framing, but it feels like a hack, not a solution). And you can't say "it is not necessary to add padding bytes", it's necessary for my use-case, a fixed length header. And I am fully aware *Erlang* isn't using fixed width integers, but that has nothing to do with my question. As I said in the question, I'm looking for fixed width *Message Pack* integers. – Selali Adobor Feb 09 '15 at 00:03
  • I've looked through the MessagePack spec and saw a mention of extensions for fixed width integers, but I don't see a method to take advantage of them from the Erlang API – Selali Adobor Feb 09 '15 at 00:04
  • In the last edit I propose a solution using the ext option. The padding cannot be simply 0s, I have added a specific term that returns the atom: padding. – Pascal Feb 09 '15 at 08:51
  • I assure you if the binary length is exactly 5, for all values 0 to 4294967295 (0xFFFFFFFF, encoded with the prefix 0xCE in MessagePack for 5 bytes in total), if the padding is all 0s, a valid combination of bits for a different term won't ever be created (meaning `unpack_stream/1` will always return a tuple of the integer and the padding, or a second term of garbage representing the padding). So using the ext option will not be any more effective than padding with 0s. And as I've said before, the encoding is between Java and Erlang, so using 0s is preferable to an encoded Erlang term. – Selali Adobor Feb 09 '15 at 18:10
  • I know it works with 0s; my concern is that this implementation prevents you from really using packet streaming if you ever need it. It introduces an unknown *(since small integer, integer, negative integer are coded differently)* number of valid *(but garbage)* terms. My choice would be: first, try to avoid padding; second, use ext (but with a cleaner implementation, mine took the coding rules directly from the current version of the source code). But of course these are general concerns that may be totally meaningless for your use case. – Pascal Feb 09 '15 at 19:06
  • Avoiding padding is impossible because this needs to be a fixed length header sent over the wire, and I will require packet streaming, but it doesn't stop me from using it, it simply means splitting/joining binary before I feed it to `unpack_stream` (which I have to do anyways because I'm using TCP). Also as I said before, it doesn't introduce unknown valid terms that interfere with the value I'm padding for values 0 to 4294967295 (which represents 3 types of positive integers, 1 byte wide, 3 bytes wide, and 5 bytes wide according to the specs) when the binary is exactly 5 bytes long. – Selali Adobor Feb 09 '15 at 19:25
0

While I was looking through the Erlang API's source to get more information for this question, I noticed another function, msgpack:unpack_stream/1, which returns a tuple with the first decoded term paired with the remaining binary instead of returning an error. This behaves more like read in Java.

But I'd still like to know if there's a better way to go about this, such as a way to use a fixed length type.
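
One possible direction, sketched here under assumptions rather than as a confirmed library feature: the MessagePack format defines uint 32 as the marker 16#CE followed by 4 big-endian bytes, so a header that always uses this form is 5 bytes wide for every value from 0 to 4294967295. The module `fixed_u32` and its functions are hypothetical and use only Erlang bit syntax. That msgpack:unpack/1 and the Java reader both accept a uint 32 holding a small value is an expectation based on the format, not something verified here.

```erlang
-module(fixed_u32).
-export([encode/1, decode/1]).

%% Hypothetical helpers, not part of the msgpack library.

%% Always emit the 5-byte uint 32 form: marker 16#CE plus
%% 4 big-endian bytes, even when the value would fit in fewer.
encode(N) when is_integer(N), N >= 0, N =< 16#FFFFFFFF ->
    <<16#CE, N:32/big-unsigned>>.

%% Read a fixed 5-byte header and hand back whatever follows it.
decode(<<16#CE, N:32/big-unsigned, Rest/binary>>) ->
    {ok, N, Rest};
decode(Bin) when is_binary(Bin), byte_size(Bin) < 5 ->
    {error, incomplete};
decode(_) ->
    {error, not_uint32}.
```

For instance, `fixed_u32:encode(10)` gives `<<206,0,0,0,10>>`, and no trailing zero bytes are needed to reach a fixed length.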

Selali Adobor