3

I am very new to Scala and I'm confused by its bit manipulation features. Could someone point me in the right direction?

I have a byte array defined with the following bit fields:

0-3 - magic number
  4 - version
5-7 - payload length in bytes
8-X - payload, of variable length, as indicated in bits 5-7

I would like to serialize this back and forth to a structure such as:

MagicNumber: Integer
Version: Integer
Length: Integer
payload: Array[Byte]

How do you deal with bits in this situation optimally? Most of the examples I've seen deal with higher level serialization, such as JSON. I am trying to serialize and deserialize TCP binary data in this case.

Will I Am

2 Answers

7

You can use Scala Pickling or POF or Google Protobuf, but if your format is so restricted, the simplest way is to write your own serializer:

case class Data(magicNumber: Int, version: Int, payload: Array[Byte])

// Each record is one header byte (4 bits magic, 1 bit version, 3 bits length) plus the payload.
def serialize(data: Stream[Data]): Stream[Byte] =
  data.flatMap(x =>
    Array((x.magicNumber << 4 | x.version << 3 | x.payload.length).toByte) ++ x.payload)

@scala.annotation.tailrec
def deserialize(binary: Stream[Byte], acc: Stream[Data] = Stream[Data]()): Stream[Data] =
  if (binary.nonEmpty) {
    // Byte is signed on the JVM; widen to an unsigned Int to avoid sign extension.
    val header = binary.head & 0xFF
    val magicNumber = header >> 4
    val version = (header & 0x08) >> 3
    val size = header & 0x07
    val data = Data(magicNumber, version, binary.tail.take(size).toArray)
    deserialize(binary.drop(size + 1), acc ++ Stream(data))
  } else acc
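To make the bit layout concrete, here is a minimal sketch that packs and unpacks only the header byte. The names `HeaderDemo`, `packHeader` and `unpackHeader` are illustrative and not part of the answer above; the sketch assumes the 4/1/3-bit layout from the question.

```scala
object HeaderDemo {
  // Pack magic (4 bits), version (1 bit) and length (3 bits) into one byte.
  // The masks silently truncate out-of-range values; this sketch does not validate them.
  def packHeader(magic: Int, version: Int, length: Int): Byte =
    (((magic & 0x0F) << 4) | ((version & 0x01) << 3) | (length & 0x07)).toByte

  // Unpack the three fields. The Byte is widened to an unsigned Int first,
  // so its sign bit does not leak into the magic number.
  def unpackHeader(header: Byte): (Int, Int, Int) = {
    val h = header & 0xFF
    (h >> 4, (h >> 3) & 0x01, h & 0x07)
  }
}
```

For example, `HeaderDemo.packHeader(2, 1, 5)` yields the byte `0x2D`, and `HeaderDemo.unpackHeader(0xF0.toByte)` yields `(15, 0, 0)` even though `0xF0.toByte` is negative as a signed `Byte`.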

Or you can use the scodec library (this option is better because you get automatic range checking of values):

Sbt:

  libraryDependencies += "org.typelevel" %% "scodec-core" % "1.3.0"

Codec:

  import scodec.bits._
  import scodec.codecs._

  case class Data(magicNumber: Int, version: Int, payload: ByteVector)
  val codec = (uint(4) :: uint(1) :: variableSizeBytes(uint(3), bytes)).as[Data]

Use:

  val encoded = codec.encode(Data(2, 1, bin"01010101".bytes)).fold(sys.error, _.toByteArray)
  val decoded = codec.decode(BitVector(encoded)).fold(sys.error, _._2)
dk14
  • Thank you, this helps. I am also looking at Scala Pickling, but unfortunately it lacks examples, and I don't yet understand how I would do this with pickling/unpickling. For now, this is sufficient for me! – Will I Am Sep 19 '14 at 15:25
  • scodec is more bit-oriented than pickling; I have updated the answer with a working example for it. Note that it depends on heavy libraries such as scalaz and shapeless. – dk14 Sep 19 '14 at 21:08
  • Am I doing this right? I can't seem to do a round trip: val input2 = new Data(0, 3, Array[Byte]('h','e','l','l','o')); val output2 = serialize(Stream(input2)); val output3 = deserialize(output2) – Will I Am Sep 20 '14 at 05:30
  • That's why I recommend scodec (it has automatic range checking). In your case version = 3, but the maximum possible value for version (1 bit) is 1. You can also add asserts (magic < 2^4, version < 2, data.size < 2^3) to the serialize function to avoid such non-round-trip situations. – dk14 Sep 20 '14 at 11:25
  • Thanks. I ended up using bitfield manipulation simply because I didn't want to pull in additional big dependencies. It wasn't that hard but it could've been much easier if there was support for unsigned byte. – Will I Am Sep 22 '14 at 14:38
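The range checks and the unsigned-byte workaround discussed in these comments could look roughly like the sketch below. It handles a single frame held in an `Array[Byte]` rather than a `Stream`, and the names `FrameCodec`, `Frame`, `encode` and `decode` are hypothetical, chosen only for this illustration.

```scala
object FrameCodec {
  final case class Frame(magicNumber: Int, version: Int, payload: Array[Byte])

  // Encode one frame, rejecting values that do not fit their bit fields.
  def encode(f: Frame): Array[Byte] = {
    require(f.magicNumber >= 0 && f.magicNumber <= 0x0F, "magicNumber needs 4 bits")
    require(f.version >= 0 && f.version <= 0x01, "version needs 1 bit")
    require(f.payload.length <= 0x07, "payload is limited to 7 bytes")
    val header = (f.magicNumber << 4) | (f.version << 3) | f.payload.length
    header.toByte +: f.payload
  }

  // Decode one frame from the start of `bytes`; None if the input is too short.
  def decode(bytes: Array[Byte]): Option[Frame] =
    bytes.headOption.flatMap { b =>
      val h = b & 0xFF // treat the signed Byte as an unsigned value
      val size = h & 0x07
      if (bytes.length < size + 1) None
      else Some(Frame(h >> 4, (h >> 3) & 0x01, bytes.slice(1, size + 1)))
    }
}
```

With this sketch, a frame with `version = 3` fails fast with an `IllegalArgumentException` instead of silently producing bytes that decode to something else. Note that a case class holding an `Array[Byte]` compares payloads by reference, so round-trip checks should compare the fields individually.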
4

I'd look at scodec. Based on the UDP example, it should be something like (untested):

import scodec.Codec
import scodec.bits.ByteVector
import scodec.codecs._

case class Datagram(
  magicNumber: Int,
  version: Int,
  payload: ByteVector)

object Datagram {
  // Field widths follow the layout in the question: 4 + 1 + 3 bits, then the payload.
  implicit val codec: Codec[Datagram] = {
    ("magic_number" | uint(4)) ::
    ("version" | uint(1)) ::
    variableSizeBytes(uint(3),
      ("payload" | bytes))
  }.as[Datagram]
}
om-nom-nom
Alexey Romanov