
I've come across an interesting problem with scodec. I have a peculiar encoding scheme that requires re-aligning the stream to a byte boundary whenever the current bit pointer mod 8 is not zero (i.e. not aligned to the nearest byte).

Now normally, this would be handled by the byteAligned codec, but the situation appears to need global context of the entire decoding operation.
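To make the "global context" problem concrete, here is a toy model (plain Scala, not scodec's actual `byteAligned` implementation) of why a locally byte-aligning wrapper can't fix global misalignment: it rounds the codec's own size up to a byte multiple, which only re-aligns the stream if the codec started on a byte boundary.

```scala
// Toy model: a local alignment wrapper rounds a codec's OWN size
// up to a whole number of bytes, with no knowledge of where in the
// overall stream the codec starts.
def locallyAligned(sizeBits: Long): Long =
  ((sizeBits + 7) / 8) * 8

// abc is 8 bits, already a byte multiple, so local alignment adds
// nothing...
assert(locallyAligned(8) == 8)

// ...which means that if the stream offset is 26 (misaligned, as
// after qux(0) below), the stream is still misaligned afterwards.
assert((26 + locallyAligned(8)) % 8 != 0)
```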

Here's a minimal example of the problem:

case class Baz(abc : Int, qux : Int)  
case class Foo(bar : Int,
               items : Vector[Baz])

object Foo {
   implicit val codec : Codec[Foo] = (
      ("bar" | uint8L) ::
      ("items" | vectorOfN(uint8L, (
            ("abc" | uint8L) ::
            ("qux" | uint2L)
         ).as[Baz]
      ))).as[Foo]
}

items will be encoded and decoded as a Vector of Baz items. Notice that the qux member in Baz is represented as a 2-bit unsigned int. We want the abc member to be byte aligned, meaning that during its decoding, if the byte stream is misaligned, padding must be added after the decoded bits (Case C).

This means that for a vector of size 1 (Case A), the output BitVector will be misaligned by 6 bits, which is fine because there are no more items (abc is never reached again in the vector decoding loop).

A vector of size 2 without byte aligning is shown in Case B. A vector with byte aligning is shown in Case C.

Below each case is a comment breaking down the bitstream. 0x0a is hex for 10 decimal and 0b01 is binary for 1 decimal. I will switch between the two when more detail is needed and when the stream isn't byte aligned.

Case A

Codec.encode(Foo(10, Vector(Baz(4,1))))
res43: Attempt[BitVector] = Successful(BitVector(26 bits, 0x0a01044))
// bar  N items   abc(0)  qux(0)
// 0x0a   0x01     0x04    0b01   

Case B

Codec.encode(Foo(10, Vector(Baz(4,1), Baz(15,3))))
res42: Attempt[BitVector] = Successful(BitVector(36 bits, 0x0a020443f))
// bar  N items   abc(0)  qux(0)    abc(1)    qux(1)
// 0x0a   0x02     0x04    0b01   0b00001111   0b11 
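The bit totals in Cases A and B follow directly from the field widths; a quick sanity check in plain Scala (no scodec needed):

```scala
// Unpadded size of Foo: bar(8) + count(8) + n * (abc(8) + qux(2)) bits.
def unpaddedBits(n: Int): Int = 8 + 8 + n * (8 + 2)

assert(unpaddedBits(1) == 26) // Case A: 26 bits
assert(unpaddedBits(2) == 36) // Case B: 36 bits

// Case C adds 6 padding bits before qux(1), giving 42 bits.
assert(unpaddedBits(2) + 6 == 42)
```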

Case C (the desired output, which the codec above does not yet produce)

Codec.encode(Foo(10, Vector(Baz(4,1), Baz(15,3))))
res42: Attempt[BitVector] = Successful(BitVector(42 bits, 0x0a020443c0c))
// bar  N items   abc(0)  qux(0)    abc(1)    padding   qux(1)
// 0x0a   0x02     0x04    0b01   0b00001111  0b000000   0b11 

Another way of looking at Case C is to unroll the codecs:

// bar     N items   abc(0)     qux(0)   abc(1)    padding       qux(1)
(uint8L :: uint8L :: uint8L :: uint2L :: uint8L :: ignore(6) :: uint2L).
   dropUnits.encode(10 :: 2 :: 4 :: 1 :: 15 :: 3 :: HNil)
res47: Attempt[BitVector] = Successful(BitVector(42 bits, 0x0a020443c0c))

Notice that there wasn't any padding after abc(0), as the stream was byte aligned at that point. At abc(1) it was not aligned, due to the decoding of qux(0). After the alignment, qux(1) decodes on an aligned boundary. We always want qux to decode at a byte-aligned position.
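The padding width in Case C can be computed from the global bit offset alone; a minimal sketch of that arithmetic in plain Scala:

```scala
// Number of padding bits needed to realign a stream whose current
// bit offset (from the start of the BitVector) is `pos`.
def padBits(pos: Long): Long = (8 - pos % 8) % 8

// After bar(8) + count(8) + abc(0)(8) the offset is 24 bits:
// already aligned, so no padding is inserted before qux(0).
assert(padBits(24) == 0)

// After qux(0)(2) and abc(1)(8) the offset is 34 bits, so 6 padding
// bits are required before qux(1), exactly as in Case C.
assert(padBits(34) == 6)
```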

What I need is either a method to get global context, to figure out whether the current bit pointer in the overall BitVector is aligned, or a codec that intelligently knows when it's not decoding on an aligned boundary and corrects the stream after decoding its current value.

I know codecs can get the current bitstream to decode from, but they have no idea where they are within the overall bitstream.
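One direction would be to thread the absolute bit offset through decoding as explicit state. The sketch below is a toy model in plain Scala (not scodec's API; `Positioned` and `alignAfter` are hypothetical names), just to illustrate the shape of a position-aware "align after decoding" combinator:

```scala
// Toy decoder model: given the absolute bit offset, return the
// decoded value and the number of bits consumed.
final case class Positioned[A](decode: Long => (A, Long))

// Hypothetical combinator: after decoding `inner`, also consume
// however many padding bits are needed to realign the stream.
def alignAfter[A](inner: Positioned[A]): Positioned[A] =
  Positioned { pos =>
    val (a, consumed) = inner.decode(pos)
    val pad = (8 - (pos + consumed) % 8) % 8
    (a, consumed + pad)
  }

// An 8-bit field in this toy model (the decoded value is irrelevant).
val u8 = Positioned[Int](_ => (0, 8))

// Starting at bit offset 26 (misaligned, as after qux(0) in Case B),
// abc(1) consumes 8 bits to offset 34, then 6 padding bits: 14 total.
val (_, consumed) = alignAfter(u8).decode(26)
assert(consumed == 14)
```

The real question is how to get that `pos` state into scodec's decoding pipeline, since stock codecs only ever see the remaining bits.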

Any direction or code would be most appreciated.

P.S.

If you want to test this in Ammonite, you'll need this to start:

load.ivy("org.scodec" %% "scodec-core" % "1.8.3")
import scodec.codecs._
import scodec._
import shapeless._ 

When copying and pasting, you'll need to start with a { to make sure the case classes and the companion object get defined at the same time.
