2

I'm trying to do some simple BSON parsing of Swift3 Data objects. I feel like I'm fighting the system.

Let's start with some input and a scheme:

let input = Data(bytes: [2, 0x20, 0x21, 3, 0x30, 0x31, 0x32, 1, 0x10, 4, 0x40, 0x41, 0x42, 0x43])

This is just a simple data stream, the frivolous scheme being that a leading byte indicates how many bytes follow making up the next chunk. So in the above, the leading 2 indicates that 0x20, 0x21 are the first chunk, followed by a 3 byte chunk containing the bytes 0x30, 0x31, 0x32, etc.

Streams

My first thought is to do it with a stream (er, Generator, Iterator, whatever). So I end up with something like:

var iter = input.makeIterator()

func parse(_ stream:inout IndexingIterator<Data>) -> Data {
    var result = Data()
    if let count = stream.next() {
        for _ in 0..<count {
            result.append(Data(bytes:[stream.next()!]))
        }
    }
    return result
}

parse(&iter)
parse(&iter)
parse(&iter)
parse(&iter)

This leads to multiple questions/observations:

1) Why would anyone ever let an iterator? The whole point of this thing is to keep track of an evolving position over a collection. I really struggle with why the Swift authors have chosen to send iterators down the "all hail the value semantics" rathole. It means I have to put inout's on all my parse functions.

2) I feel like I'm over-specifying the argument type with IndexingIterator. Maybe I just need to get used to verbose generics?

Python Struct'esque

Frustrated with that approach, I thought I might emulate pythons struct.unpack() style, where a tuple is returned of the parsed data, as well as the unconsumed data. Since supposedly Data are magical and efficient as long as I don't mutate them. That turned up like:

func parse2(_ data:Data) -> (Data, Data) {
    let count = Int(data[0])
    return (data.subdata(in: 1..<count+1), data.subdata(in: count+1..<data.count))
}

var remaining = input
var chunk = Data()
(chunk, rest) = parse2(remaining)
chunk
(chunk, rest) = parse2(remaining)
chunk
(chunk, rest) = parse2(remaining)
chunk
(chunk, rest) = parse2(remaining)
chunk

I ran into two issues with this.

1) What I really wanted to return was data[1..count], data.subdata(in: count+1..<data.count). But this returns a MutableRandomAccessSlice. Which seems to be a totally different kind of type? So I ended up using the more involved subdata.

2) One can subscript a Data with a closed range, but the subdata method will only take an open range. What's with that?

Open Rebellion, Old Habits Kick In

Now annoyed that this old Smalltalker can't seem to find happiness here, I just roll my own:

class DataStream {
    let data:Data
    var index = 0
    var atEnd:Bool {
        return index >= self.data.count
    }

    init(data:Data) {
        self.data = data
    }

    func next() -> UInt8 {
        let byte = self.data[self.index]
        self.index += 1
        return byte
    }

    func next(_ count:Int) -> Data {
        let subdata = self.data.subdata(in: self.index..<self.index + count)
        self.index += count
        return subdata
    }

}

func parse3(_ stream:DataStream) -> Data {
    let count = Int(stream.next())
    return stream.next(count)
}

let stream = DataStream(data: input)
parse3(stream)
parse3(stream)
parse3(stream)
parse3(stream)

This solution I'm happy with from an end use POV. I can flesh out DataStream to do all kinds of stuff. But... I'm now off the beaten path and feel like I'm not "getting it" (the Swiftish light bulb).

TL;DR version

After this playing around, I find myself curious what the most idiomatic way to stream through Data structs, extracting data from them, based on what is encountered in them.

HangarRash
  • 7,314
  • 5
  • 5
  • 32
Travis Griggs
  • 21,522
  • 19
  • 91
  • 167
  • for #2 you can use a type alias – Alexander Jun 27 '16 at 18:30
  • I can see your frustration but Xcode 8 is still in beta. Swift 2 wasn't super stable until Xcode 7 Beta 4 or 5 – Code Different Jun 29 '16 at 11:06
  • @CodeDifferent I'm not sure this has much to do with xcode8 or swift3, other than the assumed direction being to use `Data` preferentially over `[UInt8]`s going forward. A primary contributing factor of my frustration is why Iterators are better as value type structs instead of reference type objects. – Travis Griggs Jun 29 '16 at 21:18
  • 1
    Another option would be to create an [InputStream](https://developer.apple.com/reference/foundation/nsinputstream) from the data and then use the streams read method. – But your DataStream is also a good approach. I would change some details, like checking if sufficient data is available and return `nil` otherwise. – Martin R Jun 29 '16 at 21:26
  • Wish your comment was in answer form @MartinR; DataStream with your suggestions was the solution I ended up using and have been happy with. – Travis Griggs Aug 02 '16 at 17:35
  • @TravisGriggs: I suggest that you post a self-answer with your solution. – Martin R Aug 02 '16 at 19:08

2 Answers2

2

In the end, as mentioned in the comments, I went with a DataStream class including suggestions by MartinR. Here's the implementation I'm using today.

class DataStream {
    let data:Data
    var index = 0
    var atEnd:Bool {
        return index >= self.data.count
    }

    init(data:Data) {
        self.data = data
    }

    func next() -> UInt8? {
        guard self.atEnd.NOT else { return nil }
        let byte = self.data[self.index]
        self.index += 1
        return byte
    }

    func next(_ count:Int) -> Data? {
        guard self.index + count <= self.data.count else { return nil }
        let subdata = self.data.subdata(in: self.index..<self.index + count)
        self.index += count
        return subdata
    }

    func upTo(_ marker:UInt8) -> Data? {
        if let end = (self.index..<self.data.count).index( where: { self.data[$0] == marker } ) {
            let upTo = self.next(end - self.index)
            self.skip() // consume the marker
            return upTo
        }
        else {
            return nil
        }
    }

    func skip(_ count:Int = 1) {
        self.index += count
    }

    func skipThrough(_ marker:UInt8) {
        if let end = (self.index..<self.data.count).index( where: { self.data[$0] == marker } ) {
            self.index = end + 1
        }
        else {
            self.index = self.data.count
        }
    }
}
Travis Griggs
  • 21,522
  • 19
  • 91
  • 167
0

(Disclaimer: I realize upon reading the OP:s question again (not just the using TL;DR eyes ;) that this don't really answer the OP:s question, but I'll leave it as it might make a somewhat interesting addition to the discussion; split don't produce a stream here, but rather a chunk of data sequences)

This is somewhat similar to your "Python Struct'esque" solution; you could make use the isSeparator predicate closure of the split method of Data to parse your byte stream.

func split(maxSplits: Int = default, omittingEmptySubsequences: Bool = default, 
           isSeparator: @noescape UInt8 throws -> Bool) 
           rethrows -> [MutableRandomAccessSlice<Data>]

Returns the longest possible subsequences of the sequence, in order, that don’t contain elements satisfying the given predicate. Elements that are used to split the sequence are not returned as part of any subsequence.

From The Language Reference for the Data structure (Swift 3).

I haven't had time to download XCode 8 beta myself (and IBM Swift 3-dec Sandbox does not include the Data structure, for some reason), but an example of using split(...) (here: just for an array) could look something along the lines of:

/* Swift 2.2 example */
let input: [UInt8] = [2, 0x20, 0x21, 3, 0x30, 0x31, 0x32, 1, 0x10, 4, 0x40, 0x41, 0x42, 0x43]

var isSep = true
var byteCounter: (UInt8, UInt8) = (0,0)
let parsed = input.split() { elem -> Bool in
    if isSep {
        isSep = false
        byteCounter.0 = 0
        byteCounter.1 = elem
        return true
    }

    byteCounter.0 += 1
    if byteCounter.0 == byteCounter.1 {
        isSep = true // next is separator
    }
    return false

}.map { Array($0) }

print(parsed) // [[32, 33], [48, 49, 50], [16], [64, 65, 66, 67]]

A non-tested equivalent snippet for the Swift 3 Data structure:

let input = Data(bytes: [2, 0x20, 0x21, 3, 0x30, 0x31, 0x32, 1, 0x10, 4, 0x40, 0x41, 0x42, 0x43])

var isSep = true
var byteCounter: (UInt8, UInt8) = (0,0)
let parsed = input.split(maxSplits: 1000, omittingEmptySubsequences: false) { elem -> Bool in
    if isSep {
        isSep = false
        byteCounter.0 = 0
        byteCounter.1 = elem
        return true
    }
    byteCounter.0 += 1
    if byteCounter.0 == byteCounter.1 {
        isSep = true // next is separator
    }
    return false
}.map { $0 }

Note that MutableRandomAccessSlice<Data> should be "trivially" transformable to Data (Data itself is a MutableCollection of bytes), compare e.g. to the simple map over self to transform ArraySlice<Int> to Array<Int>

let foo = [1, 2, 3]
let bar = foo[0..<2]     // ArraySlice<Int>
let baz = bar.map { $0 } // Array<Int>
dfrib
  • 70,367
  • 12
  • 127
  • 192