I'm trying to do some simple BSON parsing of Swift 3 Data objects, and I feel like I'm fighting the system.
Let's start with some input and a scheme:
let input = Data(bytes: [2, 0x20, 0x21, 3, 0x30, 0x31, 0x32, 1, 0x10, 4, 0x40, 0x41, 0x42, 0x43])
This is just a simple data stream, the frivolous scheme being that a leading byte indicates how many of the following bytes make up the next chunk. So in the above, the leading 2 indicates that 0x20, 0x21 are the first chunk, followed by a 3-byte chunk containing the bytes 0x30, 0x31, 0x32, etc.
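To make the scheme concrete, here is a minimal sketch of the inverse operation, building such a stream from a list of chunks. The helper name frame is hypothetical, and it assumes every chunk is at most 255 bytes long so that its length fits in the single leading byte.

```swift
import Foundation

// Hypothetical helper: prefixes each chunk with a one-byte length.
// Assumes every chunk holds at most 255 bytes.
func frame(_ chunks: [[UInt8]]) -> Data {
    var out = Data()
    for chunk in chunks {
        precondition(chunk.count <= 255, "chunk too long for a one-byte length")
        out.append(UInt8(chunk.count))   // the leading count byte
        out.append(contentsOf: chunk)    // followed by the chunk itself
    }
    return out
}

// Produces the same 14 bytes as the input shown above.
let framed = frame([[0x20, 0x21], [0x30, 0x31, 0x32], [0x10], [0x40, 0x41, 0x42, 0x43]])
```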
Streams
My first thought is to do it with a stream (er, Generator, Iterator, whatever). So I end up with something like:
var iter = input.makeIterator()
func parse(_ stream:inout IndexingIterator<Data>) -> Data {
    var result = Data()
    if let count = stream.next() {
        for _ in 0..<count {
            result.append(Data(bytes:[stream.next()!]))
        }
    }
    return result
}
parse(&iter)
parse(&iter)
parse(&iter)
parse(&iter)
This leads to multiple questions/observations:
1) Why would anyone ever let an iterator? The whole point of this thing is to keep track of an evolving position over a collection. I really struggle with why the Swift authors have chosen to send iterators down the "all hail the value semantics" rathole. It means I have to put inout's on all my parse functions.
2) I feel like I'm over-specifying the argument type with IndexingIterator. Maybe I just need to get used to verbose generics?
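For what it's worth, the concrete iterator type can be kept out of the signature by making the function generic over any iterator that yields UInt8. This is only a sketch: the name parseChunk is hypothetical, and unlike parse above it returns nil when the input runs out instead of force-unwrapping.

```swift
import Foundation

// Generic over any iterator that yields bytes, so the concrete
// iterator type of Data never appears in the signature.
// Hypothetical variant of parse: returns nil on exhausted or truncated input.
func parseChunk<I: IteratorProtocol>(_ stream: inout I) -> Data? where I.Element == UInt8 {
    guard let count = stream.next() else { return nil }      // nothing left to read
    var result = Data()
    for _ in 0..<count {
        guard let byte = stream.next() else { return nil }   // chunk was cut short
        result.append(byte)
    }
    return result
}

var bytes = Data([2, 0x20, 0x21, 3, 0x30, 0x31, 0x32]).makeIterator()
let first = parseChunk(&bytes)
let second = parseChunk(&bytes)
let third = parseChunk(&bytes)   // the iterator is exhausted at this point
```

The inout is still required, since the iterator remains a value type; the generic constraint only removes the dependence on IndexingIterator.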
Python Struct'esque
Frustrated with that approach, I thought I might emulate Python's struct.unpack() style, where a tuple is returned containing the parsed data as well as the unconsumed data, since Data are supposedly magical and efficient as long as I don't mutate them. That turned out like:
func parse2(_ data:Data) -> (Data, Data) {
    let count = Int(data[0])
    return (data.subdata(in: 1..<count+1), data.subdata(in: count+1..<data.count))
}
var remaining = input
var chunk = Data()
(chunk, remaining) = parse2(remaining)
chunk
(chunk, remaining) = parse2(remaining)
chunk
(chunk, remaining) = parse2(remaining)
chunk
(chunk, remaining) = parse2(remaining)
chunk
I ran into two issues with this.
1) What I really wanted to return was data[1...count], data.subdata(in: count+1..<data.count). But this returns a MutableRandomAccessSlice, which seems to be a totally different kind of type? So I ended up using the more involved subdata.
2) One can subscript a Data with a closed range, but the subdata method will only take a half-open range. What's with that?
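One way to keep the range subscripts and still get a Data back is to copy each slice into a new Data through its sequence initializer. The following is a sketch under assumptions: parseSlice is a hypothetical variant of parse2, and it computes offsets from startIndex because in later Swift versions a slice of Data keeps the indices of its parent rather than starting at zero.

```swift
import Foundation

// Hypothetical variant of parse2 that uses range subscripts and copies
// each slice into a fresh Data via the sequence initializer.
func parseSlice(_ data: Data) -> (chunk: Data, rest: Data) {
    let bodyStart = data.startIndex + 1                       // skip the count byte
    let bodyEnd = bodyStart + Int(data[data.startIndex])      // count byte gives the chunk length
    return (Data(data[bodyStart..<bodyEnd]), Data(data[bodyEnd..<data.endIndex]))
}

let (head, tail) = parseSlice(Data([2, 0x20, 0x21, 1, 0x10]))
```

Like parse2, this does not check that the count byte fits within the remaining data, so malformed input will trap.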
Open Rebellion, Old Habits Kick In
Now annoyed that this old Smalltalker can't seem to find happiness here, I just roll my own:
class DataStream {
    let data:Data
    var index = 0
    var atEnd:Bool {
        return index >= self.data.count
    }
    init(data:Data) {
        self.data = data
    }
    func next() -> UInt8 {
        let byte = self.data[self.index]
        self.index += 1
        return byte
    }
    func next(_ count:Int) -> Data {
        let subdata = self.data.subdata(in: self.index..<self.index + count)
        self.index += count
        return subdata
    }
}
func parse3(_ stream:DataStream) -> Data {
    let count = Int(stream.next())
    return stream.next(count)
}
let stream = DataStream(data: input)
parse3(stream)
parse3(stream)
parse3(stream)
parse3(stream)
This solution I'm happy with from an end use POV. I can flesh out DataStream to do all kinds of stuff. But... I'm now off the beaten path and feel like I'm not "getting it" (the Swiftish light bulb).
TL;DR version
After all this playing around, I find myself curious what the most idiomatic way is to stream through Data structs, extracting data from them based on what is encountered in them.