
Context:

I have a project where I store a lot of data in binary files and data files. I retrieve offsets from a binary file, stored as UInt64, and each of these offsets gives me the position of a UTF-8 encoded string in another file.

I am attempting, given all the offsets, to reconstruct all the strings from the UTF-8 file. The file that holds all the strings has a size of exactly 20437 bytes / approx. 177,000 strings.

Assume I have already retrieved all the offsets and now need to rebuild each string one at a time. I also have the length in bytes of every string.

Method 1:

I open a FileHandle on the UTF-8 encoded file, and for each offset I seek to that offset and perform a readData(ofLength:). The whole operation is very slow: more than 35 seconds.
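For reference, a minimal sketch of this approach, assuming `offsets`, `lengths`, and `stringsFileURL` are hypothetical names for the offsets, byte lengths, and UTF-8 file already available:

```swift
import Foundation

func readStringsWithFileHandle(offsets: [UInt64], lengths: [Int], stringsFileURL: URL) throws -> [String] {
    let handle = try FileHandle(forReadingFrom: stringsFileURL)
    defer { handle.closeFile() }

    var strings: [String] = []
    strings.reserveCapacity(offsets.count)

    for (offset, length) in zip(offsets, lengths) {
        // Each iteration seeks and issues a separate small read,
        // which is where most of the time goes.
        handle.seek(toFileOffset: offset)
        let bytes = handle.readData(ofLength: length)
        if let s = String(data: bytes, encoding: .utf8) {
            strings.append(s)
        }
    }
    return strings
}
```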

Method 2:

I initialize a Data object with Data(contentsOf: URL). Then, for each string I want to build, I call Data.subdata(in: Range), with a range starting at offset and ending at offset + size. This loads the entire file into RAM and lets me retrieve the bytes I need for each string. It is much faster than the first option, but probably still a poor approach performance-wise.
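A minimal sketch of this second approach, under the same assumptions and hypothetical names as above:

```swift
import Foundation

func readStringsWithSubdata(offsets: [UInt64], lengths: [Int], stringsFileURL: URL) throws -> [String] {
    let data = try Data(contentsOf: stringsFileURL)   // whole file loaded into memory

    var strings: [String] = []
    strings.reserveCapacity(offsets.count)

    for (offset, length) in zip(offsets, lengths) {
        let start = Int(offset)
        // subdata(in:) copies the requested bytes into a new Data object.
        let chunk = data.subdata(in: start ..< start + length)
        if let s = String(data: chunk, encoding: .utf8) {
            strings.append(s)
        }
    }
    return strings
}
```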

How can I get the best performance for this particular task?

Scaraux

1 Answer


I recently went through a similar experience when caching/loading binary data to/from disk.

I'm not sure what the ultimate process is for best performance, but you can improve the performance of method 2 further still by using a "slice" of the Data object instead of data.subdata(). This is similar to using array slices.

This is probably because, instead of creating more Data objects with copies of the original bytes, the data returned by the slice references the source Data object. This made a significant difference for me, as my source data was quite large. You should profile both methods and see if it makes a noticeable difference for you.

https://developer.apple.com/documentation/foundation/data/1779919-subscript
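A hedged sketch of the slice-based variant, reusing the same hypothetical `offsets`, `lengths`, and `stringsFileURL` names from the question; the only substantive change from method 2 is the subscript instead of subdata(in:):

```swift
import Foundation

func readStringsWithSlices(offsets: [UInt64], lengths: [Int], stringsFileURL: URL) throws -> [String] {
    let data = try Data(contentsOf: stringsFileURL)

    var strings: [String] = []
    strings.reserveCapacity(offsets.count)

    for (offset, length) in zip(offsets, lengths) {
        let start = Int(offset)
        // Subscripting with a range returns a slice that references the
        // original buffer instead of copying it, unlike subdata(in:).
        let slice = data[start ..< start + length]
        strings.append(String(decoding: slice, as: UTF8.self))
    }
    return strings
}
```

Note that `String(decoding:as:)` decodes directly from the slice without requiring another Data copy; whether that helps in your case is something profiling would have to confirm.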

Rufus Mall