I have to read a file character by character in Swift. My approach is to read a chunk from a FileHandle, store it as a string, and return the first character of that string.

This is my code so far:

/// Return next character, or nil on EOF.
func nextChar() -> Character? {
    precondition(fileHandle != nil, "Attempt to read from closed file")

    if atEof {
        return nil
    }

    if self.stored.characters.count > 0 {
        let c: Character = self.stored.characters.first!
        stored.remove(at: self.stored.startIndex)
        return c
    }

    let tmpData = fileHandle.readData(ofLength: (4096))
    print("\n---- file read ---\n" , terminator: "")
    if tmpData.count == 0 {
        return nil
    }

    self.stored = NSString(data: tmpData, encoding: encoding.rawValue) as String!
    let c: Character = self.stored.characters.first!
    self.stored.remove(at: stored.startIndex)
    return c
}

My problem is that returning a character this way is very slow. This is my test implementation:

if let aStreamReader = StreamReader(path: file) {
    defer {
        aStreamReader.close()
    }
    while let char = aStreamReader.nextChar() {
        print("\(char)", terminator: "")
        continue
    }
}

Even without the print, it takes ages to read the file to the end.

For a 1.4 MB sample file, it took more than six minutes to finish:

time ./.build/debug/read a.txt
real    6m22.218s
user    6m13.181s
sys     0m2.998s

Do you have any suggestions for speeding up this part?

let c: Character = self.stored.characters.first!
stored.remove(at: self.stored.startIndex)
return c

Thanks a lot. ps

++++ UPDATED FUNCTION ++++

func nextChar() -> Character? {
    //precondition(fileHandle != nil, "Attempt to read from closed file")

    if atEof {
        return nil
    }

    if stored_cnt > (stored_idx + 1) {
        stored_idx += 1
        return stored[stored_idx]
    }

    let tmpData = fileHandle.readData(ofLength: (chunkSize))
    if tmpData.count == 0 {
        atEof = true
        return nil
    }

    if let s = NSString(data: tmpData, encoding: encoding.rawValue) as String! {
        stored = s.characters.map { $0 }
        stored_idx = 0
        stored_cnt = stored.count
    }
    return stored[0];
}
Peter Shaw

1 Answer

Your implementation of nextChar is terribly inefficient.

You create a String, then repeatedly call characters on it and remove its first character. Removing the first character of a String is O(n), because the rest of the string has to be shifted, so consuming a whole chunk this way is quadratic in the chunk size.

Why not create the String once and keep only a reference to its characters, along with an index into them? Instead of modifying the string on every call, simply increment the index and return the next character.

Once you reach the last character, read the next piece of the file, create a new string, and reset the characters and the index.
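
The approach described above could be sketched roughly as follows. This is only an illustration, not the asker's actual class: the name IndexedCharReader and its members are hypothetical, it is written for current Swift rather than the Swift 3 of the question, and it assumes UTF-8 input in which no multi-byte sequence or grapheme cluster is split across a chunk boundary (always true for plain ASCII). A more careful reader would carry incomplete trailing bytes over into the next chunk.

```swift
import Foundation

// Hypothetical reader illustrating the index-based approach.
// Assumptions: UTF-8 input, and no multi-byte sequence or grapheme
// cluster is split across a chunk boundary (always true for ASCII).
final class IndexedCharReader {
    private let fileHandle: FileHandle
    private let chunkSize: Int
    private var buffer: [Character] = []   // characters of the current chunk
    private var index = 0                  // position of the next character
    private var atEof = false

    init?(path: String, chunkSize: Int = 4096) {
        guard let handle = FileHandle(forReadingAtPath: path) else { return nil }
        self.fileHandle = handle
        self.chunkSize = chunkSize
    }

    /// Return the next character, or nil on EOF.
    func nextChar() -> Character? {
        if index == buffer.count {
            guard refill() else { return nil }
        }
        let c = buffer[index]
        index += 1   // O(1): nothing is removed from the buffer
        return c
    }

    /// Replace the buffer with the next chunk; returns false on EOF.
    private func refill() -> Bool {
        if atEof { return false }
        let data = fileHandle.readData(ofLength: chunkSize)
        if data.isEmpty {
            atEof = true
            return false
        }
        // Invalid UTF-8 is replaced with U+FFFD rather than failing.
        buffer = Array(String(decoding: data, as: UTF8.self))
        index = 0
        return true
    }

    func close() {
        fileHandle.closeFile()
    }
}
```

It would be used the same way as the test loop in the question: create the reader, call nextChar() until it returns nil, then close it. The only per-character work is an array lookup and an integer increment; the decoding cost is paid once per chunk.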

rmaddy
  • I now have ```var stored : [Character] = Array()``` and return ```let r = stored[stored_idx]; stored_idx += 1; return r```. The 1.4 MB file is fast, but the 150 MB file still takes 2.33 min. Is there an even better implementation? – Peter Shaw Oct 28 '16 at 09:46
  • Here is my current class: https://gist.github.com/petershaw/51db7df73e4bf06bb935a3abb41d8b64 – Peter Shaw Oct 28 '16 at 09:49
  • I'd still use Instruments to profile the updated code. See which lines are still a bottleneck. I wouldn't be surprised if the debugging `print` call is causing a lot of the performance issues. – rmaddy Oct 28 '16 at 15:06
  • It's still the ```return stored[stored_idx]```, I suppose. I used the Time Profiler, right? The results are hard for me to read. For 150 MB I am at 1m50s, which is still very slow. I updated the gist. I think I need another idea for reading the file char by char... I am a bit lost. – Peter Shaw Oct 29 '16 at 07:57