
I have a question about what QTextStream::pos() actually calculates. I assumed it was the byte offset into the file, but that does not seem to be the case.

The reason I ask is that I am processing rows in a file. Once the number of rows read reaches some arbitrary limit, or stream.atEnd() is true, I break out of the loop and save stream.pos() to a qint64* variable. Once the processing is complete, I go back to the file and seek(*filePosition) to return to my last position and grab more data until stream.atEnd() is true. This works in the sense that I can keep track of where I am, but calling stream.pos() is very slow, as noted in the Qt docs.

What I am attempting is to update the file position efficiently after each line is read. However, it is not working: when the program goes back to read the file again, the position is wrong, and the first line it reads starts in the middle of a line that was already read on the previous iteration.

Here is what I have so far:

QTextStream stream(this);
stream.seek(*filePosition);
int ROW_COUNT = 1;
while (!stream.atEnd()) {
    QString row = stream.readLine();
    QStringList rowData = row.split(QRegExp(delimiter));
    *filePosition += row.toUtf8().size();
    /*
    processing rowData...
    */ 
    if (ROW_COUNT == ROW_UPLOAD_LIMIT) {
        break;
    }
    ROW_COUNT++;
}
/*
close files, flush stream, more processing, etc...
*/

2 Answers


QTextStream::pos returns position in bytes. I see the following problems:

  1. You are not accounting for the line-ending characters: readLine() returns the line without its terminator, which is 1 byte for "\n" or 2 bytes for "\r\n"
  2. In UTF-8, a single character might take more than 1 byte
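
To make the arithmetic concrete, here is a minimal sketch of the two points above. It uses std::istringstream and std::getline from the standard library instead of QTextStream so that it runs without Qt, and it assumes "\n" line endings. The function name offsetAfterLines is made up for this illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Computes by hand the byte offset reached after reading up to `maxLines` lines.
// countTerminator == false mimics the question's arithmetic (line content only).
// countTerminator == true also counts the '\n' that getline consumed.
std::size_t offsetAfterLines(const std::string& data, int maxLines, bool countTerminator) {
    std::istringstream in(data);
    std::string line;
    std::size_t offset = 0;
    for (int i = 0; i < maxLines && std::getline(in, line); ++i) {
        // std::string::size() is a byte count, so a multi-byte UTF-8
        // character contributes more than 1 here.
        offset += line.size();
        // eof() is set only when the last line had no trailing '\n'.
        if (countTerminator && !in.eof()) {
            offset += 1;
        }
    }
    return offset;
}
```

For the data "ab\ncd\nef\n" and two lines, counting content only gives 4, which points inside "cd" (the symptom described in the question). Counting the terminators gives 6, the start of "ef". With "\r\n" endings each terminator would cost two bytes instead of one.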

Also, why save buffer position after reading each line? This might be faster:

if (ROW_COUNT == ROW_UPLOAD_LIMIT) {
    *filePosition = stream.pos();
    break;
}
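
As a rough illustration of saving the position once per batch rather than once per line, here is a sketch that uses the standard library analogues, with tellg() and seekg() standing in for pos() and seek(), so it runs without Qt. readBatchFrom and its parameters are hypothetical names, not part of any Qt API.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Seeks to `start`, appends up to `limit` lines to `rows`, and returns the
// position to resume from. The position is queried once, after the batch.
std::streampos readBatchFrom(std::istream& in, std::streampos start,
                             std::size_t limit, std::vector<std::string>& rows) {
    in.clear();      // reset eof/fail bits left over from a previous batch
    in.seekg(start);
    std::string line;
    std::size_t read = 0;
    while (read < limit && std::getline(in, line)) {
        rows.push_back(line);
        ++read;
    }
    in.clear();      // tellg() reports -1 while the stream is in a failed state
    return in.tellg();
}
```

A caller would store the returned value and pass it back as `start` for the next batch, so the stream itself does the byte accounting, including line terminators.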
Georgy Pashkov
  • That is what I had originally - albeit I set the position once outside the while loop. Same result... – rgrwatson85 Feb 18 '16 at 21:19
  • If your file contains only ASCII characters, then accounting for line endings should be sufficient. – Georgy Pashkov Feb 18 '16 at 21:35
  • Thanks for the help, but I figured it out. I create the stream outside the scope of the function and pass in a pointer to it. This keeps the stream in scope until the entire file gets processed, rather than creating a new stream for each iteration over the file. – rgrwatson85 Feb 18 '16 at 21:46
  • Also, I tried to +1 this answer, but I don't have enough reputation points. – rgrwatson85 Feb 18 '16 at 22:06

The solution was to create the QTextStream outside of the function and pass it in as a parameter. This means I no longer have to track the position on each iteration, because the stream stays in scope until I have completely finished processing the file.

class FeedProcessor {
    ...
    void processFeedFile() {
        IntegrationFile file("big_file.txt");
        file.open(QIODevice::ReadOnly | QIODevice::Text);
        QTextStream stream(&file);

        while(!stream.atEnd()) {
           file.extractFileContents(&stream);
           /*
           do more processing with file
           */
        }
    }
    ...
};

class IntegrationFile : public QFile {
    ...
    // The caller owns the stream and keeps it alive between calls,
    // so each call resumes where the previous one stopped.
    void extractFileContents(QTextStream* stream) {
        int ROW_COUNT = 1;
        // stream is a pointer, so its members are accessed with ->
        while (!stream->atEnd()) {
            QString row = stream->readLine();
            QStringList rowData = row.split(QRegExp(delimiter));
            /*
            processing rowData...
            */
            if (ROW_COUNT == ROW_UPLOAD_LIMIT) {
                break;
            }
            ROW_COUNT++;
        }
        /*
        close files, flush stream, more processing, etc...
        */
    }
    ...
};
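
The same pattern can be reduced to a sketch that runs without Qt, assuming std::istream in place of QTextStream. The names readBatch and countRowsInBatches are hypothetical and only serve the illustration. The point is that one long-lived stream remembers its own read position, so no offset has to be saved or restored between batches.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads at most `limit` lines from a stream that the caller keeps alive.
// Because the stream is not recreated, the next call continues where this one stopped.
std::vector<std::string> readBatch(std::istream& in, std::size_t limit) {
    std::vector<std::string> rows;
    std::string line;
    // The size check comes first, so no extra line is consumed once the limit is hit.
    while (rows.size() < limit && std::getline(in, line)) {
        rows.push_back(line);
    }
    return rows;
}

// Pulls batches until the stream is exhausted and returns the total row count.
std::size_t countRowsInBatches(std::istream& in, std::size_t limit) {
    std::size_t total = 0;
    for (;;) {
        std::vector<std::string> rows = readBatch(in, limit);
        if (rows.empty()) {
            break;
        }
        total += rows.size();  // stand-in for the real per-batch processing
    }
    return total;
}
```

The trade-off is that the stream, and the file behind it, stay open for the whole run instead of being reopened for each batch.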