I have the following code, where the idea is to read a text file line by line and save the current position in m_numBytesRead. If I break out of the loop (in my case, to split the parsing of big files into chunks) and later reopen the file and Seek to m_numBytesRead-1, ReadString does not start at the beginning of the line as I expected.

CStdioFile fileLog;
if (fileLog.Open(m_strReadFileName, CFile::modeNoTruncate | CFile::modeRead | CFile::shareDenyNone))
{
    if (m_numBytesRead > 0)
        fileLog.CStdioFile::Seek(m_numBytesRead-1, CFile::begin);

    CString strLine;
    bool bBreakLoop = false;
    while (fileLog.ReadString(strLine) && !bBreakLoop)
    {

        // any condition to set bBreakLoop after few MB read...

        if (!bBreakLoop)
        {
            m_numBytesRead = fileLog.CStdioFile::GetPosition();
        }
    }
    fileLog.Close();
}

Debugging in more detail and comparing with the offsets shown in Notepad++, it seems that CStdioFile::GetPosition() does not return the correct value (the beginning of the next line to be read) but a few bytes more (12 in my case).

Is it a bug in MFC, or is there something I'm missing? Has anyone seen similar issues?

Note that I'm using VS2010 on Windows 7.

Blacktempel
  • Hmmm... apparently I'm not alone. And buffering is not a solution in my case (the files are too big): http://forums.codeguru.com/showthread.php?456659-CStdioFile-GetPosition-Seek – Thierry Campiche Oct 01 '15 at 16:42
  • 2
    It's not a Unicode text file with a BOM is it? Also, don't forget that `CStdioFile` has special processing for carriage return–linefeed pairs so what you see on disk may not be what you end up reading. – Roger Rowland Oct 01 '15 at 18:51
  • 4
    I suspect that the issue is with the single `0x0A` or `0x0D` line break characters; the CStdioFile might be converting them internally to a pair `0x0A0D` without adjusting the position / counter. – Vlad Feinstein Oct 01 '15 at 19:04
  • Thanks for your hints. To answer your questions: it is not a Unicode text file in my case, and the line endings are always CR + LF. After investigating a little more, it seems that GetPosition() is the base class one (CFile) and represents the position in the binary input buffer, which seems to read a few more characters than CStdioFile needs in order to search for line endings. This is just a hint, I have no proof of that. So I had to use another method to save the position... – Thierry Campiche Oct 26 '15 at 10:23
  • Thanks Vlad Feinstein, it was the LF-only line endings. I was using CFile instead of CStdioFile to write the file. – Thierry Campiche Oct 27 '15 at 09:34

1 Answer

Add the open mode CFile::typeBinary to get byte-exact offsets. The default mode is text, which performs newline conversion, and that conversion may throw off the offsets.
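
To illustrate the same idea outside MFC, here is a minimal sketch using only the C++ standard library (not CStdioFile). It assumes the file is opened in binary mode so that no newline translation happens, and it tracks the offset by counting the bytes consumed per line instead of asking the stream for its position. The function name readChunk and its parameters are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads at most maxLines lines starting at byte `offset`
// and returns the byte offset of the first line that was NOT read yet.
std::streamoff readChunk(const std::string& path, std::streamoff offset,
                         std::size_t maxLines, std::vector<std::string>& out)
{
    assert(offset >= 0);

    // Binary mode: bytes are delivered exactly as stored on disk,
    // so counted offsets match what a hex or text editor shows.
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return offset;              // could not open: nothing consumed
    in.seekg(offset, std::ios::beg);

    std::string line;
    std::size_t count = 0;
    while (count < maxLines && std::getline(in, line))
    {
        // getline consumed the line plus its '\n', unless the file ended
        // without a trailing newline (then eof() is set and no '\n' was read).
        offset += static_cast<std::streamoff>(line.size()) + (in.eof() ? 0 : 1);

        // In binary mode the CR of a CR+LF pair stays in the data: strip it.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();

        out.push_back(line);
        ++count;
    }
    return offset;
}
```

Calling readChunk repeatedly with the offset returned by the previous call resumes exactly at the start of the next line, whether the file uses CR + LF or LF-only line endings. In the MFC code from the question, the corresponding change suggested by this answer is to add CFile::typeBinary to the flags passed to Open, and then strip any trailing CR from each line yourself.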

Tino Didriksen