INTRODUCTION:

I am reading a text file with ReadFile. The buffer filled by ReadFile is sent to standard output with cout, and standard output is redirected to a text file.

PROBLEM:

Although my code "works" in the sense that no data is lost, the resulting file is larger than the original.

When opened in notepad, everything seems fine, but when opened in Notepad++ I can clearly see extra lines added. These lines are new lines (\n).

An MCVE that reproduces this behavior is below.

#include <iostream>
#include <Windows.h>

int main()
{
    HANDLE hFile = ::CreateFile("C:\\123.txt", 
        GENERIC_READ,
        FILE_SHARE_READ | 
        FILE_SHARE_WRITE | 
        FILE_SHARE_DELETE,
        NULL, 
        OPEN_EXISTING, 
        FILE_ATTRIBUTE_NORMAL, 
        NULL);

    if (INVALID_HANDLE_VALUE == hFile) 
        return ::GetLastError();

    char buffer[256];
    DWORD bytesRead = 1,  // dummy value so while loop can work
        bytesWritten = 0; // needed for WriteFile, not for cout version

    //======== so WriteFile outputs to console, not needed for cout version
    HANDLE hStandardOutput = ::GetStdHandle(STD_OUTPUT_HANDLE);

    if (INVALID_HANDLE_VALUE == hStandardOutput)
    {
        std::cout << "GetStdHandle error code = " << ::GetLastError() << std::endl;
        ::CloseHandle(hFile);
        return ::GetLastError();
    }
    //============================
    while(bytesRead)
    {
        // '\0' terminate buffer, needed for cout only
        ::memset(buffer, '\0', sizeof(buffer)); 

        if (!::ReadFile(hFile, 
            buffer, 
            sizeof(buffer) - 1, // - 1 for '\0', not needed when using WriteFile
            &bytesRead, NULL))
        {
            std::cout << "ReadFile error code = " << ::GetLastError() << std::endl;
            break;
        }
        /*============= Works fine
        if(!::WriteFile(hStandardOutput, buffer, bytesRead, &bytesWritten, NULL))
        {
            std::cout << "WriteFile error code = " << ::GetLastError() << std::endl;
            break;
        }*/
        //------------- comment out when testing WriteFile 
        std::cout << buffer;  // extra lines...
        // std::cout.write(buffer, bytesRead); // extra lines as well...
        //----------------------------------------
    }
    ::CloseHandle(hFile);
    return 0;
}

QUESTION:

What is causing the behavior described above, and how can I fix it?

MY EFFORTS TO SOLVE THE PROBLEM:

As I type this post I am Googling aimlessly, hoping for some clue to show up.

I suspect the problem occurs when \n is output: it seems that Windows inserts a \r as well, but I am not sure.

AlwaysLearningNewStuff
  • This is not related to files at all, only to how `std::cout <<` works. – RbMm Nov 04 '16 at 20:26
  • @RbMm: Can you explain, or provide a link that explains, why this is happening? Thank you for your comment (I haven't forgotten your other IO completion port answer; I am working on it in parallel with this problem). – AlwaysLearningNewStuff Nov 04 '16 at 20:28
  • `cout` is not designed to work with raw binary data. It is absolutely normal that it can add extra `\r` or `\n` characters. You read the file as raw binary data but try to write it as formatted string data, so it is not surprising that the results do not match. Try using WriteFile and compare in that case, or decide what output you actually want. – RbMm Nov 04 '16 at 20:39
  • *`cout` is not designed to work with raw binary data* In the MCVE I have tried to use `cout.write`, but that failed too... *Try using WriteFile* Yes, that works, as I have already stated in my post. Thank you for replying. – AlwaysLearningNewStuff Nov 04 '16 at 20:41
  • How much larger is the new file than the original? Are we talking a few extra bytes? Twice the size? These types of details are important. – MrEricSir Nov 05 '16 at 00:10
  • @MrEricSir: *How much larger is the new file than the original?* roughly speaking, about 10% larger. All the data is there, I have extra `\r\r\n` characters added as user Remy Lebeau said. It really depends how many `\n` I have. – AlwaysLearningNewStuff Nov 05 '16 at 08:54

1 Answer

The \n character has special meaning to STL character streams. It represents a newline, which gets translated to the platform-specific line break upon output. This is discussed here:

Binary and text modes

A text stream is an ordered sequence of characters composed into lines (zero or more characters plus a terminating '\n'). Whether the last line requires a terminating '\n' is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to the conventions for representing text in the OS (in particular, C streams on Windows OS convert \n to \r\n on output, and convert \r\n to \n on input).

So it is likely that std::cout outputs \r\n when it is given \n, even if a preceding \r was also given, thus an input of \r\n could become \r\r\n on output. How individual Windows apps handle bare CR characters is not standardized. They might be ignored, or they might be treated as line breaks. In your case, it sounds like the latter.

There is no standard way to use std::cout in binary mode so \n is output as \n instead of as \r\n. However, see How to make cout behave as in binary mode? for some possible ways that you might be able to make std::cout output in binary mode on Windows, depending on your compiler and STL implementation. Or, you could try using std::cout.rdbuf() to substitute in your own std::basic_streambuf object that performs binary output to the console.

That being said, the way your code handles the data buffer is a little off. It should look more like this instead (not accounting for the above info):

#include <iostream>
#include <Windows.h>

int main()
{
    HANDLE hFile = ::CreateFile("C:\\123.txt", 
        GENERIC_READ,
        FILE_SHARE_READ | 
        FILE_SHARE_WRITE |
        FILE_SHARE_DELETE,  // why??
        NULL, 
        OPEN_EXISTING, 
        FILE_ATTRIBUTE_NORMAL, 
        NULL);

    if (INVALID_HANDLE_VALUE == hFile) 
        return ::GetLastError();

    char buffer[256];
    DWORD bytesRead, bytesWritten, err;

    //======== so WriteFile outputs to console, not needed for cout version
    HANDLE hStandardOutput = ::GetStdHandle(STD_OUTPUT_HANDLE);

    if (INVALID_HANDLE_VALUE == hStandardOutput)
    {
        err = ::GetLastError();
        std::cout << "GetStdHandle error code = " << err << std::endl;
        ::CloseHandle(hFile);
        return err;
    }

    //============================
    do
    {
        if (!::ReadFile(hFile, buffer, sizeof(buffer), &bytesRead, NULL))
        {
            err = ::GetLastError();
            std::cout << "ReadFile error code = " << err << std::endl;
            ::CloseHandle(hFile);
            return err;
        }

        if (bytesRead == 0) // EOF reached
            break;

        /*============= Works fine
        if (!::WriteFile(hStandardOutput, buffer, bytesRead, &bytesWritten, NULL))
        {
            err = ::GetLastError();
            std::cout << "WriteFile error code = " << err << std::endl;
            ::CloseHandle(hFile);
            return err;
        }
        */

        //------------- comment out when testing WriteFile 
        std::cout.write(buffer, bytesRead);
        //----------------------------------------
    }
    while (true);

    ::CloseHandle(hFile);
    return 0;
}
Remy Lebeau
  • Thanks for answering my question. It is late at night here, so I would like to go to sleep. I have managed to test your code, it displays the same behavior as the one in my OP. It seems I will have to stick with the `WriteFile`. I will "fight" for a while before settling with `WriteFile`. Again, thank you. – AlwaysLearningNewStuff Nov 04 '16 at 21:46
  • I have decided to use `WriteFile`, thank you for helping. – AlwaysLearningNewStuff Nov 05 '16 at 08:57