
I'm using the following code to print the last N lines of one file to another.

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

void printLastNLines(const std::string& inputFileName, const std::string& outputFileName, int N);
int main()
{
    printLastNLines("test.csv", "test2.csv", 200);

}
void printLastNLines(const std::string& inputFileName, const std::string& outputFileName, int N) {
    FILE* in, * out;
    int count = 0;
    long int pos;
    char s[100];

    fopen_s(&in, inputFileName.c_str(), "rb");
    /* always check return of fopen */
    if (in == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    fopen_s(&out, outputFileName.c_str(), "wb");
    if (out == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    
    
    fseek(in, 0, SEEK_END);
    pos = ftell(in);
    
    while (pos) {
        pos--;
        fseek(in, pos, SEEK_SET); 
        char c = fgetc(in);
        if (c == '\n') {
            if (count++ == N) break;
        }
    }
    //fseek(in, pos, SEEK_SET);
    /* Write line by line, is faster than fputc for each char */
    while (fgets(s, sizeof(s), in) != NULL) {
        fprintf(out, "%s", s);
    }
    fclose(in);
    fclose(out);
}

The contents of the sample file test.csv are given below:

2
3

However, when I run the code, test2.csv contains the following (note that the first line is there but doesn't contain any characters):

3

Can anyone explain what's wrong with the code? In general, even when I give it a bigger file, the first character of the first line is always missing.

I assumed it has something to do with the file pointer position. So I added another fseek(in, pos, SEEK_SET); (currently commented out), and the first line with 2 started printing. However, I'm not sure why it needs this extra fseek. When I debugged the code, the last seek executed is in fact fseek(in, 0, SEEK_SET);. Why do we need an extra fseek(in, 0, SEEK_SET); to make it work?

ubaabd
  • `fgetc` moves the position forward. – n. m. could be an AI Jun 11 '23 at 07:19
  • *I'm using the following code to print last N lines of one file to the other.* -- There is no need to mess around with `fseek` or any similar functions to do this. Create a queue of N items and read in the lines, pushing each new line onto the back. When the queue is full, you dequeue the front item and enqueue the next line you read. Rinse and repeat for the entire file. Once done, the queue will hold the last N lines of the file. This is probably 10 or 15 lines of code using `std::queue` and a little bit of logic maintaining the size of the queue so that it never exceeds N items. – PaulMcKenzie Jun 11 '23 at 08:47
  • [Here is an example of using std::queue](https://godbolt.org/z/8bjfa5s6a). Not tested, but this is basically the outline I mentioned. – PaulMcKenzie Jun 11 '23 at 09:01
  • @PaulMcKenzie That's a good idea. However, I'm doing the same operation on much larger files. A queue or circular buffer approach will use too much memory for this solution. That's why I am trying to use a solution that doesn't require buffers. – ubaabd Jun 11 '23 at 11:00
  • If 200 is the number of lines to keep, that is very tiny. How big are the files? Do you know approximately what the file data/size will consist of? Number of characters per line? Another solution is to seek to the end, subtract an amount you believe will encompass `N` lines of text, seek to that position in the file, and start the queue/buffer solution from that spot in the file. And even if your guess misses, you could start the process over again by starting some `x` lines before where you guessed. In other words, heuristically determining where to start the read from. – PaulMcKenzie Jun 11 '23 at 13:54

1 Answer


The solution is basically already visible in your source code. But you commented it out: //fseek(in, pos, SEEK_SET);

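The root cause is that fgetc advances the file position by one character every time it reads. Your sample file contains fewer than N newlines, so the loop runs until pos reaches 0; the last fgetc then reads the first character of the file and leaves the file pointer at offset 1. fgets therefore starts copying from the second character. (When the loop instead stops at a \n, the file pointer is just past that \n, which is already the start of the next line.)

So, you could uncomment your //fseek(in, pos, SEEK_SET); statement, but this is not reliable either: when the loop stops at a \n, pos is the offset of that \n, so the copy would begin with an extra empty line. Line endings are a further complication. Depending on your operating system, a new line may be marked with \r\n, so a carriage return and a line feed. This is most probably true for your system. Offset arithmetic like pos-- is only dependable because you open the file in binary mode; in text mode the seek operation would not do what you expect.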

So, there is no easy portable solution. There is a quick fix, but it is not recommended: you could set the file pointer based on knowledge of the line-ending marker your files use.

But it would be better to use C++ for your solution.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std::string_literals;

void printLastNLines(const std::string& inputFileName, const std::string& outputFileName, int N) {
    // Open files and check, if they could be opened
    if (std::ifstream ifs(inputFileName); ifs) 
        if (std::ofstream ofs(outputFileName); ofs) {

            // We will read all lines into a vector
            std::vector<std::string> lines{};
            for (std::string line{}; std::getline(ifs, line); lines.push_back(line));

            // If N is greater than the number of lines that we read, then limit the value
            size_t numberOfLines = N < 0 ? 0 : N;
            if (numberOfLines > lines.size()) numberOfLines = lines.size();

            // And now we write the lines to the output file
            for (size_t i = lines.size() - numberOfLines; i< lines.size(); ++i)
                ofs << lines[i] << '\n';
        }
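        // This else belongs to the inner if, so the output file failed to open
        else std::cout << "\nError: Could not open output file '" << outputFileName << "'\n";
    // This else belongs to the outer if, so the input file failed to open
    else std::cout << "\nError: Could not open input file '" << inputFileName << "'\n";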
}
int main() {
    printLastNLines("r:\\test.csv"s, "r:\\test2.csv"s, 2);
}
Gurnet