
I am writing an academic project in C, and I can use only the <fcntl.h> and <unistd.h> headers for file operations.

I have a function that reads a file line by line. The algorithm is:

  1. Set the pointer to the beginning of the file and get the current position.
  2. Read data into a fixed-size buffer (char buf[100]), iterate character by character, and detect the end of line '\n'.
  3. Increment current position: curr_pos = curr_pos + length_of_read_line;
  4. Set pointer to current position using lseek(fd, current_position, SEEK_SET);

SEEK_SET sets the pointer to the given offset from the beginning of the file. In my pseudocode, current_position is that offset.

This actually works fine, but because I use SEEK_SET, I always move the pointer starting from the beginning of the file, which I think isn't optimized.

lseek also accepts SEEK_CUR, which makes the offset relative to the current position. How can I move the pointer back from its current position using SEEK_CUR? I tried passing a negative offset, but it didn't work.
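
A minimal sketch of seeking backwards with SEEK_CUR, assuming a seekable regular file. The helper name read_line_rewind and its return convention are hypothetical and not taken from the original code. After one read(), it counts the bytes that lie past the first '\n' and moves the offset back by that amount. The surplus is kept in signed types (ssize_t, off_t); negating an unsigned value such as size_t can yield a large positive offset on some platforms, which is one possible reason a negative offset appears not to work.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: reads one line from fd into buf (capacity cap,
 * at least 2), NUL-terminates it, and leaves the file offset just after
 * the line. Returns the line length including '\n', 0 at end of file,
 * or -1 on error. A line longer than cap - 1 is returned in pieces. */
ssize_t read_line_rewind(int fd, char *buf, size_t cap)
{
    if (cap < 2)
        return -1;

    ssize_t n = read(fd, buf, cap - 1);
    if (n <= 0)
        return n;               /* 0: end of file, -1: read error */

    ssize_t len = 0;
    while (len < n && buf[len] != '\n')
        len++;
    if (len < n)
        len++;                  /* keep the '\n' as part of the line */

    /* Bytes that were read past the end of the line. Both operands are
     * signed, so -surplus is a genuine negative off_t. */
    off_t surplus = (off_t)(n - len);
    if (surplus > 0 && lseek(fd, -surplus, SEEK_CUR) == (off_t)-1)
        return -1;

    buf[len] = '\0';
    return len;
}
```

Calling the helper in a loop until it returns 0 walks the file one line at a time. lseek() fails with -1 if the resulting offset would be negative or if fd refers to a pipe or socket, so checking its return value is worthwhile.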

gsamaras
user
  • Why seek at all? If you read a part of a file, the file pointer points to the next position to read. – glglgl Nov 26 '18 at 20:38
  • Please edit your post and include your full code in a code block. We can't tell what your error might be without it. – Craig Estey Nov 26 '18 at 20:39
  • Yeah, but I have read 100 bytes and the end of the line could be at position 65. After reading each line I have to set the pointer to the beginning of the file. – user Nov 26 '18 at 20:39
  • *... but didn't work.* That is not a useful problem description. Not at all. Post a complete example of code that demonstrates your problem, what it does that you think is wrong, and what you think it should do instead. – Andrew Henle Nov 26 '18 at 20:44
  • What is wrong with `SEEK_SET` ? – Xaqq Nov 26 '18 at 20:46
  • How do you know that SEEK_SET "isn't optimized" ? What exactly do you mean by that? – Support Ukraine Nov 26 '18 at 20:47
  • By "not optimized" I mean that when `lseek` uses SEEK_SET, it always has to iterate over the file byte by byte. SEEK_CUR sets the pointer to an offset from its current position. – user Nov 26 '18 at 20:55
  • If you are reading in chunk sizes of 100, you might as well stop what you're doing and just use `fgets`. Maybe it's more useful to use `read` with block sizes of 4096 or 8192 or PIPE_BUF, but reading in chunks of 100 is never going to be more efficient than just using `fread` or `fgets`. – William Pursell Nov 26 '18 at 21:15
  • Do you mean that you're not storing your old file offset and every time you call this method, you reset the file pointer? ... Then don't do that. – Ben Stern Nov 27 '18 at 04:59

2 Answers


The most efficient way to read lines of data from a file is typically to read a large chunk of data that may span multiple lines, process lines from the chunk until reaching its end, move any partial line from the end of the buffer to the start, and then read another chunk. Depending on the target system and the task, it may be better to read enough to fill whatever space remains after the partial line, or to always read a power-of-two number of bytes and make the buffer large enough to hold a chunk of that size plus a maximum-length partial line left over from the previous read. The one difficulty with this approach is that all data must be read from the stream through the same buffer. Where that is practical, however, it will often perform better than many separate calls to fread, and may be nicer than using fgets.
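
The chunked approach can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: it uses read() rather than fread, since the question allows only <fcntl.h> and <unistd.h> for file operations; it takes the "fill whatever space remains" variant; and the names struct line_reader, lr_init, lr_next and CHUNK are hypothetical. Lines longer than CHUNK bytes are returned in pieces.

```c
#include <string.h>
#include <unistd.h>

#define CHUNK 4096              /* assumed buffer size, tune as needed */

struct line_reader {
    int fd;
    size_t start;               /* index of the first unconsumed byte */
    size_t end;                 /* one past the last valid byte */
    char buf[CHUNK + 1];        /* +1 leaves room for a terminating NUL */
};

void lr_init(struct line_reader *lr, int fd)
{
    lr->fd = fd;
    lr->start = lr->end = 0;
}

/* Returns the next line, NUL-terminated and without its '\n', and stores
 * its length in *len. Returns NULL at end of file or on a read error.
 * The pointer is only valid until the next call. */
char *lr_next(struct line_reader *lr, size_t *len)
{
    for (;;) {
        /* Look for a complete line in the unconsumed part of the buffer. */
        char *p = lr->buf + lr->start;
        char *nl = memchr(p, '\n', lr->end - lr->start);
        if (nl != NULL) {
            *nl = '\0';
            *len = (size_t)(nl - p);
            lr->start += *len + 1;
            return p;
        }

        /* No newline yet: move the partial line to the front. */
        memmove(lr->buf, p, lr->end - lr->start);
        lr->end -= lr->start;
        lr->start = 0;

        if (lr->end == CHUNK) {         /* over-long line: hand out a piece */
            lr->buf[CHUNK] = '\0';
            *len = CHUNK;
            lr->end = 0;
            return lr->buf;
        }

        /* Fill whatever space remains after the partial line. */
        ssize_t n = read(lr->fd, lr->buf + lr->end, CHUNK - lr->end);
        if (n < 0)
            return NULL;
        if (n == 0) {                   /* end of file */
            if (lr->end == 0)
                return NULL;
            lr->buf[lr->end] = '\0';    /* last line had no trailing '\n' */
            *len = lr->end;
            lr->end = 0;
            return lr->buf;
        }
        lr->end += (size_t)n;
    }
}
```

Because the reader never seeks, it also works on pipes and other non-seekable descriptors. A caller would run lr_init once on an open descriptor and then call lr_next in a loop until it returns NULL.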

While it should be possible for a standard-library function to facilitate line input, the design of fgets is rather needlessly hostile, since it provides no convenient indication of how much data it has read. After reading each line, code that wants a string containing the printable portion has to use strlen to try to ascertain how much data was read (hopefully the input won't contain any zero bytes) and then check the byte before the trailing zero to see if it's a newline. Not impossible, but awkward at the very least. If the fread-and-buffer approach satisfies an application's needs, it's likely to be at least as efficient as using fgets, if not more so, and since the effort required to use fgets robustly is comparable to that required for a buffering approach, one may as well use the latter.
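
For comparison, the strlen-and-check step described above might look like the following sketch. The wrapper name read_line_fgets and its return convention are hypothetical. As the paragraph notes, strlen stops at the first zero byte, so the reported length is wrong if the input contains one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around fgets: strips the trailing '\n' if present
 * and returns the length of what remains, or -1 at end of file or on
 * error. *complete is set to 1 if a newline was seen and 0 if the line
 * was cut short by the buffer size or by the end of the file. */
long read_line_fgets(FILE *fp, char *buf, int cap, int *complete)
{
    if (fgets(buf, cap, fp) == NULL)
        return -1;

    size_t len = strlen(buf);   /* misleading if the input has a zero byte */
    *complete = (len > 0 && buf[len - 1] == '\n');
    if (*complete)
        buf[--len] = '\0';      /* drop the newline */
    return (long)len;
}
```

When *complete comes back as 0 and the end of the file has not been reached, the caller has to call the wrapper again and join the pieces, which is about as much bookkeeping as the buffering approach needs.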

supercat

Since your question is tagged as , I would go with getline(), without having to manually take care of moving the file pointer.

Example:

#define _POSIX_C_SOURCE 200809L  /* exposes getline() under strict ISO C modes */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE* fp;
    char* line = NULL;
    size_t len = 0;
    ssize_t read;

    fp = fopen("input.txt", "r");
    if(fp == NULL)
        return -1;

    while((read = getline(&line, &len, fp)) != -1) 
    {
        printf("Read line of length %zd:\n", read);  /* read is ssize_t, so %zd */
        printf("%s", line);
    }

    fclose(fp);
    if(line)
        free(line);
    return 0;
}

Output with custom input:

Read line of length 11:
first line
Read line of length 12:
second line
Read line of length 11:
third line
gsamaras