0

Hi stackoverflow(ers)!

I'm learning Unix using the XV6 OS (documentation found here) and have been trying to write a tail function in C. The expected output of:

  1. tail is to give the last 10 lines of the file
  2. tail - is to give the last of lines of the file
  3. tail ... is to give the last 10 lines of files ...
  4. tail - ... is to give the last of lines of ...
  5. grep | tail is to give the last 10 sentences in which contain

I have written two versions of tail, one implemented using char* [] and the other by writing to a file and then reading from it (both posted below) My version which implements the tail using char* [] seems to be more accurate to the actual command. However in the version where I'm writing to a temporary file and then reading from it I'm getting more lines as output and I'm not sure why that is happening. My guess is, while reading from one file and writing to another the placement of '\n' are getting messed up. I'd highly appreciate help in figuring it out!

Please don't get mad at me if I'm doing something silly. I'm new to C in Unix and only trying to learn.

tail.c using char* []

#include "types.h"
#include "stat.h"
#include "user.h"
#include "fcntl.h"

char buf [512];

void tail (int fd, int toSub) {
  int n;
  int numLines = 0;
  int linesToPrint = 0;
  char *buffer;
  buffer = (char*) malloc (500000);
  int buffSize = 0;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    for (int i = 0; i<n; i++) {
      buffer[buffSize] = (char)buf[i];
      buffSize++;
      if(buf[i] == '\n')
        numLines++;
    }
  }
  if (n < 0) {
    printf (1, "tail: read error \n");
    exit ();
  }
  if (numLines < toSub)
    linesToPrint = 0;
  linesToPrint = numLines - toSub;

  int counter = 0;
  for (int i = 0; i < buffSize; i++) {
    if (counter >= linesToPrint)
      printf(1,"%c",buffer[i]);
    if (buffer[i] == '\n')
      counter++;
  }
  free (buffer);
}

int main (int argc, char *argv[]) {
  int toSub = 10;
  int fd = -1;

  if (argc <= 1) {
    tail (0, toSub);
    exit();
  }
  else if (argc > 1 && argv[1][0] == '-') {
    char getToSub [10];
    for (int k=1; k<strlen(argv[1]); k++) {
      getToSub[k-1] = argv[1][k];
    }
    toSub = (atoi)(getToSub);
  }
  else {
    if((fd = open (argv[1], toSub)) < 0) {
      printf (1, "tail: cannot open %s\n", argv[1]);
      exit ();
    }
    tail (fd, toSub);
    close (fd);
  }
  if (argc > 2) {
    for (int i=2; i<argc; i++) {
      if((fd = open (argv[i], 0)) < 0) {
        printf (1, "tail: cannot open %s\n", argv[i]);
        exit ();
      }
      else {
        tail (fd, toSub);
        close (fd);
      }
    }
  }
  exit();
}

tail.c using write

#include "types.h"
#include "stat.h"
#include "user.h"
#include "fcntl.h"

char buf [512];

void tail (int fd, int toSub) {
  int n;
  int numLines;
  int linesToPrint;
  int ptrDump;
  ptrDump = open ("tailDump", O_CREATE | O_RDWR);
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    write (ptrDump, buf, sizeof(buf));
    for (int i = 0; i<n; i++) {
      if(buf[i] == '\n')
        numLines++;
    }
  }
  if (n < 0) {
    printf (1, "tail: read error \n");
    exit ();
  }
  if (numLines < toSub)
    linesToPrint = 0;
  linesToPrint = numLines - toSub;

  close (ptrDump);
  ptrDump = open ("tailDump", 0);

  int counter = 0;
  while ((n = read(ptrDump, buf, sizeof(buf))) > 0) {
    for (int i = 0; i<n; i++) {
      if (counter > linesToPrint)
        printf(1,"%c",buf[i]);
      if (buf[i] == '\n')
        counter++;
      }
    }
    close (ptrDump);
    unlink("tailDump");
}

int main (int argc, char *argv[]) {
  int toSub = 10;
  int fd = -1;

  if (argc <= 1) {
    tail (0, toSub);
    exit();
  }
  else if (argc > 1 && argv[1][0] == '-') {
    char getToSub [10];
    for (int k=1; k<strlen(argv[1]); k++) {
      getToSub[k-1] = argv[1][k];
    }
    toSub = (atoi)(getToSub);
  }
  else {
    if((fd = open (argv[1], toSub)) < 0) {
      printf (1, "tail: cannot open %s\n", argv[1]);
      exit ();
    }
    tail (fd, toSub);
    close (fd);
  }
  if (argc > 2) {
    for (int i=2; i<argc; i++) {
      if((fd = open (argv[i], 0)) < 0) {
        printf (1, "tail: cannot open %s\n", argv[i]);
        exit ();
      }
      else {
        tail (fd, toSub);
        close (fd);
      }
    }
  }
  exit();
}

I have the code put up on my Github (found here) as well in tail_using_str.c and tail_using_file.c

2 Answers2

2

I think your problem is here:

  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    write (ptrDump, buf, sizeof(buf));

You read in n bytes but when you write, you write sizeof(buf) bytes. In other words, you may write too many bytes.

Maybe you want this instead:

  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    write (ptrDump, buf, n);
                         ^
                        note
Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • That fixed it!! thanks you so much! Now that I think about it, makes so much sense! I've updated the code and given you the much deserved credit! (https://github.com/AdityaSingh/XV6/tree/Tail) – Aditya Singh Feb 06 '17 at 09:51
0

Please don't get mad at me if I'm doing something silly. I'm new to C in Unix and only trying to learn.

Thus this answer, which is not strictly necessary, since the core question you've asked has already been answered. Your posted question actually raises a bunch more questions not explicitly asked, which I intend to answer here.

The expected output of: ... tail - is to give the last of lines of the file

According to who? Not according to POSIX, and not according to UNIX V7, where tail(1) first appeared.

(Well, actually tail(1) first appeared in PWB/UNIX, but that wasn't widely used.)

grep | tail is to give the last 10 sentences in which contain

You mean last 10 lines, not sentences. grep does not produce sentences.

(Except in Soviet Unix, where grep sentences you!)

char *buffer;

buffer = (char*) malloc (500000);

This and the following exit call create a memory leak. You may say that it's harmless since the OS will give the memory back on program exit, but it's sloppy, and tools like Valgrind will call you on it.

Either free() your buffers before all possible exit points from the function, or declare this buffer on the stack instead:

char buffer[500000]

You might not be able to declare a buffer that big on the stack, depending on xv6's limits. A common modern limit for the stack size is 2 MiB, and that's for the entire stack, used by all of the functions in your deepest call chain. This is configurable is modern systems, but may not be configurable in xv6.

If you're forced to go with the malloc() option, you can do that on a single line:

char *buffer = (char*) malloc (500000);

Additionally:

  • it is bad style to have buf and buffer. Lazy. Give each buffer a purpose-driven name, like lineBuf and accumBuf

  • buffSize is confusingly named. It isn't clear which buffer it refers to, and it isn't the size of the buffer anyway. Call it something like accumBytes to solve both problems.

  • You're missing a bunch of #includes necessary on modern POSIX systems, and you have some that don't work on such. I'd see if xv6 has stdio.h.h, stdlib.h, string.h and unistd.h, and #include them for POSIX portability. I'd also see if you can #include types.h via sys/types.h, as that's necessary at least on macOS, and probably other Unixes. user.h isn't needed on modern systems, so if you don't actually need it on xv6, remove it.

  • Your in-memory variant reads the entire file into RAM and then skips back over the bytes in RAM it doesn't want to print. A bit of thought will show how you can both cut the buffer size down and not make two passes over the input data. (Hint: accumBuf[toSub][sizeof(lineBuf)]. Feel free to multiply the second term by some amount if you wish to allow lines greater than sizeof(lineBuf) bytes.)

if(buf[i] == '\n') numLines++;

You should probably check for a non-'\n' byte at the end of the accumulation buffer and add another line for it. Lines without LF terminators aren't quite kosher, but the user expectation is typically that you treat that trailing fragment as a line.

printf (1, "tail: read error \n");

What is this 1, noise? Are you trying to specify stdout? That's only correct for write, not printf. printf() already sends to stdout. (Indeed, you have to use fprintf() to send anywhere else.)

Since these are only in your error cases, that means you must not be testing for errors.

That's another reason to write code for POSIX portability even though you're ultimately targeting xv6: modern Unix system C compilers are much stricter about the code they're willing to accept. Modern C compilers do much of what we had to rely on tools like lint for in the past.

exit()

exit(2) takes a parameter, the exit status code, traditionally 0 for a clean exit and nonzero for an error. The only reason your compiler is letting you get away with that is that early C compilers did not strictly check the argument list given against the function's declared parameters. In fact, xv6 is probably shipping a K&R compiler which didn't even have function prototypes to declare the parameter lists with. The programmer was expected to do the right thing without being warned.

linesToPrint = numLines - toSub;

That isn't "lines to print", it's "lines to skip printing". It took me a good 5 minutes of staring at the code to get past that semantic mismatch. The compiler doesn't care, but variable names aren't for the compiler. If they were only for the compiler, we'd just call them all a, b, etc.

printf("%c",buffer[i]);

Use putchar() here.

int counter = 0;

Again, lazy. Count of what?

I'm only halfway through the first program, but that's enough commentary. I hope you've learned a few things from this.

Community
  • 1
  • 1
Warren Young
  • 40,875
  • 8
  • 85
  • 101