
About the program: Hello, I am writing a simple program that extracts the content of a .txt file and converts it into a .csv file. The plan is to look for specific words within the .txt file. This is really just an experiment with the functions open(), read(), write() and close() in C on Linux.

The Problem: On line 34 of the code, I try to store each incoming character to build up a word. After reading a " " from the .txt file, the program should clear the word buffer. The problem is that I get a segmentation fault (core dumped), and I am not sure how to fix it. I used GDB to debug and found that the seg fault occurs at line 34.

Thank you in advance.

The Code

/* 
Program to convert content inside a .txt file 
into a .csv file.
*/

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>      // open()
#include <unistd.h>     // For read(), write() and close()
#include <string.h>     // Used for strcmp()

int main(int argc, char **argv){

    int samp = open("sample.txt", O_RDONLY);        // This is Opening a file to work with. @param char  *filename,  @param int  access,  @param int  permission
    int csv = open("sample.csv", O_WRONLY | O_CREAT, 0600);     // Used to create a file. 

    char *word;         // Stores each word 
    char buff[1];       // Holds 1 character of the file
    int i = 0;          // Counter for word buffer

    /* read(handle (file), buffer, size (bytes)) */
    /* write(handle (file), buffer, size (bytes)) */

    while(read(samp, buff, 1) != 0){    // Loops through file, char by char 
        printf("%s", buff);             // prints current character in buff

        if(strcmp(buff," ") == 0){      // To create csv, every " " found, we add a "," 
            write(csv, ",", 1);         // If " " is found, we write a comma to csv file
            word = "";                  // Clear word buffer
        }

        else{
            write(csv, buff, 1);        // Write value of buff in csv file
            word[i] = buff[0];              // Copy each character in buff to word
        }

        i++;
    }

    close(samp);    // Closing .txt file
    close(csv);     // Closing .csv file

    return 0;
}
Sourav Ghosh
Jose Ortiz

2 Answers


The problem is with

 printf("%s", buff);

buff is not a string. You can either

  • define buff as a two-element array, char buff[2] = {0};, and then use buff as a string.
  • define buff as a single char (not an array), pass &buff to the read() call and use the %c format specifier to print buff.
  • use %c and pass buff[0].

To elaborate, the %s format specifier expects a pointer to a null-terminated char array. In your case, buff is one element too short to hold both the input (from read()) and the null terminator. Because %s keeps reading until it finds a null terminator, an out-of-bounds access happens, which invokes undefined behavior.
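As a minimal sketch of the single-char option, assuming a hypothetical helper named echo_bytes (it is not part of the code in the question): it reads one byte at a time into a plain char and prints it with %c, so no null terminator is ever needed. It also loops while read() returns a value greater than 0, which stops on a read error (-1) as well as at end of file.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: echo every byte readable from fd to stdout.
   Returns the number of bytes echoed, or -1 if read() fails. */
long echo_bytes(int fd)
{
    char c;          /* a single character, not a string */
    long count = 0;
    ssize_t n;

    /* read() returns 0 at end of file and -1 on error, so test for > 0 */
    while ((n = read(fd, &c, 1)) > 0) {
        printf("%c", c);   /* %c takes the char itself, no terminator needed */
        count++;
    }

    return (n < 0) ? -1 : count;
}
```

With the descriptor from the question, this would be called as echo_bytes(samp) after checking that open() did not return -1.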

Sourav Ghosh
  • @Sourav Ghosh Thank you for this catch! I will keep this in mind in the future. I have tried your solution, but the problem still exists. – Jose Ortiz Mar 01 '17 at 18:12

I think one of your problems is that you write word[i] = buff[0], but word never points to writable memory: it starts out uninitialized, and later it only ever points to the string constant "", which you must not write to. You need to create a writable buffer to store the word.

I also don't see you ever resetting i to 0 when you complete a word, so the index keeps growing and each new word is written further along instead of starting at the beginning of the buffer.

To address this, you could try changes like the following:

char *word; -> char word[256]; /* NOTE: arbitrary max word size here, you will need to ensure that you don't overrun that */

word = ""; -> word[i] = '\0'; i = 0; /* reset the string */
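A minimal sketch of those two changes with the overrun check included. The names WORD_MAX, word_append and word_reset are hypothetical, chosen only for illustration. Note that in the original loop i++ also runs after a space, so if i is reset to 0 in that branch the next word would start at index 1; here the length only advances when a character is actually stored.

```c
#include <stddef.h>

#define WORD_MAX 256   /* arbitrary maximum word size, as in the answer */

/* Hypothetical helper: append c to word and keep it null-terminated.
   Returns 1 if the character was stored, 0 if the buffer is full. */
int word_append(char word[WORD_MAX], size_t *len, char c)
{
    if (*len + 1 >= WORD_MAX)   /* always leave room for the '\0' */
        return 0;

    word[(*len)++] = c;
    word[*len] = '\0';
    return 1;
}

/* Hypothetical helper: clear the word so the next one starts at index 0. */
void word_reset(char word[WORD_MAX], size_t *len)
{
    word[0] = '\0';
    *len = 0;
}
```

In the loop from the question, word_append(word, &i, buff[0]) would replace word[i] = buff[0]; and word_reset(word, &i) would replace word = "";, with i declared as size_t, initialized to 0, and the trailing i++ removed.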

EDIT: Also, using strcmp to compare a single character is broken here, as buff is not a null-terminated string. Instead, just do something like if(buff[0] == ' ')

NOTE: I don't see you doing anything useful with the word buffer you are trying to assemble, so you can probably just remove it entirely.
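Putting the last two points together, here is a rough sketch of the conversion without any word buffer. The function name spaces_to_commas is hypothetical, and it assumes both descriptors were opened successfully; it compares the character directly instead of calling strcmp.

```c
#include <unistd.h>

/* Hypothetical helper: copy in_fd to out_fd one byte at a time,
   writing a comma wherever the input has a space.
   Returns 0 on success, -1 if read() or write() fails. */
int spaces_to_commas(int in_fd, int out_fd)
{
    char c;
    ssize_t n;

    while ((n = read(in_fd, &c, 1)) > 0) {
        if (c == ' ')       /* plain character comparison, no strcmp */
            c = ',';

        if (write(out_fd, &c, 1) != 1)
            return -1;
    }

    return (n < 0) ? -1 : 0;
}
```

With the descriptors from the question, the whole loop would reduce to a call like spaces_to_commas(samp, csv), after checking that neither open() call returned -1.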

Evan Teran