1

I created a code that will parse a .txt file into a double array using C. My .txt file is formatted so that each of the points is delimited by a ",". Now I want make this code parse the same data, but from a .csv file. When I change my file type I receive a segmentation error.

Why does this occur? Am I wronging in believing that these two document types will be read in the same manner?

The main question of this post is, what is the difference when reading a .txt and a .csv?

/* 
* Calibration File Read Test
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main ()
{
    FILE *myfile = fopen ( "BarEast.txt", "r" );
    /* I want to change this file type to .csv */

    /* opening file for reading */
    if(myfile == NULL) 
    {
        printf("Error opening file");
        return(-1);
    }

    int i = 0;
    int j, k;

    char *result[361] = {0};
    char line[10];
    char *value;

    while(fgets(line, sizeof(line), myfile))
    {
        value = strtok(line, ",");
        result[i] = malloc(strlen(value) + 1);
        strcpy(result[i], value); 
        i++;
    }

    double val;
    double cal[361] = {0};

    for(k = 0; k < 361; k++)
    {
        val = atof(result[k]);
        cal[k] = val;
    }

    for(j = 0; j < 361; j++)
    {
        printf("Element[%d] = %f\n", j, cal[j]);
    }
    fclose(myfile);
    return 0;

}
derekerdmann
  • 17,696
  • 11
  • 76
  • 110
  • Just use `sscanf` to scan from string it supports some kind of Regex, read here: http://www.cplusplus.com/reference/cstdio/scanf/ – k06a Apr 07 '16 at 19:19
  • I'd suggest you compare your .txt and .csv files. I would think they should be identical in every way except for the filename extension, and if your code works for one, it should work for the other. – Logicrat Apr 07 '16 at 19:22
  • [Don't use Strcpy](http://stackoverflow.com/questions/5122882/strcpy-string-array) If you read the notes in the [man page for strcpy](http://linux.die.net/man/3/strcpy) it'll make sense why you would want to stay away from it. – James H Apr 07 '16 at 19:22
  • The problem is not changing the filename, the problem is more assuredly because your code has memory problems which are revealed by the different content of the .csv file. Use [`valgrind`](https://en.wikipedia.org/wiki/Valgrind) or a similar tool, to find them. – Schwern Apr 07 '16 at 19:29

2 Answers2

2

The problem is not the name of the file, it's that the files have different content. That different content is exposing the memory problems in your code.

Immediately my eye goes to the hard coded 361 everywhere. This assumes there will be 361 lines in the input file, and there is your segfault. It happens on line 40 (identified using valgrind) when val = atof(result[k]); walks off the result array. It is very tempting in C to hard code sizes. Don't do it, especially for input, it is a crutch you cannot rely on.

Instead the code must be adaptive to the number of fields and lines in the file. You could write your own dynamic array code using realloc, but there's plenty of C libraries which will do this for you, and much better. I reach for GLib for the basics.

Another problem is that you've only allocated 10 bytes for each line. This is very small. It means fgets is constantly walking off line if it's longer than 9 characters (which it will be). Any sort of static memory allocation when reading from input will be a problem. Using getline instead of fgets avoids the problem of how much memory to allocate per line. getline takes care of this for you. Be careful, getline reuses line, so if you're going to alter line you need to strdup it first.

/* 
* Calibration File Read Test
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <glib.h>

int main (int argc, char **argv)
{
    /* Check we got the right number of arguments. */
    if( argc != 2 ) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return -1;
    }

    /* Open the file */
    FILE *fp = fopen ( argv[1], "r" );
    if(fp == NULL) 
    {
        fprintf(stderr, "Error opening file %s for reading.\n", argv[1]);
        return(-1);
    }

    /* A dynamic array which will grow as needed */
    GArray *result = g_array_new(TRUE, TRUE, sizeof(char *));

    /* Read each line using getline which does the line memory allocation
       for you. No buffer overflow to worry about. */
    char *line = NULL;
    size_t linecap = 0;
    while(getline(&line, &linecap, fp) > 0) {
        /* This will only read the first cell. Exercise left for the reader. */
        char *value = strtok(line, ",");
        if( value == NULL ) {
            fprintf(stderr, "Could not parse %s\n", line);
            continue;
        }

        char *field = malloc(strlen(value) + 1);
        strcpy(field, value);

        g_array_append_val(result, field);
    }

    free(line);
    fclose(fp);

    /* Iterate through the array using result->len to know the length */
    for(int i = 0; i < result->len; i++)
    {
        printf("Element[%d] = %s\n", i, g_array_index(result, char *, i));
    }

    /* Free the array */
    g_array_free(result, TRUE);

    return 0;

}

I've stripped off the atof conversion because it's a distraction from the main problem. You can put that back if you like.

This still has the problem that it only reads the first cell of each line. I leave fixing that to you.

Schwern
  • 153,029
  • 25
  • 195
  • 336
0

Your conversion atof in this code

for(k = 0; k < 361; k++)
{
    val = atof(result[k]);
    cal[k] = val;
}

is going out of bounds of the array 'result'
You only allocate memory to elements in the result array when you have data to put in it

result[i] = malloc(strlen(value) + 1);

If less than 361 records were created you are reading from unallocated memory - hence the error.

You need to keep a record of how many results you have read in and then use that value to ensure that you remain in range as you process the result array.

There is no difference between files based on the file extension.

anita2R
  • 192
  • 1
  • 8