
I have chains of DNA letters (A, G, T, C) stored in a linked list, and I am supposed to read in a table from a file that looks like this:

I[tab]  ATT\n
I[tab]  ATC\n (etc)
L   CTA
L   CTG
V   GTA
V   GTG
F   TTT
F   TTC
..

where each single letter is what you get from the three-letter A, T, G, C combination next to it. I figured out how to start where I need to start (at the AGT), but I can't work out how to read the string and compare it with the file to see what matches. This is what I have so far:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

typedef struct node{
    char seq[300];
    struct node* next;
    } NODE;


int
main(int argc, char* argv[]){

    int i, j=0;
    FILE *fin, *fout, *fop;
    char code1[300], code2[300], prot;
    NODE *current, *first, *prev;

    fin = fopen( argv[1], "r");
    fout = fopen( argv[2], "w");
    fop = fopen("codeoflife.txt", "r");

    current = first = malloc (sizeof (NODE));

    while( fscanf( fin, "%s", current -> seq) != EOF) {

        for (i = 0; i < 300; i++){
            if (current->seq[i] == 'a')
                current->seq[i] = 'A';
            else if (current->seq[i] == 't')
                current->seq[i] = 'T';
            else if(current->seq[i] == 'g')
                current->seq[i] = 'G';
            else if(current->seq[i] == 'c')
                current->seq[i] = 'C';
        }

        if ( (current -> next = malloc ( sizeof(NODE) ) ) == NULL){
            fprintf(fout, "Out of memory\nCan't add more DNA sequences\n");
            return EXIT_FAILURE;
        }
        prev = current;
        current = current -> next;

    }

    free(current);
    prev->next = NULL;

    current = first;

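    while(current->next != NULL){
        /* look for the start triplet; stop early enough that i+2 stays in bounds */
        for( i = 0; i + 2 < 300; i++){
            if( current->seq[i] == 'A' &&
                current->seq[i+1] == 'G' &&
                current->seq[i+2] == 'T'){
                code1[j] = 'M';
                while(fscanf(fop, "%c", &prot) != EOF){
                    /* unfinished: compare the following triplets with the table */
                }
                break;
            }
        }
        if (i + 2 >= 300)   /* loop ran to the end: no start triplet found */
            strcpy ( current->seq, "None");

        current = current->next;
    }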

    return 0;
}
Jonathan Leffler
  • Have you considered using a simpler language like awk? It looks like a 3-line command. – Foo Bah Aug 11 '11 at 02:03
  • Could you clarify what you are trying to achieve? The code is not complete (fout and fop serve no purpose), so I am unable to discern what string you are trying to read and what file you are comparing it to. – dmitrii Aug 13 '11 at 05:18

1 Answer


The function fscanf() tends to cause problems when reading individual fields, because "%s" skips leading whitespace (newlines included) and then stops at the next whitespace character, so it pays no attention to line boundaries.

Since this is a line-oriented file, use fgets() to read each line and then parse the string with sscanf(), or even just look through the characters one at a time. Your code reads each sequence whole rather than splitting it into fields, so the simplified version is:

while (fgets (current->seq, sizeof (current->seq), fin))
{
    char *cp = strchr (current->seq, '\n');   // fgets keeps the \n if the line fit
    if (cp)                                   // if \n present
         *cp  = '\000';                       // remove \n
    strupr (current->seq);   // make all upper case (non-standard; unifies mixed case input)

    ...
}

This replaces 12 lines of your code and avoids the strange failures.
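For the table file itself, the same fgets-then-sscanf approach could look roughly like the sketch below. It is only an illustration under some assumptions: the names `CodonEntry`, `normalize_line`, `parse_table_line`, `load_table`, and `lookup_codon` are made up for this example, each table line is assumed to hold one letter followed by whitespace and a three-letter triplet, and `toupper()` stands in for `strupr()` so that only Standard C is needed.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One row of the table file: the single letter plus its three-base triplet.
   (Hypothetical type, introduced only for this sketch.) */
typedef struct {
    char amino;
    char codon[4];
} CodonEntry;

/* Strip the line ending left by fgets, then upper-case the text in place. */
void normalize_line(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Parse one table line such as "I\tATT". Returns 1 on success, 0 otherwise. */
int parse_table_line(const char *line, CodonEntry *out)
{
    return sscanf(line, " %c %3s", &out->amino, out->codon) == 2
           && strlen(out->codon) == 3;
}

/* Read the table line by line; returns the number of entries stored. */
int load_table(FILE *fp, CodonEntry *table, int max)
{
    char line[64];
    int n = 0;
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        normalize_line(line);
        if (parse_table_line(line, &table[n]))
            n++;   /* lines that do not parse (such as "..") are skipped */
    }
    return n;
}

/* Return the letter for the three bases starting at 'codon',
   or '?' when the triplet is not in the table. */
char lookup_codon(const CodonEntry *table, int n, const char *codon)
{
    for (int i = 0; i < n; i++)
        if (strncmp(table[i].codon, codon, 3) == 0)
            return table[i].amino;
    return '?';
}
```

With something like this, the table would be loaded once from `fop` before the main loop, and each sequence could then be walked three characters at a time from the start position, passing `&current->seq[i]` to `lookup_codon` instead of re-reading the file for every triplet.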

wallyk
  • I'm not sure which platforms support `strupr()`, but it is not in Standard C, nor in POSIX. The loop could be rewritten as: `for (int i = 0, len = strlen(current->seq); i < len; i++) current->seq[i] = toupper(current->seq[i]);` which is reliably portable. – Jonathan Leffler Jun 03 '11 at 01:15
  • @Jonathan: That's strange; except for recent Microsoft releases, I've never had a problem using `strupr()` or `strlwr()`. Both are mentioned in the `` Linux documentation http://linux.die.net/man/3/strupr It is also supported by Borland, QNX, SAS, CodeWarrior, and in Java. Linux header files prototype it, but glibc doesn't have it. (Very odd.) – wallyk Jun 03 '11 at 02:39
  • How about using `fscanf("%c %s", current -> seq[i], current -> seq[i+3])` to read line by line? – jmlopez Jun 08 '11 at 13:40
  • @jmlopez: most implementations of `fscanf()` advance to subsequent lines if a second field does not appear on the current line. If the following lines are empty, it may advance many lines. – wallyk Jun 08 '11 at 19:11
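
The line-skipping behaviour described in the last comment can be seen in a small sketch. It uses `sscanf()` on an in-memory string rather than `fscanf()` on a file, on the understanding that both follow the same matching rules; the helper names `scan_pair` and `scan_pair_first_line` are invented for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Apply a scanf-style "letter triplet" format to a whole multi-line buffer.
   Returns the number of fields converted. */
int scan_pair(const char *text, char *letter, char *codon)
{
    return sscanf(text, "%c %3s", letter, codon);
}

/* Apply the same format to the first line only, which is what
   fgets followed by sscanf would effectively do. */
int scan_pair_first_line(const char *text, char *letter, char *codon)
{
    char line[64];
    size_t len = strcspn(text, "\n");      /* length up to the first newline */
    if (len >= sizeof line)
        len = sizeof line - 1;
    memcpy(line, text, len);
    line[len] = '\0';
    return sscanf(line, "%c %3s", letter, codon);
}
```

Given the input `"I\n\nL\tCTA\n"`, where the first line is missing its second field, `scan_pair` reports two fields and stores `"L"` as the triplet, because the whitespace in the format silently consumes the blank line. `scan_pair_first_line` reports only one field, so the short line can be detected and handled.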