0

I have to write an RLE algorithm in C with an escape character (Q).

For example, if I have an input like: AAAAAAABBBCCCDDDDDDEFG
the output has to be: QA7BBBCCCQD6EFG

This is the code that I wrote:

#include <stdio.h>
#include <stdlib.h>

void main()
{ 
    FILE *source = fopen("Test.txt", "r");
    FILE *destination = fopen("Dest.txt", "w");
    char carCorrente; //in english: currentChar
    char carSucc;     // in english: nextChar
    int count = 1;

    while(fread(&carCorrente, sizeof(char),1, source) != 0) {
        if (fread(&carCorrente, sizeof(char),1, source) == 0){
            if(count<=3){
                for(int i=0;i<count;i++){
                    fprintf(destination,"%c",carCorrente);
                }
            }
            else {
                    fwrite("Q",sizeof(char),1,destination);
                    fprintf(destination,"%c",carCorrente);
                    fprintf(destination,"%d",count);
                }
            break;
        }
        else fseek(source,-1*sizeof(char), SEEK_CUR);

        while (fread(&carSucc, sizeof(char), 1, source) != 0) {
            if (carCorrente ==  carSucc) {
                count++;
            } 
            else {
                if(count<=3){
                    for(int i=0;i<count;i++){
                        fprintf(destination,"%c",carCorrente);
                    }
                }
                else {
                    fwrite("Q",sizeof(char),1,destination);
                    fprintf(destination,"%c",carCorrente);
                    fprintf(destination,"%d",count);
                }

                count = 1;
                goto OUT;
            }
        }

OUT:fseek(source,-1*sizeof(char), SEEK_CUR); //exit 2° while
    }
}

The problem appears when I have an input like this: ABBBCCCDDDDDEFGD
In this case the output is: QB4CCCQD5FFDD
and I don't know why :(

Giovanni Far
  • You know that `fread` and other reading functions for files advance the reading position in the file, don't you? So when you just check for 0 without storing the result, the A gets eaten. Also, please consider using `c = getc(f)` instead of `fread`, which is better suited to longer blocks of data. – M Oehm Nov 24 '13 at 15:41
  • Yes, I know; that is why I did:
    fseek(source,-1*sizeof(char), SEEK_CUR);
    – Giovanni Far Nov 24 '13 at 15:43
  • If I use getc, how can I go back with the pointer in the file? – Giovanni Far Nov 24 '13 at 15:45
  • Yes you can. From a technical point of view, there's nothing wrong with your `fread`, but it is needlessly long. Whether you can change the position in the file with `fseek` depends on the type of the file, not on the functions you use to access it. If the file is on disk, you usually can go back. And `fseek` returns a nonzero value if it can't do what you tell it to do. – M Oehm Nov 24 '13 at 15:55
  • Oh, I have given you bad advice: If you want to use `getc`, please make `carCorrente` an `int`. Otherwise you cannot detect the end of the file. I'm sorry for that. – M Oehm Nov 24 '13 at 15:57

4 Answers

2

There is no need to use fseek to rewind as you have done. Here is code I have written that avoids it by using a simple counter and the character of the current sequence.

C implementation:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *source = fopen("Test.txt", "r");
    FILE *destination = fopen("Dest.txt", "w");
    char currentChar = 0;
    char seqChar = 0;   // character of the run being counted
    int count = 0;      // length of the current run (0 before the first read)

    if (source == NULL || destination == NULL)
        return EXIT_FAILURE;

    while (1) {
        // flag is set once the end of the input has been reached
        int flag = (fread(&currentChar, sizeof(char), 1, source) == 0);

        if (flag || seqChar != currentChar) {
            // the run has ended: write it out
            if (count > 3) {
                char ch = 'Q';
                char str[100];
                int digits = sprintf(str, "%d", count);
                fwrite(&ch, sizeof(ch), 1, destination);
                fwrite(&seqChar, sizeof(seqChar), 1, destination);
                fwrite(str, sizeof(char), digits, destination);
            }
            else {
                for (int i = 0; i < count; i++)
                    fwrite(&seqChar, sizeof(char), 1, destination);
            }
            // start a new run with the character just read
            seqChar = currentChar;
            count = 1;
        }
        else count++;

        if (flag)
            break;
    }

    fclose(source);
    fclose(destination);
    return 0;
}
Vikram Bhat
  • @MOehm I didn't implement that because he has not given a specification for it, but it is a minor change to the code, converting the integer to a string. – Vikram Bhat Nov 24 '13 at 16:43
  • Okay, but there wasn't a spec that said count is less than 10 either, or was there? Anyway, thanks for updating. In a scenario where Q is an escape character he might even get away with ignoring counts of 10 or more. – M Oehm Nov 24 '13 at 17:08
  • By the way, `sizeof(char)` is always 1 by definition; I would either put 1 in its place or use `sizeof(variable)` (e.g., `fwrite(str,sizeof(*str),digits,destination)`). – Arkku Nov 24 '13 at 22:50
  • @Arkku Yes, you can remove sizeof(char) from there as well; I just wrote it to give the reader a better understanding of what I am doing. – Vikram Bhat Nov 25 '13 at 04:48
1

Your code has various problems. First, I'm not sure whether you should read straight from the file. In your case, it might be better to read the source string to a text buffer first with fgets and then do the encoding. (I think in your assignment, you should only encode letters. If source is a regular text file, it will have at least one newline.)

But let's assume that you need to read straight from the disk: You don't have to go backwards. You already have two variables for the current and the next char. Read the next char from disk once. Before reading further "next chars", assign the old next char to the current char:

int carSucc, carCorr;             // should be ints for getc

carSucc = getc(source);           // read next character once before loop
while (carSucc != EOF) {          // test for end of input stream
    carCorr = carSucc;            // this turn's char is last turn's "next"

    carSucc = getc(source);
    // ... encode ...
}

Going forward and backward makes the loop complicated. Besides, what happens if the second read reads zero characters, i.e. has reached the end of the file? Then you backtrack once and go into the second loop. That doesn't look as if it was intended.

Try to go only forward, and use the loop above as base for your encoding.
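As a rough illustration of how the loop above could be filled in, here is a minimal sketch. The names `rle_encode` and `write_run` are hypothetical, and the rule "escape runs longer than 3" is taken from the question's code; it does not yet handle a literal Q in the input.

```c
#include <stdio.h>

/* Hypothetical helper: write one run of `count` copies of `c`. */
static void write_run(int c, int count, FILE *out)
{
    if (count > 3) {
        /* escape character, repeated character, decimal count */
        fprintf(out, "Q%c%d", c, count);
    } else {
        for (int i = 0; i < count; i++)
            putc(c, out);
    }
}

/* Forward-only encoder using one character of lookahead (a sketch). */
void rle_encode(FILE *in, FILE *out)
{
    int carSucc = getc(in);      /* read the first "next" character once */
    int count = 1;               /* length of the run of the current char */

    while (carSucc != EOF) {
        int carCorr = carSucc;   /* this turn's char is last turn's "next" */
        carSucc = getc(in);

        if (carSucc == carCorr) {
            count++;             /* run continues */
        } else {
            write_run(carCorr, count, out);  /* run ended, or end of input */
            count = 1;
        }
    }
}
```

Because `EOF` never equals a real character, the last run is flushed by the same `else` branch, so no special case is needed at the end of the file.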

M Oehm
  • Thank you for your advice. I have to write an algorithm like WinZip that uses the RLE method with an escape character. But I thought it was better to begin with a plain text file so I can see how the algorithm works. If it works well, I then have to work with a binary file, for example a PNG picture. But I think the logic is exactly the same and only the input file changes, no? I would also like to ask you about EOF. Why do I have to use an integer for the variable? Is EOF a number? When I reach the end of the file, which number will carSucc have? Is this number the conversion of EOF? Thanks. – Giovanni Far Nov 24 '13 at 16:40
  • Okay, I misunderstood your task. With Q being an odd choice for an escape character, I thought it was a "toy" problem that should treat only letters. About `int` in `getc`: It returns an integer in the range of an unsigned char, i.e. 0 to 255. The special case is `EOF`, which is a negative value. It indicates that you are at the end of the file. (The point is: use int to store the result of `getc`. The whole story doesn't fit into a comment. Also, even character constants such as `'a'` are `int`s in C.) – M Oehm Nov 24 '13 at 19:33
1

I think the major problem in your approach is that it's way too complicated with multiple different places where you read input and seek around in the input. RLE can be done in one pass, there should not be a need to seek to the previous characters. One way to solve this is to change the logic into looking at the previous characters and how many times they have been repeated, instead of trying to look ahead at future characters. For instance:

int repeatCount = 0;
int previousChar = EOF;
int currentChar; // type changed to 'int' for fgetc input

while ((currentChar = fgetc(source)) != EOF) {
    if (currentChar != previousChar) {
        // print out the previous run of repeated characters
        outputRLE(previousChar, repeatCount, destination);
        // start a new run with the current character
        previousChar = currentChar;
        repeatCount = 1;
    } else {
        // same character repeated
        ++repeatCount;
    }
}
// output the final run of characters at end of input
outputRLE(previousChar, repeatCount, destination);

Then you can implement outputRLE to print out a run of the character c repeated count times (note that count can be 0); here's the function declaration:

void outputRLE(const int c, const int count, FILE * const destination);

You can do it pretty much the same way as in your current code, although it can be simplified greatly by combining the fwrite and two fprintfs to a single fprintf. Also, you might want to think what happens if the escape character 'Q' appears in the input, or if there is a run of 10 or more repeated characters. Deal with those cases in outputRLE.
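For illustration only, one possible `outputRLE` could look like the sketch below. The macros `ESCAPE` and `MAX_RUN` are hypothetical, and the conventions are assumptions rather than part of the assignment: a literal Q is always written in escaped form, and the count is limited to a single decimal digit, so longer runs are split into several chunks.

```c
#include <stdio.h>

#define ESCAPE  'Q'
#define MAX_RUN 9   /* assumption: the count is exactly one decimal digit */

void outputRLE(const int c, const int count, FILE * const destination)
{
    int remaining = count;      /* count may be 0: then nothing is written */

    while (remaining > 0) {
        /* split long runs so every count fits in one digit */
        int chunk = remaining > MAX_RUN ? MAX_RUN : remaining;

        if (chunk > 3 || c == ESCAPE) {
            /* escaped form: escape character, character, count */
            fprintf(destination, "%c%c%d", ESCAPE, c, chunk);
        } else {
            /* short runs are cheaper to write literally */
            for (int i = 0; i < chunk; i++)
                putc(c, destination);
        }
        remaining -= chunk;
    }
}
```

With these assumptions every escape sequence is exactly three characters long, which keeps the decoder simple: a run of 12 D characters becomes QD9DDD, and a single Q in the input becomes QQ1.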


An unrelated problem in your code is that the return type of main should be int, not void.

Arkku
0

Thank you so much, I fixed my algorithm. The problem was a variable in the first if after the while. Before:

if (fread(&carCorrente, sizeof(char),1, source) == 0){

Now:

if (fread(&carSucc, sizeof(char),1, source) == 0){

My algorithm is certainly crude; I mean it is far too slow!
I ran a test with my version and with Vikram Bhat's version and saw how much time my algorithm loses.
With getc() I can surely save more time.

Now I'm thinking about the decoding (decompression) and I can see a small problem.

Example:
If I have an input like: QA7QQBQ33TQQ10QQQ
how can I recognize which Q is the escape character?
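One way to make this unambiguous, sketched here under assumptions that are not part of the original assignment: the encoder always writes a literal Q in escaped form, and every escape sequence is exactly three characters (Q, the character, one digit from 1 to 9). The decoder then never has to guess, because every Q it reads starts an escape. The function name `rle_decode` is hypothetical.

```c
#include <stdio.h>

/* Decode a stream in the assumed format: 'Q', character, one digit 1-9.
   Returns 0 on success and -1 if an escape sequence is malformed. */
int rle_decode(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (c != 'Q') {
            putc(c, out);            /* ordinary character: copy it */
            continue;
        }

        int ch = getc(in);           /* the repeated character */
        int digit = getc(in);        /* the count, as one decimal digit */

        if (ch == EOF || digit < '1' || digit > '9')
            return -1;               /* truncated or invalid escape */

        for (int i = 0; i < digit - '0'; i++)
            putc(ch, out);
    }
    return 0;
}
```

Under this convention a digit right after the repeated character is always the count, so an input such as QA7QQ1BQ34T decodes to AAAAAAAQB3333T, while a format with variable-length counts would need an extra terminator or a fixed width to stay decodable.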

thanks

Giovanni Far