1

I have been trying to do this problem for at least a week now, and can't seem to understand where is the problem, I already checked everything in google, and dont know any programmer in real life to ask them personaly, so if anyone can help me it would be great.

None of the images generated load, and it doesnt recover 50 as it is suposed to, it recovers 986.

I get this results in check50:

:) recover.c exists.

:) recover.c compiles.

:) handles lack of forensic image

:( recovers 000.jpg correctly

recovered image does not match

:( recovers middle images correctly

recovered image does not match

:( recovers 049.jpg correctly

recovered image does not match

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <cs50.h>

typedef uint8_t BYTE;

#define BLOCK_SIZE 512

int main(int argc, char *argv[])
{
    //it only accepts one comand argument in the name of an image
    if (argc != 2)
    {
        printf("Usage: ./recover IMAGE");
        return 1;
    }

    //check if it can open the image
    FILE *file = fopen(argv[1], "r");

    if (file == NULL)
    {
        printf("The image cannot be opened");
        return 1;
    }

    bool jpg_before = false;
    int counter = 0;
    FILE *image = NULL;
    char name[8];
    unsigned char buffer[BLOCK_SIZE];

    //while there is still jpegs in the file
    while (fread(buffer, BLOCK_SIZE, 1, file) == 1)
    {
        if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xe0) == 0xe0)
        {
            jpg_before = true;
        }

        if(jpg_before == true)
        {
            sprintf(name, "%03i.jpg", counter);
            counter++;
            image = fopen(name, "a");
            fwrite(buffer, BLOCK_SIZE, 1, image);
            fclose(image);
        }

    }
    fclose(file);
}

(also please keep in mind I'm new to programming, 16 years old and english is not my first lenguage)

Isa M
  • 13
  • 4
  • Can you show a sample input image and the one you generate? Many CS50 image related questions here do so; so I assume that it is possible. – Yunnosch Feb 01 '22 at 19:46
  • Give `FILE *file = fopen(argv[1], "rb");` a shot. `FILE *file = fopen(argv[1], "r");` opens in text mode and could be transforming some characters in the file to make text parsing easier. The image isn't a text file, so reading it as binary is highly recommended. Should have to add the b for binary mode to `image = fopen(name, "a");` as well – user4581301 Feb 01 '22 at 19:47
  • @Yunnosch The problem is that none of the images generated load, and it doesnt recover 50 as it is suposed to, it recovers 986 – Isa M Feb 01 '22 at 19:56
  • @user4581301 it doesnt work either, thank you though – Isa M Feb 01 '22 at 19:58
  • Interesting. That is the kind of information you should really provide in the question. Did you take the [tour] and read [ask]? (I can see that you at least scrolled through the tour, but much of the information there seems to have missed you...) – Yunnosch Feb 01 '22 at 19:58
  • Is the input a loadable image? Is is still loadable if you echo it to output (i.e. without any modifications? – Yunnosch Feb 01 '22 at 19:59
  • @IsaM you'll need to read the files as binary anyway, so don't discard the change just yet. There is some other mistake in there. For example,can you be absolutely certain the new JPG pattern you are looking for will always be at the beginning of a buffer? I can easily see cases where an embedded jpg will be in the middle of a buffer or straddle two buffers unless there are additional guarantees you are not providing. I suspect you are expected to read and parse the jpg header so that you know how long the file is so you have a better chance of finding the next file in a predictable location – user4581301 Feb 01 '22 at 20:09

1 Answers1

0
  1. When you detect a header in the input, you set jpg_before. But, you never clear it.
  2. Once the flag is set, each block will be put into a different file.
  3. Every output file should consist of a header, followed by the associated data blocks.
  4. name[8] is a bit too small. The compiler will complain because the int could [in theory] be 10 or so digits, so the sprintf could overflow. Don't be stingy--use (e.g.): char name[20];
  5. Output file should be opened with "w" instead of "a". If the program is run twice, the second time, the output file(s) will be incorrect.

Here is the refactored code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
//#include <cs50.h>

typedef uint8_t BYTE;

#define BLOCK_SIZE 512

int
main(int argc, char *argv[])
{
    // it only accepts one comand argument in the name of an image
    if (argc != 2) {
        printf("Usage: ./recover IMAGE");
        return 1;
    }

    // check if it can open the image
    FILE *file = fopen(argv[1], "r");

    if (file == NULL) {
        printf("The image cannot be opened");
        return 1;
    }

    int counter = 0;
    FILE *image = NULL;
    char name[20];
    unsigned char buffer[BLOCK_SIZE];

    // while there is still jpegs in the file
    while (fread(buffer, BLOCK_SIZE, 1, file) == 1) {
        if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xe0) == 0xe0) {
            if (image != NULL)
                fclose(image);

            sprintf(name, "%03i.jpg", counter);
            counter++;

            image = fopen(name, "w");
        }

        fwrite(buffer, BLOCK_SIZE, 1, image);
    }

    if (image != NULL)
        fclose(image);

    fclose(file);
}

UPDATE:

From the comments below:

Points 2 and 3 look to be handled by the file being opened for appending. Leaving the file open is probably a better idea, though. Faster and handles point 6. – user4581301

If H is header and D is data, for an input of (e.g): H1,D1,D2,D3,D4,H2,D5,D6,D7:

Instead of two output files: F0:H1,D1,D2,D3,D4 and F1:H2,D5,D6,D7

We'd have: F0:H1, F1:D1, F2:D2, F3:D3, F4:D4, F5:H2, F6:D5, F7:D6, F8:D7

Although my refactored code was correct, the top section of my answer had an incorrect analysis of what OP's code was actually doing.

I've fixed that. But, to make user4581301's make sense, here is the original analysis:

  1. When you detect a header in the input, you set jpg_before. But, you never clear it.
  2. You only write to the output stream for the header block, so any data is not copied. So, each output file will only be 512 bytes
  3. You immediately close the output stream after writing the header. It should be left open.
  4. Every block must go to a given output file, not just the header.
  5. name[8] is a bit too small. The compiler will complain because the int could [in theory] be 10 or so digits, so the sprintf could overflow. Don't be stingy--use (e.g.): char name[20];
  6. Output file should be opened with "w" instead of "a". If the program is run twice, the second time, the output file(s) will be incorrect.

UPDATE #2:

First of all thanks! But it is giving me a segmentation fault, do you have any idea why? because everything seems correct – Isa M

From code inspection, the only place that could segfault is the fwrite call (i.e. image is NULL).

I confirmed this by running the program under gdb [I have the cs50 recover input file]. When the program faults, just do tb to get a stack traceback.

image could be NULL for the following reasons:

  1. The fopen for output file could fail (due to permissions, space, etc.) and return NULL. There was no check after the call as there was for opening the input file.

  2. image starts out being NULL. If there is some sort of extra file data/file header before the first jpg header (e.g. before FF/D8/FF/E0) the if will not match on the first block read. The fwrite will be called even with a NULL in image.

Option (2) is what actually occurred because cs50's file has an extra header at the top of the file. You can see this by examining the file with a hex editor/dumper (e.g.) od or xxd:

00000000: 00000000 00000000 00000000 00000000  ................
*
00000200: 63733530 2E6C792F 73757270 72697365  cs50.ly/surprise
00000210: 00000000 00000000 00000000 00000000  ................
*
00000400: FFD8FFE0 00104A46 49460001 01000001  ......JFIF......

The code will not see a valid header (i.e. the if matches) until offset 400. So, there are two extraneous fread calls at the start until things sync up.

The fix is to change:

fwrite(buffer, BLOCK_SIZE, 1, image);

Into:

if (image != NULL)
    fwrite(buffer, BLOCK_SIZE, 1, image);

I've written a few answers on this problem before. However, I forgot to include this. I just wrote the code but did not test it ;-)

To round things out, I've added more return code checking and added "rb" and "wb" to the fopen calls, just in case you're running on Windoze.

Here is the updated/fixed code (I've tested it this time ;-):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
//#include <cs50.h>

typedef uint8_t BYTE;

#define BLOCK_SIZE 512

void
onerr(const char *action,const char *file)
{

    printf("%s -- %s -- %s\n",action,file,strerror(errno));
    exit(1);
}

int
main(int argc, char *argv[])
{
    // it only accepts one comand argument in the name of an image
    if (argc != 2) {
        printf("Usage: ./recover IMAGE");
        return 1;
    }

    // check if it can open the image
    FILE *file = fopen(argv[1], "rb");
    if (file == NULL)
        onerr("The image cannot be opened",argv[1]);

    int counter = 0;
    FILE *image = NULL;
    char name[20];
    unsigned char buffer[BLOCK_SIZE];

    // while there is still jpegs in the file
    while (fread(buffer, BLOCK_SIZE, 1, file) == 1) {
        if (buffer[0] == 0xff &&
            buffer[1] == 0xd8 &&
            buffer[2] == 0xff &&
            (buffer[3] & 0xe0) == 0xe0) {
            if (image != NULL)
                fclose(image);

            sprintf(name, "%03i.jpg", counter);
            counter++;

            image = fopen(name, "wb");
            if (image == NULL)
                onerr("unable to open output file",name);
        }

#if 0
        fwrite(buffer, BLOCK_SIZE, 1, image);
#else
        if (image != NULL)
            fwrite(buffer, BLOCK_SIZE, 1, image);
#endif
    }

    if (image != NULL)
        fclose(image);

    fclose(file);

    return 0;
}
Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • Points 2 and 3 look to be handled by the file being opened for appending. Leaving the file open is probably a better idea, though. Faster and handles point 6. – user4581301 Feb 01 '22 at 20:10
  • @user4581301 No, _only_ 512 bytes is written to any given output file. If `H` is header and `D` is data, for an input of (e.g): `H1,D1,D2,D3,D4,H2,D5,D6,D7`: Instead of _two_ output files: `F0:H1,D1,D2,D3,D4` and `F1:H2,D5,D6,D7`, we'd have: `F0:H1`, `F1:D1`, `F2:D2`, `F3:D3`, `F4:D4`, `F5:H2`, `F6:D5`, `F7:D6`, `F8:D7`, – Craig Estey Feb 01 '22 at 20:25
  • I see your point now. `counter++;` changes the file name on every iteration, resulting in a spray of different files. – user4581301 Feb 01 '22 at 20:31
  • First of all thanks! But it is giving me a segmentation fault, do you have any idea why? because everything seems correct – Isa M Feb 01 '22 at 23:21