-1

I've been trying to write a program that checks whether a file contains another file's contents (like an antivirus). I tried to use strstr() to do so, but apparently strstr() does not work well for this.

What is the best way to check whether a file contains another file? Edit: I'm working with binary files.

Jens
  • You can use [longest common subsequence](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem). – abhishek_naik May 14 '16 at 18:28
  • 1
    Your best solution is to rewrite this question as I do not understand what you are trying to achieve and also what you have tried – Ed Heal May 14 '16 at 18:28
  • 1
    Post the code. Also, fix the spelling on your message. If you loaded the files correctly strstr should work. – UDKOX May 14 '16 at 18:29
  • The code is about 150 lines long... strstr() worked only when the file contained the other file at the beginning. @UDKOX – TheGolden Rabbit May 14 '16 at 18:32
  • 1
    A file is only a stream of bytes. Files are very often binary, so treating them with strstr() is incorrect. – Michał Mielec May 14 '16 at 18:32
  • So what do you suggest I do? @MichałMielec – TheGolden Rabbit May 14 '16 at 18:33
  • Debug your code. Use a debugger, or print intermediate results, to narrow down your problem, until you'll know where it is located. – user31264 May 14 '16 at 18:34
  • 1
    You have to know what kind of file you are looking for, e.g. whether it is a text file or a binary one, or whether it contains some common markup. Then read the file as bytes, not as a string, and parse it according to the proper standard. – Michał Mielec May 14 '16 at 18:35
  • Also, if the file contains zero bytes, the string functions will not read it completely, because a zero byte (\0) terminates a C string. – Michał Mielec May 14 '16 at 18:38
  • 1
    Deja vu: http://stackoverflow.com/questions/37225382/how-to-check-if-a-string-exists-in-a-binary-file – Martin James May 14 '16 at 18:47
  • 1
    The str* library functions are useless for this. I'm afraid that you are going to have to write some code. – Martin James May 14 '16 at 18:52
  • But C also has functions for manipulating raw bytes, like memcmp(), memcpy() and fread(). So don't be afraid :) – Michał Mielec May 14 '16 at 18:54

3 Answers

1

The way to compare arbitrary sequences of bytes in C is with memcmp. This is a Standard C89 function.
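As a minimal sketch of how memcmp could be applied here, assuming both files have already been read into memory (for example with fopen in "rb" mode and fread), one can test every possible offset in turn. The name find_bytes and its return convention are hypothetical, chosen only for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first occurrence of
   needle inside hay, or (size_t)-1 if there is none.
   Both buffers are raw bytes, so embedded zero bytes are fine. */
size_t find_bytes(const unsigned char *hay, size_t hay_len,
                  const unsigned char *needle, size_t needle_len)
{
    size_t i;

    if (needle_len == 0)
        return 0;               /* an empty needle matches at offset 0 */
    if (needle_len > hay_len)
        return (size_t)-1;      /* needle cannot fit */

    /* Try each starting offset where the needle still fits. */
    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

This brute-force approach does up to hay_len * needle_len byte comparisons in the worst case, which is usually acceptable for small inputs.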

Jens
  • Note: this will not work for large files that cannot fit into memory – Nemanja Trifunovic May 14 '16 at 18:55
  • 1
    @NemanjaTrifunovic Only if one wants the whole file in memory. You can do comparisons a chunk at a time if you need to deal with overly big files. Or memory map the files. – Jens May 14 '16 at 18:56
  • `The memmem() function finds the start of the first occurrence of the substring needle of length needlelen in the memory area haystack of length haystacklen.` That is for strings, not binary data. I don't think it will work, though I haven't tried it myself. – UDKOX May 14 '16 at 18:58
  • @Jens: I agree. But also I think it's a bit more complicated, because you do not know the position at which one file can be in the other file, so you would need to check all possible positions which is not optimal. – Nemanja Trifunovic May 14 '16 at 18:58
  • 1
    @UDKOX No, `memmem` works for binary data according to the man page. But it is a GNU extension, not a C89 function. – Jens May 14 '16 at 19:00
  • I guess you are right. If my source is correct and `memmem` uses `memcmp` it should work. – UDKOX May 14 '16 at 19:07
0

You can implement the Knuth-Morris-Pratt (KMP) algorithm, which is one of the best solutions for the problem you have.

It might be worth exploring and learning about other solutions to the string matching problem as well.
See: https://en.wikipedia.org/wiki/String_searching_algorithm for the list of string searching algorithms.

Resources to learn about KMP algorithm:

Note: you can use these algorithms on bytes as well.
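To illustrate the note about bytes, here is a rough sketch of KMP operating on unsigned char buffers rather than C strings. The function name kmp_find and the (size_t)-1 "not found" convention are assumptions made for this example; note that it also returns (size_t)-1 if the table allocation fails:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: offset of the first occurrence of pat in hay,
   or (size_t)-1 if not found (or if malloc fails). */
size_t kmp_find(const unsigned char *hay, size_t n,
                const unsigned char *pat, size_t m)
{
    size_t *fail;
    size_t i, k, result = (size_t)-1;

    if (m == 0)
        return 0;
    if (m > n)
        return (size_t)-1;

    fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return (size_t)-1;

    /* fail[i] = length of the longest proper prefix of pat[0..i]
       that is also a suffix of pat[0..i]. */
    fail[0] = 0;
    k = 0;
    for (i = 1; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    /* Scan the haystack once; k is the number of pattern bytes matched. */
    k = 0;
    for (i = 0; i < n; i++) {
        while (k > 0 && hay[i] != pat[k])
            k = fail[k - 1];
        if (hay[i] == pat[k])
            k++;
        if (k == m) {
            result = i + 1 - m;   /* match ends at i */
            break;
        }
    }

    free(fail);
    return result;
}
```

Because the scan never moves backwards in the haystack, the same loop can be fed one chunk at a time when the file is too big to hold in memory, as long as k is kept between chunks.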

Nemanja Trifunovic
0

There is the memmem function, which finds a byte sequence inside a memory buffer. Unlike strstr, memmem does not stop at \0, which lets you find any data in any file once the file's contents have been read into memory.
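A small sketch of how a call might look, assuming a glibc-style system where memmem is exposed by defining _GNU_SOURCE before including <string.h> (it is an extension, not Standard C, so availability varies). The wrapper name contains_bytes is hypothetical:

```c
#define _GNU_SOURCE   /* assumption: glibc; must precede every #include */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: returns 1 if the needle bytes occur anywhere
   in hay, 0 otherwise. Lengths are passed explicitly, so zero bytes
   inside either buffer do not end the search. */
int contains_bytes(const void *hay, size_t hay_len,
                   const void *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    return memmem(hay, hay_len, needle, needle_len) != NULL;
}
```

On platforms without memmem, a hand-written search built on memcmp would be needed instead.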

Sawel