
I have a binary file (not a text file), about 20M in size, and I have a string which may or may not exist in that file. Normally (for a text file), I would use getline() to read the file line by line and then use find to detect it, something like:

bool found = false;
{
    std::string stringToLookFor("string to look for");
    std::ifstream ifs("myBinaryFile.bin");
    std::string line;
    while (!found && getline(ifs, line)) {
        found = (line.find(stringToLookFor, 0) != std::string::npos);
    }
    ifs.close();
}

However, I'm unsure if that's a wise thing to do for a binary file. My main concern is that the "lines" for such a file may be large. It may be that the entire 20M file contains no newlines so I may end up reading in a rather large string to search (there may well be other problems with this approach as well, hence my question).

Is this considered a viable approach or am I likely to run into problems? Is there a better way to search binary files than the normal textual line-by-line?

paxdiablo
  • You could iterate over the characters in the file and advance the iterator after reading enough successive characters to disambiguate the string you are looking for. This is how compilers tokenise source code. –  Nov 02 '19 at 10:10
  • Does this answer your question? [C++ searching text file for a particular string and returning the line number where that string is on](https://stackoverflow.com/questions/12463750/c-searching-text-file-for-a-particular-string-and-returning-the-line-number-wh) – JHBonarius Nov 02 '19 at 10:30
  • @JHBonarius: not really, no. I had actually looked at that one but it asks about how to search *text* files. I specifically mentioned in this question the concerns I had about that. – paxdiablo Nov 02 '19 at 11:20
  • @Bookie, I could do that, I suppose. I would have to run a rolling window through the file at least as large as the string I'm looking for. I'll look into that if no-one comes up with an easier-to-code method. – paxdiablo Nov 02 '19 at 11:21
  • 20M is not that much. Why not load the entire file? – zdf Nov 02 '19 at 11:50
  • Use a "sliding window" the same size as the searched string, then move all the data character by character through this window and compare its current content with the searched string. Remember to open the file in binary mode. – 4xy Nov 02 '19 at 11:50
  • Hm, now *after* answering the question, I noticed your reputation. So what's the trick? – Andriy Tylychko Nov 02 '19 at 12:53
  • @Bookie "Sliding window" is just the abstract term for: advance the position, read the content, compare the content with the token. You are actually suggesting the same thing, so there's no contradiction here... – 4xy Nov 02 '19 at 12:56
  • @Andriy Tylychko yeah looks strange – 4xy Nov 02 '19 at 12:58
  • @ZDF: embedded platform, I'm wary of allocating 20M to do this. but I'll test. – paxdiablo Nov 03 '19 at 02:46
  • @AndriyTylychko, despite my reputation, I'm pretty certain there are others here who know more than me, at least in many areas. Well, I damn well *hope* so :-) – paxdiablo Nov 03 '19 at 02:47

2 Answers


I'll take the bait and try an answer. You are looking for this:

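//...
std::ifstream is(file_name, std::ios::binary);
if (!is)
  return -1;
// istreambuf_iterator reads raw bytes; istream_iterator<char> would skip whitespace.
// Caveat: std::search formally requires forward iterators, and stream iterators are
// single-pass, so a match that starts inside a failed partial match can be missed.
std::istreambuf_iterator<char> first(is), last;
auto res = std::search(first, last, pattern.begin(), pattern.end());
//...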

It is fast and does not load the whole file into memory at once. I do not know which algorithm it is based on. The faster boyer_moore_searcher and boyer_moore_horspool_searcher cannot be used here since they require random-access iterators.
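
One way to keep memory bounded while still searching a random-access buffer is to read the file in fixed-size chunks and carry the last pattern.size() - 1 bytes over to the next chunk, which is the rolling-window idea from the comments. The following is only a minimal sketch under stated assumptions: stream_contains and file_contains are hypothetical helper names, and the 64 KiB default chunk size is an arbitrary choice that would need tuning on an embedded target.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: returns true if `pattern` occurs anywhere in `in`.
// The buffer never exceeds chunk_size + pattern.size() - 1 bytes.
bool stream_contains(std::istream& in, const std::string& pattern,
                     std::size_t chunk_size = 64 * 1024)
{
    assert(chunk_size > 0);
    if (pattern.empty())
        return true;  // an empty pattern matches trivially

    const std::size_t overlap = pattern.size() - 1;
    std::vector<char> buf(overlap + chunk_size);
    std::size_t carried = 0;  // bytes kept from the end of the previous chunk

    for (;;) {
        in.read(buf.data() + carried, static_cast<std::streamsize>(chunk_size));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            return false;  // end of input (or read error) without a match

        const std::size_t len = carried + got;
        const char* first = buf.data();
        const char* last = first + len;
        if (std::search(first, last, pattern.begin(), pattern.end()) != last)
            return true;

        // Keep the tail so that a match straddling two chunks is still seen.
        carried = std::min(overlap, len);
        std::memmove(buf.data(), buf.data() + (len - carried), carried);
    }
}

// Hypothetical convenience wrapper: opens the file in binary mode.
bool file_contains(const std::string& path, const std::string& pattern)
{
    std::ifstream ifs(path, std::ios::binary);
    return ifs && stream_contains(ifs, pattern);
}
```

Because each chunk sits in a contiguous buffer, the plain std::search call could be swapped for one of the C++17 searchers if profiling suggests it is worthwhile.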

zdf
  • Will check out on Monday. I assume you meant "not loading the file all into memory at once" since I can't figure out how it would search the file if it's never read from the disk at all :-) – paxdiablo Nov 03 '19 at 02:45

The simplest and fastest approach is, as @ZDF suggested in the comments, to read the entire file into memory and then search its content for your string:

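#include <algorithm>
#include <cstddef>
#include <fstream>
#include <vector>

// Binary mode: no newline translation.
std::ifstream ifs(filename, std::ios::binary);
// Seek to the end to learn the file size, then rewind to the start.
ifs.seekg(0, std::ios::end);
const auto size = ifs.tellg();  // -1 on failure, so check ifs in real code
ifs.seekg(0);
std::vector<char> content(static_cast<std::size_t>(size));
ifs.read(content.data(), static_cast<std::streamsize>(size));
// res == content.end() means str was not found.
auto res = std::search(content.begin(), content.end(), str.begin(), str.end());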
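
Once the bytes are in a contiguous buffer the iterators are random-access, so the C++17 searchers from <functional> become usable. A minimal sketch, assuming a C++17 compiler; buffer_contains is a hypothetical helper name, and whether this is actually faster than plain std::search depends on the pattern and the data, so it is worth measuring:

```cpp
#include <algorithm>
#include <functional>  // std::boyer_moore_horspool_searcher (C++17)
#include <string>
#include <vector>

// Hypothetical helper: true if `pattern` occurs in the bytes held in `content`.
bool buffer_contains(const std::vector<char>& content, const std::string& pattern)
{
    if (pattern.empty())
        return true;  // an empty pattern matches trivially

    // The searcher preprocesses the pattern once, then skips ahead on mismatches.
    const std::boyer_moore_horspool_searcher<std::string::const_iterator>
        searcher(pattern.begin(), pattern.end());
    return std::search(content.begin(), content.end(), searcher) != content.end();
}
```

The pattern is taken as a std::string so that embedded zero bytes are preserved, which matters for binary data.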
Andriy Tylychko
  • Thanks, I'll check this out Monday. I wasn't keen on loading the entire file at once (this is an embedded platform) but it may be okay. – paxdiablo Nov 03 '19 at 02:43
  • I'm actually working on this matter, and this approach is very slow. – Mecanik Apr 18 '21 at 02:47