13

I have a question about using std::search versus string::find on strings. I know it is often better to prefer a class's own member function over the generic standard library algorithm, because the member can be optimized for the class, but I was wondering whether it is reasonable, say for the sake of consistency, to use std::search with iterators rather than string::find with indices.

Would it be a sin for me to do something like that or should I just stick to string::find? Are there any huge advantages of one over the other in terms of performance or style?
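For reference, the two styles being compared look roughly like this. It is only a minimal sketch: the helper names `index_of`, `iterator_to` and `to_index` are hypothetical and exist purely to put the two interfaces side by side.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Index-based lookup: returns a position, or std::string::npos when absent.
std::size_t index_of(const std::string& s, const std::string& needle) {
    return s.find(needle);
}

// Iterator-based lookup: returns an iterator, or s.cend() when absent.
// The returned iterator refers into s, so s must outlive it.
std::string::const_iterator iterator_to(const std::string& s, const std::string& needle) {
    return std::search(s.cbegin(), s.cend(), needle.cbegin(), needle.cend());
}

// Bridges the two conventions: maps "end iterator" to npos.
std::size_t to_index(const std::string& s, std::string::const_iterator it) {
    if (it == s.cend()) return std::string::npos;
    return static_cast<std::size_t>(std::distance(s.cbegin(), it));
}
```

The only real difference at the call site is the "not found" sentinel (`npos` versus the end iterator) and the conversion between positions and iterators.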

gowrath
  • `std::string::find` can use things like the Boyer-Moore algorithm to speed things up. I'm not sure if `std::find` is allowed to do things like that. IMHO I'd stick with the string version when working with strings. – NathanOliver Apr 27 '17 at 12:17
  • @NathanOliver Actually, `std::string::find` doesn't work with the Boyer-Moore searcher, whereas `std::search` does. Or did you mean it is allowed to implement `std::string::find` with using Boyer-Moore? – Corristo Apr 27 '17 at 12:19
  • I had a similar thought as @Corristo and got a little confused. Why would the std::search provide the facility to do the Boyer-Moore while `string::find` doesn't (at least not formally in the documentation)? – gowrath Apr 27 '17 at 12:23
  • imo using indices with `std::string` is plain dumb and should be considered a downright bug in the interface. I'd go with `std::search` if it fits your aesthetic choice and doesn't horrendously degrade performance. – Passer By Apr 27 '17 at 12:25
  • if performance is your concern you may be interested in this SO question http://stackoverflow.com/questions/34402492/searching-for-holy-grail-of-search-and-replace-in-c – kreuzerkrieg Apr 27 '17 at 12:42

2 Answers

10

Right now (27th April 2017), at least GCC's libstdc++ (which Clang also uses by default) implements std::string::find with a linear search, and it is therefore much slower than using

// C++17; needs <algorithm>, <functional>, <string_view>. s is the std::string being searched.
std::string_view substr{"whatever"};
auto it = std::search(s.cbegin(), s.cend(),
                      std::boyer_moore_searcher(substr.begin(), substr.end()));

The problem is that the Boyer-Moore searcher allocates memory for its internal data structures, and thus can fail with a std::bad_alloc exception. However, std::string::find is marked noexcept, so using the already implemented Boyer-Moore searcher within std::string::find isn't straightforward.
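To make the noexcept mismatch concrete, here is a minimal C++17 sketch. The wrapper name `find_boyer_moore` is hypothetical; it simply packages the snippet above so that it reports an index like std::string::find does.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: index of the first match, or std::string::npos.
// Deliberately NOT noexcept: building the searcher's lookup tables may
// allocate and therefore throw std::bad_alloc.
std::size_t find_boyer_moore(const std::string& haystack, std::string_view needle) {
    std::boyer_moore_searcher searcher(needle.begin(), needle.end());
    auto it = std::search(haystack.cbegin(), haystack.cend(), searcher);
    if (it == haystack.cend()) return std::string::npos;
    return static_cast<std::size_t>(it - haystack.cbegin());
}

// The member function, by contrast, promises not to throw.
static_assert(noexcept(std::declval<const std::string&>().find(
                  std::declval<const std::string&>())),
              "std::string::find(const std::string&) is noexcept");
```

A caller that wants the Boyer-Moore behaviour has to decide for itself how to handle an allocation failure, which the member function is not in a position to do.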

Corristo
  • this is hugely surprising, does the standard mandate `std::string::find` use a linear search? – Passer By Apr 27 '17 at 12:26
  • 2
    @PasserBy Yes I know. We discovered this when trying to find out why a Node.js implementation for a particular problem was much faster than the C++ one. Looking at libstdc++'s implementation revealed the problem :) – Corristo Apr 27 '17 at 12:27
  • @PasserBy AFAIR the libstdc++ maintainers already hinted that they're going to rework `std::string::find` to also use Boyer-Moore. So I don't think it is mandated by the standard. – Corristo Apr 27 '17 at 12:29
  • That is somewhat relieving to hear :) I'd lose all hope in humanity otherwise. Should you specify a date in your answer to avoid future confusion? – Passer By Apr 27 '17 at 12:31
  • @PasserBy Just looked it up: [[string.find](http://eel.is/c++draft/string.find)] doesn't give any complexity guarantees. – Corristo Apr 27 '17 at 12:33
  • 3
    @PasserBy It does, however, guarantee that `std::string::find` is `noexcept`, but Boyer-Moore needs to allocate memory for internal data structures. I've added this remark to the answer. – Corristo Apr 27 '17 at 12:46
8

string::find uses a linear search, but in some cases it is several times faster than Boyer-Moore (with the latest patch). I submitted a patch (find the first element, then memcmp) to both libstdc++ and libc++ that improved string::find significantly. You can try a recent GCC (7.1) to get the improved performance. You can also measure the performance with the simple benchmarking suite I wrote: https://github.com/hiraditya/std-benchmark

Especially for smaller strings, while Boyer-Moore is still busy constructing its internal data structures, the (sub)linear string::find will already be done. Also, for parsing HTML etc., where most of the searches are mismatches, string::find should be faster.
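The "first element, then memcmp" idea can be sketched roughly as follows. This is an illustration only and not the libstdc++ or libc++ source; the function name `find_first_then_compare` is hypothetical, and it uses std::memchr to locate the first character.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical sketch: scan for the needle's first character, then compare
// the remaining bytes at each candidate position. No memory is allocated.
std::size_t find_first_then_compare(std::string_view haystack, std::string_view needle) {
    if (needle.empty()) return 0;  // empty needle matches at index 0
    if (needle.size() > haystack.size()) return std::string::npos;

    const char* const base = haystack.data();
    const char* cur = base;
    // One past the last position at which a full match could still start.
    const char* const last = base + (haystack.size() - needle.size() + 1);

    while (cur < last) {
        // Jump straight to the next occurrence of the first character.
        cur = static_cast<const char*>(
            std::memchr(cur, needle.front(), static_cast<std::size_t>(last - cur)));
        if (cur == nullptr) return std::string::npos;
        // First character matched; verify the rest of the needle.
        if (std::memcmp(cur + 1, needle.data() + 1, needle.size() - 1) == 0)
            return static_cast<std::size_t>(cur - base);
        ++cur;
    }
    return std::string::npos;
}
```

When most positions are mismatches, nearly all of the time is spent in the first-character scan, with no per-search setup, which is consistent with the mismatch-heavy case described above.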

commit fc7ebc4b8d9ad7e2891b7f72152e8a2b7543cd65
Author: redi <redi@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 9 13:05:58 2017 +0000

    PR66414 optimize std::string::find

    2017-01-09  Jonathan Wakely  <jwakely@redhat.com>
            Aditya Kumar  <hiraditya@msn.com>

        PR libstdc++/66414
        * include/bits/basic_string.tcc
        (basic_string::find(const CharT*, size_type, size_type)): Optimize.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244225 138bc75d-0d04-0410-961f-82ee72b054a4

PS: With the current implementation, using std::find will always be slower than std::string::find.

Mingye Wang
A. K.
  • Is it possible to be more specific than 'some cases'? I was playing with std::search / std::boyer_moore_searcher vs std::string::find and found that the latter is always about a factor 10 - 15 times faster. In my case the haystack is a few K characters at most. GCC 9.2.1. – Lieuwe Feb 17 '20 at 14:49
  • 1
    @Lieuwe Here's the presentation, which has some details on why string::find has improved quite a bit. Slides 6-7 have the ideas: https://github.com/hiraditya/std-benchmark/blob/master/docs/slides/slide-cppnow.pdf. I hope you find the other slides useful as well. – A. K. Feb 21 '20 at 03:27