
String matching is the problem of finding all occurrences of a given pattern in a given text. There is a string matching tool that verifies algorithms only by the number of matched occurrences, not by the positions of the matched occurrences; see this post for an example.
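To make the distinction concrete, here is a minimal brute-force matcher (the name `naive_find_all` is hypothetical, not part of SMART) that returns every starting position. A count-based check only ever looks at the length of this list, never at its contents.

```python
def naive_find_all(pattern: str, text: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text (overlaps included)."""
    m, n = len(pattern), len(text)
    # The range is empty when the pattern is longer than the text.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]

positions = naive_find_all("abra", "abracadabra")
print(positions)       # [0, 7]
print(len(positions))  # 2
```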

SMART (String Matching Algorithms Research Tool) is an open source software which provides a standard framework for researchers in string matching. It helps users to test, design, evaluate and understand existing solutions for the exact string matching problem. In a paper (page 104) the author of the research article and the programmer of the tool wrote:

Algorithm verification

The tool verifies that all tested algorithms work properly. This verification is done by counting the number of matches returned by the procedure and testing whether the search stops properly at the end of the text. Since all searched patterns are always randomly extracted from the text, it is guaranteed that the number of occurrences is always equal or greater than 1.
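The count-based check quoted above can be approximated as follows. This is a sketch under stated assumptions, not SMART's actual code: it assumes the reported count is compared against a brute-force reference, and it does not model the end-of-text check. The hypothetical `buggy_positions` reports the end index of each match instead of its start, so its count is always right while every position is wrong.

```python
import random

def reference_positions(pattern, text):
    """Brute-force reference: start index of every occurrence (overlaps included)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

def buggy_positions(pattern, text):
    """Hypothetical buggy matcher: reports the END index of each match, not the start."""
    m = len(pattern)
    return [i + m - 1 for i in reference_positions(pattern, text)]

def check(algo, text, trials, m, rng, compare_positions):
    """Draw patterns from the text (so each occurs at least once) and compare either
    the number of matches or the full list of positions against the reference."""
    for _ in range(trials):
        start = rng.randrange(len(text) - m + 1)
        pattern = text[start:start + m]
        got, expected = algo(pattern, text), reference_positions(pattern, text)
        ok = got == expected if compare_positions else len(got) == len(expected)
        if not ok:
            return False
    return True

rng = random.Random(0)
text = "".join(rng.choice("ab") for _ in range(1000))
print(check(buggy_positions, text, 100, 4, rng, compare_positions=False))  # True
print(check(buggy_positions, text, 100, 4, rng, compare_positions=True))   # False
```

Because every extracted pattern has at least one occurrence, the position comparison rejects the buggy matcher on the first trial, while the count comparison can never reject it.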

How can one prove, or convince others, that checking the number of occurrences is enough to verify that an algorithm works 100% correctly, i.e. always produces correct results?

It could be that I am missing something about the tool and the related article, but if not, what explains why this tool is considered reliable? Note that the article about this tool was published by ACM and the author is an academic researcher.

Edit:

The following is found at this link:

If the algorithm does not run under particular conditions (for instance when the length of the pattern is less than a given value), please make it return the value -1.
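The -1 convention works because a valid match count is never negative, so a harness can treat -1 as "not applicable" rather than as a wrong answer. A minimal sketch, assuming a hypothetical algorithm that only supports patterns of at least `q` characters (its body is a brute-force stand-in, not a real q-gram filter) and a hypothetical harness `verify_count`:

```python
def reference_count(pattern, text):
    """Brute-force count of (possibly overlapping) occurrences."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)

def hypothetical_qgram_count(pattern, text, q=3):
    """Hypothetical algorithm that only supports patterns of length >= q.
    Following the quoted convention, it returns -1 when it cannot run."""
    if len(pattern) < q:
        return -1
    # Stand-in body: a real q-gram algorithm would go here.
    return reference_count(pattern, text)

def verify_count(algo, pattern, text):
    """Hypothetical harness: 'skipped' on -1, otherwise compare with the reference."""
    got = algo(pattern, text)
    if got == -1:
        return "skipped"
    return "ok" if got == reference_count(pattern, text) else "mismatch"

print(verify_count(hypothetical_qgram_count, "ab", "abab"))    # skipped
print(verify_count(hypothetical_qgram_count, "aba", "ababa"))  # ok
```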

Michael
  • Umm, you can never prove anything by sheer testing (unless the sample space is finite and you happen to test every possible combination, which is usually impossible). You *can* prove things by strict logical reasoning on computer programs, but these could get pretty nasty for most programs complex enough to do actual things. So in short, we usually never prove a software is 100% correct, and in many (most?) cases it is not. – MMZK1526 Aug 11 '22 at 15:20
  • Using only the number of matches is not enough. You can just submit a [random](https://dilbert.com/strip/2001-10-25) [number](https://xkcd.com/221/) and sooner or later it will turn out to be correct. – Some programmer dude Aug 11 '22 at 15:23
  • The blurb is nonsense. A string matching algorithm must be tested with at least one test case where the needle does not appear in the haystack, i.e. a test where the expected number of occurrences is zero. For a **serious** test, you should also check edge cases, e.g. where the pattern to match is _longer_ than the source text. This check is obviously skipped here, as the pattern is a substring of the source text. – MSalters Aug 11 '22 at 15:33
  • @Someprogrammerdude That's why numerous tests are done, the probability of getting all the answers correct by chance is negligibly small. And it just takes one failure to prove that the software has a bug. – Barmar Aug 11 '22 at 15:33
  • IMO, this test strategy starts with a big flaw: it does not check the behavior in case of *no occurrence*. –  Aug 11 '22 at 15:36
  • "Negligibly small" is not zero... :) If even one false or wrong result slips by then it's not 100% is it? Doesn't matter if it's one in 100, or one in a million. – Some programmer dude Aug 11 '22 at 15:37
  • @MMZK1526: I would not be so affirmative. IMO there are cases when it suffices to check a small number of limit cases, and the rest follows by monotonicity or similar arguments. E.g. if a given linear search works for no occurrence of the key, and for the first two or last two positions, you can reasonably assume that it works for any position (unless observation of the code shows you an unusual method). –  Aug 11 '22 at 15:40
  • @YvesDaoust "reasonably assume" is never a synonym for "certainty". Surely, the chance of having a bug could be extremely small, but that does not equate with zero. As for "monotonicity or similar arguments", you may do the math strictly, which in this case would be the "strict logical reasoning" instead of only test-driven development. – MMZK1526 Aug 11 '22 at 15:43
  • @MMZK1526: by struggling to find counter-arguments, you throw the baby out with the bathwater. –  Aug 11 '22 at 15:46
  • @YvesDaoust Which is why I'm not struggling :). I simply acknowledge that in many cases we don't have certainty, but still live happily with "reasonably assume", as you mentioned. – MMZK1526 Aug 11 '22 at 15:53
  • @YvesDaoust please see the edit. – Michael Aug 11 '22 at 16:33
