
How can I efficiently check whether a string contains one of several substrings? Suppose I have a string:

`Hi there, <B>my</B> name is Joe <DIV>.</DIV> Hello world. &nbsp;`

How can I check whether the string contains `<B>`, `<DIV>`, or `&nbsp;`?

I could simply do:

```pascal
Result := (Pos('<B>', S) > 0) or
          (Pos('<DIV>', S) > 0) or
          (Pos('&nbsp;', S) > 0);
```

But this seems very inefficient, since it makes up to N passes over the string (one `Pos` call per substring), and my strings are quite large.

zig
  • Iterate across the string one character at a time and check whether the following N characters match each candidate's N characters. Don't use `Copy` or any function that makes a copy of the string because the performance of all those heap allocations will be terrible. – David Heffernan Dec 07 '17 at 17:13
  • Do what a SAX parser would do. – Victoria Dec 07 '17 at 17:23
  • Delphi does have a RegEx library. – Ron Maupin Dec 07 '17 at 20:54
  • @RonMaupin, see the tag: Delphi 7. It has no built-in regex support, and I think regex is overkill in this case. – zig Dec 07 '17 at 22:37
  • There is a RegEx library for it. It is not included with that version, as it is with later versions, but I did use it with Delphi 7. It greatly simplifies things like this, because you create a regex string and look for a match. – Ron Maupin Dec 07 '17 at 22:46
  • @RonMaupin, OK. Which library did you use with Delphi 7? And what regex would solve this question? – zig Dec 07 '17 at 22:50
  • Recommendations for off-site resources are explicitly off-topic here, but a simple search for `delphi regex library` will get you what you need. – Ron Maupin Dec 07 '17 at 22:53
  • @RonMaupin, I know about regex libraries for Delphi 7, and it *can* be a solution, but IMO it is a bit too much for this task. Thank you. – zig Dec 07 '17 at 23:00
  • If you are interested in high efficiency, use an implementation of the Aho-Corasick algorithm, which looks for all patterns simultaneously. – MBo Dec 08 '17 at 02:34
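MBo's Aho-Corasick suggestion can be sketched roughly as follows. This is a minimal implementation written for illustration, not any particular library, and the names `TACNode`, `BuildAutomaton`, `AutomatonMatches` and `StringContainsAnyAC` are hypothetical. It assumes single-byte text (`AnsiString`, the native string type of Delphi 7), because every state stores 256 transitions, and it uses only records and dynamic arrays, so it should compile without generics. The automaton is built once from the patterns; after that the text is scanned exactly once, with one table lookup per character, however many patterns there are.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
type
  // One state. Transitions are indexed by byte value (assumes AnsiString text).
  TACNode = record
    Next: array[Byte] of Integer; // -1 = no trie edge yet (only while building)
    Fail: Integer;                // state of the longest proper suffix present in the trie
    Terminal: Boolean;            // a pattern ends here or at one of its suffix states
  end;
  TACAutomaton = array of TACNode;

procedure BuildAutomaton(const Patterns: array of AnsiString; var A: TACAutomaton);
var
  I, J, B, Node, Child, Count, QHead, QTail: Integer;
  Queue: array of Integer;
begin
  Count := 1; // upper bound on states: the root plus one per pattern character
  for I := 0 to High(Patterns) do
    Inc(Count, Length(Patterns[I]));
  SetLength(A, Count);
  for I := 0 to Count - 1 do
  begin
    FillChar(A[I].Next, SizeOf(A[I].Next), $FF); // all bytes $FF: every entry = -1
    A[I].Fail := 0; A[I].Terminal := False;
  end;
  // 1. Insert every pattern into a trie rooted at state 0.
  Count := 1;
  for I := 0 to High(Patterns) do
  begin
    Node := 0;
    for J := 1 to Length(Patterns[I]) do
    begin
      B := Ord(Patterns[I][J]);
      if A[Node].Next[B] = -1 then
      begin
        A[Node].Next[B] := Count;
        Inc(Count);
      end;
      Node := A[Node].Next[B];
    end;
    A[Node].Terminal := True; // an empty pattern marks the root: it matches any S
  end;
  // 2. Breadth-first pass: set failure links and fill in the missing transitions,
  //    turning the trie into a DFA with exactly one move per input character.
  SetLength(Queue, Count);
  QHead := 0; QTail := 0;
  for B := 0 to 255 do
    if A[0].Next[B] = -1 then
      A[0].Next[B] := 0 // characters that start no pattern stay at the root
    else
    begin
      Queue[QTail] := A[0].Next[B]; Inc(QTail); // depth-1 states fail to the root
    end;
  while QHead < QTail do
  begin
    Node := Queue[QHead];
    Inc(QHead);
    if A[A[Node].Fail].Terminal then // a pattern ends at a suffix of this state
      A[Node].Terminal := True;
    for B := 0 to 255 do
    begin
      Child := A[Node].Next[B];
      if Child = -1 then
        A[Node].Next[B] := A[A[Node].Fail].Next[B]
      else
      begin
        A[Child].Fail := A[A[Node].Fail].Next[B];
        Queue[QTail] := Child; Inc(QTail);
      end;
    end;
  end;
end;

// One table lookup per character of S, however many patterns were compiled in.
function AutomatonMatches(const A: TACAutomaton; const S: AnsiString): Boolean;
var
  I, State: Integer;
begin
  Result := True;
  State := 0;
  if A[0].Terminal then
    Exit;
  for I := 1 to Length(S) do
  begin
    State := A[State].Next[Ord(S[I])];
    if A[State].Terminal then
      Exit;
  end;
  Result := False;
end;

// Convenience wrapper; it rebuilds the automaton on every call.
function StringContainsAnyAC(const S: AnsiString; const AnyOf: array of AnsiString): Boolean;
var
  A: TACAutomaton;
begin
  BuildAutomaton(AnyOf, A);
  Result := AutomatonMatches(A, S);
end;
```

For repeated searches with the same tag list, calling `BuildAutomaton` once and reusing the result with `AutomatonMatches` avoids the rebuild. Each state takes roughly 1 KB, which is negligible for a few tags but grows with the total length of the patterns.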

2 Answers


Here is my solution, based on David Heffernan's comment:

```pascal
function StringContainsAny(const S: string; const AnyOf: array of string): Boolean;
var
  CurrChr, C: PChar;
  I, L: Integer;
begin
  Result := False;
  CurrChr := PChar(S);
  while CurrChr^ <> #0 do
  begin
    for I := 0 to High(AnyOf) do
    begin
      L := 0; // number of characters matched so far at this position
      C := PChar(AnyOf[I]);
      while C^ <> #0 do
      begin
        // The #0 terminator of S never equals a pattern character, so running
        // off the end of S simply ends the comparison here. (Exiting early at the
        // end of S would miss a shorter pattern that matches at a later position.)
        if C^ = CurrChr^ then
          Inc(L)
        else
          Break;
        Inc(C);
        Inc(CurrChr);
      end;
      if C^ = #0 then // the whole pattern matched
      begin
        Result := True;
        Exit;
      end
      else
        Dec(CurrChr, L); // rewind to the start position and try the next pattern
    end;
    Inc(CurrChr);
  end;
end;
```

I'm not sure it is perfect.


EDIT: What can I say? You know what they say about assumptions...
After actually testing it, it seems that using Pos():

```pascal
function StringContainsAny(const S: string; const AnyOf: array of string): Boolean;
var
  I: Integer;
begin
  for I := 0 to High(AnyOf) do
  begin
    if Pos(AnyOf[I], S) <> 0 then
    begin
      Result := True;
      Exit;
    end;
  end;
  Result := False;
end;
```

is faster than both my solution and @Green_Wizard's solution! They did a good job with the Pos function!

zig

Slightly better version:

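```pascal
// Requires SysUtils in the uses clause (for CompareMem).
function StringContainsAny(const S: string; const AnyOf: array of string): Boolean;
var
  CurrChr, C: PChar;
  i, j, Ln: Integer;
begin
  for i := 1 to Length(S) do
  begin
    CurrChr := @S[i];
    for j := 0 to High(AnyOf) do
    begin
      Ln := Length(AnyOf[j]);
      if Ln = 0 then // skip empty strings: AnyOf[j][1] would not exist
        Continue;

      C := @AnyOf[j][1];
      if C^ <> CurrChr^ then // cheap first-character check
        Continue;

      if (Length(S) + 1 - i) < Ln then // check bounds
        Continue;

      if CompareMem(C, CurrChr, Ln * SizeOf(C^)) then
      begin
        Result := True; // Exit(True) needs Delphi 2009 or later; this also works in Delphi 7
        Exit;
      end;
    end;
  end;

  Result := False;
end;
```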

You can also build a table of stop symbols to improve speed. It's a fairly complex topic, so I can only suggest reading, for example, Bill Smyth's book "Computing Patterns in Strings".
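The "stop symbol" table most likely refers to the bad-character shift table known from Boyer-Moore and Horspool. As a rough sketch of that idea (not code from the book), here is a simplified multi-pattern Horspool filter: the window is as wide as the shortest pattern, a 256-entry table indexed by the window's last character says how far the window can safely jump, and each candidate position is verified character by character without `Copy`. `StringContainsAnyHorspool` is a hypothetical name. Indexing the table by the low byte of each character is an assumption that keeps it at 256 entries for Unicode strings too; collisions can only make jumps shorter, never skip a match. Empty patterns are ignored, the way `Pos` reports them as not found.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

function StringContainsAnyHorspool(const S: string; const AnyOf: array of string): Boolean;
var
  Shift: array[Byte] of Integer; // "stop symbol" table, indexed by a character's low byte
  M, I, J, K, Start: Integer;
begin
  Result := False;
  // M = length of the shortest non-empty pattern = width of the sliding window
  M := MaxInt;
  for J := 0 to High(AnyOf) do
    if (AnyOf[J] <> '') and (Length(AnyOf[J]) < M) then
      M := Length(AnyOf[J]);
  if M = MaxInt then
    Exit; // no non-empty patterns
  // Default jump is the whole window. A character at 0-based index K < M - 1 of any
  // pattern limits the jump to M - 1 - K when it is the last character of the window.
  for I := 0 to 255 do
    Shift[I] := M;
  for J := 0 to High(AnyOf) do
    if AnyOf[J] <> '' then
      for K := 0 to M - 2 do
      begin
        I := Ord(AnyOf[J][K + 1]) and $FF;
        if M - 1 - K < Shift[I] then
          Shift[I] := M - 1 - K;
      end;
  Start := 1; // 1-based start of the current window in S
  while Start + M - 1 <= Length(S) do
  begin
    // Verify every pattern that fits at Start, character by character (no Copy).
    for J := 0 to High(AnyOf) do
    begin
      K := Length(AnyOf[J]);
      if (K > 0) and (Start + K - 1 <= Length(S)) then
      begin
        I := 1;
        while (I <= K) and (S[Start + I - 1] = AnyOf[J][I]) do
          Inc(I);
        if I > K then
        begin
          Result := True;
          Exit;
        end;
      end;
    end;
    Inc(Start, Shift[Ord(S[Start + M - 1]) and $FF]);
  end;
end;
```

The gain depends on the shortest pattern: with `<B>` in the list the window is only 3 characters wide, so jumps are at most 3 characters, and the filter pays off most when every pattern is long.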

Green_Wizard
  • Your function fails with `StringContainsAny(...)` when the matching substring is at the end of the string. – zig Dec 08 '17 at 11:05
  • Sorry, fixed and improved by using `CompareMem`. – Green_Wizard Dec 08 '17 at 12:16
  • Thanks. To my surprise, using `Pos()` as described in my question/answer is faster. – zig Dec 09 '17 at 07:37
  • That's not quite true. Example: `StringContainsAny('stop! and then many KB of text', ['000', '001', ..., '999', 'stop!'])`. Here the `Pos` version scans the whole text once for every one of the 1000 numeric patterns before it tries 'stop!', while a single-pass version finds 'stop!' at the very first position. My code is also not perfect and has many limitations, but it can be improved in many ways, depending on the task. The main point of my answer is "read a book", and then select one of the algorithms or create your own. But if the solution with `Pos` is OK for you, then that's great. – Green_Wizard Dec 09 '17 at 10:24