In a recent application that involved receiving strings over a serial link I found myself writing code like:

if (pos('needle', haystack) = 1) then ...

in order to check whether a particular substring is at the beginning of a string.

It struck me that the pos function is not ideal for this as it has no idea which location I'm looking for the substring to be in.

Is there a good function that does this?

Is there a more generalised function like IsSubStringAt(needle, haystack, position)?

I did think of using something like this:

function IsSubstrAt(const needle, haystack: string; position: Integer): Boolean;
var
  ii: integer;
begin
  result := true;
  for ii := 1 to length(needle) do begin
    if (haystack[position + ii - 1] <> needle[ii]) then begin
      result := false;
      break;
    end;
  end;
end;

with some error checking.
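
For illustration, the loop above with the error checking filled in might look like the sketch below. The specific checks are assumptions rather than part of the original question: positions below 1 are rejected, a needle that would run past the end of the haystack is rejected, and an empty needle is treated as matching at any valid position.

```pascal
program IsSubstrAtDemo;
{$APPTYPE CONSOLE}

// Sketch only: the bounds checks are assumed, not taken from the question.
function IsSubstrAt(const needle, haystack: string; position: Integer): Boolean;
var
  ii: Integer;
begin
  Result := False;
  // Reject positions before the string and needles that would overrun it.
  if (position < 1) or (Length(needle) > Length(haystack) - position + 1) then
    Exit;
  // Compare character by character and stop at the first mismatch.
  for ii := 1 to Length(needle) do
    if haystack[position + ii - 1] <> needle[ii] then
      Exit;
  Result := True;
end;

begin
  Writeln(IsSubstrAt('needle', 'needle in a haystack', 1)); // needle is at position 1
  Writeln(IsSubstrAt('needle', 'a needle', 1));             // needle is at position 3, not 1
  Writeln(IsSubstrAt('needle', 'need', 1));                 // haystack is too short
end.
```

Because the loop exits on the first differing character, a mismatch at the first character costs a single comparison regardless of the haystack length.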

I was hoping to find a ready rolled answer.

NGLN
Michael Vincent
  • If Needle is known in advance of data arriving over the link, you might make yourself a little state-machine which listens out for the characters of Needle arriving in succession. – MartynA Oct 29 '14 at 15:56
  • @Martyn needle is known, along with 99 other needles of varying lengths. Would you do 100 different state machines, or 1 handling all 100 needles? – Michael Vincent Oct 29 '14 at 16:28
  • Well, call me Mr Chicken but I'd only really consider implementing a s-m myself if there were only one needle. OTOH, there must surely be some library code somewhere that does a multi-needle, speed-optimised one. ISTR that one of the early (I mean pre-Internet) site-to-site comms packages (maybe Crosstalk) had something like that - and I wouldn't be surprised to find one in the typical modem's firmware, for recognising "At" commands etc. Btw, +1 for a very interesting q. – MartynA Oct 29 '14 at 16:42
  • @ Mr Chicken - I just tend to use a list of if-then-elses. It's not so pretty, but it works and is readable :) (That's me being a chick). I'm pretty sure Turbopower's ASync Pro has some s-m components. They get used for implementing protocols like x-modem. Too complex for my simple needs :) Thanks for the +1 – Michael Vincent Oct 29 '14 at 16:58
  • Is this lengthy discussion about [`PosEx`](http://docwiki.embarcadero.com/Libraries/XE7/en/System.StrUtils.PosEx)? – Free Consulting Oct 29 '14 at 18:15
  • For checking a substring at the very beginning or very end, you can use `AnsiStartsText()` and `AnsiEndsText()` in the `StrUtils` unit. – Remy Lebeau Oct 29 '14 at 18:30
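
A minimal usage sketch of the two `StrUtils` routines named in the last comment. The received line is a made-up example, and the unit name may need a scope prefix (`System.StrUtils`) depending on the Delphi version and project settings. Both routines take the substring first and the full text second, and both compare without regard to case.

```pascal
program StartsEndsDemo;
{$APPTYPE CONSOLE}
uses
  StrUtils;

const
  // Hypothetical line received over the serial link.
  Line = 'OK+STATUS=READY';

begin
  // Substring first, full text second; case is ignored by both calls.
  if AnsiStartsText('ok', Line) then
    Writeln('line starts with "ok" (ignoring case)');
  if AnsiEndsText('ready', Line) then
    Writeln('line ends with "ready" (ignoring case)');
end.
```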

4 Answers


Since you only want to look at one position, you can just form the substring and test that. Like this:

function IsSubStringAt(const needle, haystack: string; position: Integer): Boolean;
var
  substr: string;
begin
  substr := Copy(haystack, position, Length(needle));
  Result := substr = needle;
end;

If performance is really critical, you would want to perform the comparison in place, without creating a copy and therefore without incurring a heap allocation. You could use AnsiStrLComp for this.

function IsSubStringAt(const needle, haystack: string; position: Integer): Boolean;
begin
  if Length(haystack) - position + 1 >= Length(needle) then begin
    Result := AnsiStrLComp(
      PChar(needle), 
      PChar(haystack) + position - 1, 
      Length(needle)
    ) = 0;
  end else begin
    Result := False;
  end;
end;

If you want the check to be case-insensitive, replace = with SameText in the first version, and replace AnsiStrLComp with AnsiStrLIComp in the second version.
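
Spelled out, the two case-insensitive variants might look like the sketch below. The function names `IsTextAtCopy` and `IsTextAtInPlace` are invented here for illustration, and the extra `position >= 1` guard is an assumption added to the sketch rather than part of the versions above.

```pascal
program CaseInsensitiveDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Copy-based variant: SameText compares the two strings ignoring case.
function IsTextAtCopy(const needle, haystack: string; position: Integer): Boolean;
begin
  Result := (position >= 1)
    and SameText(Copy(haystack, position, Length(needle)), needle);
end;

// In-place variant: no temporary string is created.
// Relies on Delphi's default short-circuit Boolean evaluation, so the
// pointer arithmetic only runs once the bounds checks have passed.
function IsTextAtInPlace(const needle, haystack: string; position: Integer): Boolean;
begin
  Result := (position >= 1)
    and (Length(haystack) - position + 1 >= Length(needle))
    and (AnsiStrLIComp(PChar(needle), PChar(haystack) + position - 1,
                       Length(needle)) = 0);
end;

begin
  Writeln(IsTextAtCopy('ok', 'AT+OK', 4));    // 'OK' starts at position 4
  Writeln(IsTextAtInPlace('ok', 'AT+OK', 4)); // same check without the copy
  Writeln(IsTextAtInPlace('ok', 'AT+OK', 1)); // 'AT' is at position 1, so no match
end.
```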

David Heffernan
  • @KenWhite Length is a simple memory read. The expense is the heap allocation. That can readily be removed if needed. What you have to understand about Pos is that it searches for the first occurrence of the substring. If the substring is not there, Pos has to check for it at pos 1, pos 2, pos 3, and right the way through to the end of the string. That means that the run time for Pos is loosely proportional to the length of haystack. This is what Michael wants to avoid. – David Heffernan Oct 29 '14 at 15:47
  • That is exactly what I'm trying to avoid, David. You wrote it clearer than I did. One thing with your solution is that the copy() does quite a bit of work before the comparison is done, but if the first char is different the code need go no further as needle is not at position. – Michael Vincent Oct 29 '14 at 15:55

You could use CompareMem() to directly compare the string contents:

  function IsSubStringAt(const aNeedle, aHaystack: String; aPosition: Integer): Boolean;
  var
    needleLen: Integer;
  begin
    needleLen := Length(aNeedle);
    result    := (needleLen + aPosition - 1) <= Length(aHaystack);

    if result then
      result := CompareMem(Pointer(aNeedle), @aHaystack[aPosition], needleLen * sizeof(Char));
  end;

Note that we short-circuit the need to do any comparison if the haystack is too short to contain the needle at the specified position.

Using the CompareMem() API ensures that the implementation is portable and will also work with a Unicode String type (should you ever migrate or use this code in a Unicode version of Delphi) as long as the size of the Char type is taken into account, as is done here.

This approach, however, assumes that the strings have already been normalised to whatever extent is required, so that their byte content is directly comparable.

Deltics
  • This is the platform specific version of variant 2 from my answer, and also Uwe's answer. Both of those are cross platform. On windows they are implemented by, guess what, a call to CompareString. – David Heffernan Oct 29 '14 at 20:27
  • Quite right - as you say, the OP mentions in a comment that Kylix portability is a concern. This is easily addressed by using **CompareMem()**, which is portable and most efficient assuming the strings are already normalised for comparison purposes. This still avoids using the ANSI wrapper routine, which not only has some overhead but is potentially confusing (and definitely irritating) should the code ever be ported to Unicode. I've updated the answer accordingly, which puts a bit more clear water between it and the other answers. I hope you approve. – Deltics Oct 29 '14 at 21:13

Since XE7 you can use (assuming position is 1-based):

function IsSubStringAt(const needle, haystack: string; position: Integer): Boolean;
begin
  result := string.Compare(hayStack, position-1, needle, 0, needle.Length) = 0;
end;
David Heffernan
Uwe Raabe
  • Thanks Uwe. Sadly I'm still on Delphi 6 (because some of our code uses Kylix). I weep when I see some of the language features some of the new compilers have. I so want to use future variables, for example. – Michael Vincent Oct 29 '14 at 16:05
  • I don't know which "new language features" you are lamenting the lack of in this code. Surely not the pseudo-object method for string comparison, which is just a syntactic wrapper around a good old fashioned "function", the sort of thing that is necessary in OOb (Object Obsessed) languages that do not *support* first class functions. You can do the same thing using the Windows CompareString() API directly, for which String.Compare() is a naive wrapper (introducing in the process the confusion of using 0 based indexing, rather than the 1 based indexes of the String type it adorns). – Deltics Oct 29 '14 at 19:06

Here is a very fast way to do this, written in assembly language. I made it by modifying Delphi's original Pos function:

Function PosS (Substr:string; S:string; Position:integer) : integer;
  Asm
    TEST    EAX,EAX
    JE      @@NoWork

    TEST    EDX,EDX
    JE      @@StringEmpty

    PUSH    EBX
    PUSH    ESI
    PUSH    EDI

    MOV     ESI, EAX                         //  Pointer to Substr
    MOV     EDI, EDX                         //  Pointer to S
    MOV     EBX, ECX                         //  Position
    DEC     EBX

    MOV     ECX, [EDI-4]                     // Length (S)
    SUB     ECX, EBX

    PUSH    EDI
    ADD     EDI, EBX

    MOV     EDX, [ESI-4]                     // Length (Substr)

    DEC     EDX
    JS      @@Fail
    MOV     AL, [ESI]
    INC     ESI

    SUB     ECX, EDX                         // = Length (S) - Length (Substr) + 1
    JLE     @@Fail
@@Loop:
    REPNE   SCASB
    JNE     @@Fail
    MOV     EBX, ECX
    PUSH    ESI
    PUSH    EDI

    MOV     ECX, EDX
    REPE    CMPSB
    POP     EDI
    POP     ESI
    JE      @@Found
    MOV     ECX, EBX
    JMP     @@Loop

@@Fail:
    POP     EDX
    XOR     EAX, EAX
    JMP     @@Exit

@@StringEmpty:
    XOR     EAX, EAX
    JMP     @@NoWork

@@Found:
    POP     EDX
    MOV     EAX, EDI
    SUB     EAX, EDX
@@Exit:
    POP     EDI
    POP     ESI
    POP     EBX
@@NoWork:
  End;
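
Reading the routine above, it appears to search forward from Position and return the 1-based index of the first match, or 0 if there is none. It also appears to assume 32-bit x86 and single-byte characters. For comparison, where the `StrUtils` unit provides `PosEx`, a portable sketch of that search-from-offset behaviour could look like this. The sample strings are made up, and this is not a drop-in replacement that has been tested against the assembly.

```pascal
program PosExDemo;
{$APPTYPE CONSOLE}
uses
  StrUtils;

const
  // Hypothetical data: 'ab' occurs at positions 3 and 7.
  Haystack = 'xxabxxab';

begin
  // PosEx(SubStr, S, Offset) starts searching at Offset (1-based).
  Writeln(PosEx('ab', Haystack, 1)); // first occurrence
  Writeln(PosEx('ab', Haystack, 4)); // search begins after the first occurrence
  // To test for a match at one exact position, compare the result to it.
  // On a mismatch this still scans the rest of the string.
  Writeln(PosEx('ab', Haystack, 3) = 3);
end.
```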
adlabac