
After looking at "Delphi extract string between to 2 tags" and trying the code given there by Andreas Rejbrand, I realized that I needed a version that doesn't stop after the first tag. My goal is to write every value that occurs between two strings in several .xml files to a logfile.

<screen> xyz </screen> blah blah <screen> abc </screen> 

-> giving a logfile with
xyz
abc
... and so on.

What I tried was to delete a portion of the text after the function had read it, so that on the next pass the function would find the next instance of the tag and add that to the result too, until there were no matches left. At that point the boolean flag would be set to true and the loop could stop. Below is the slightly modified function, based on the version in the link.

function ExtractText(const Tag, Text: string): string;
var
  StartPos1, StartPos2, EndPos: integer;
  i: Integer;
  mytext : string;
  bFinished : bool;

begin
  bFinished := false;
  mytext := text;
  result := '';
  while not bFinished do
  begin
    StartPos1 := Pos('<' + Tag, mytext);
    if StartPos1 = 0 then bFinished := true;
    EndPos := Pos('</' + Tag + '>', mytext);
    StartPos2 := 0;
    for i := StartPos1 + length(Tag) + 1 to EndPos do
      if mytext[i] = '>' then
      begin
        StartPos2 := i + 1;
        break;
      end;

    if (StartPos2 > 0) and (EndPos > StartPos2) then
    begin
      result := result + Copy(mytext, StartPos2, EndPos - StartPos2);
      delete (mytext, StartPos1, 1);
    end;
  end;
end;

So I create the form and assign a logfile.

procedure TTagtextextract0r.FormCreate(Sender: TObject);
begin
  Edit2.Text:=(TDirectory.GetCurrentDirectory);
  AssignFile(LogFile, 'Wordlist.txt');
  ReWrite(LogFile);
  CloseFile(Logfile);
end;

To then get the files in question, I click a button which then reads them.

 procedure TTagtextextract0r.Button3Click(Sender: TObject);
 begin
   try
     sD := TDirectory.GetCurrentDirectory;
     Files:= TDirectory.GetFiles(sD, '*.xml');
   except 
     exit
   end;

   j:=Length(Files);
   for k := 0 to j-1 do
   begin
     Listbox2.Items.Add(Files[k]);
     sA:= TFile.ReadAllText(Files[k]);
     iL:= Length(sA);

     AssignFile(LogFile, 'Wordlist.txt');
     Append(LogFile);
     WriteLn(LogFile, (ExtractText('screen', sA)));
     CloseFile (LogFile);
   end;
 end;

 end.

My problem is that without the boolean loop in the function, the application writes only one line per file and then stops, but with the loop the application gets stuck in an infinite loop, and I can't see why the loop doesn't end. Is it perhaps that WriteLn can't output the result of the function? If it can't, I don't know how to get a new line for every match. What am I doing wrong here?

  • Learn to 1) properly indent your code so you can see where blocks start and stop, and 2) learn to use the debugger to step through the code to see what's happening. In this case, set a breakpoint at `ListBox2.Items.Add(Files[k]);` and step through the code using F8 to execute each line. (And a hint: Move the `AssignFile` to before the loop start and the `CloseFile` to after the loop end. There's no point in opening and closing it repeatedly in each iteration of the loop.) – Ken White Nov 15 '16 at 01:00
  • Note that you can use `PosEx` function to continue searching from the last position, so avoiding `delete` usage. – MBo Nov 15 '16 at 02:10
  • Use an xml parser – David Heffernan Nov 15 '16 at 06:21
  • You are not deleting the portion of text read - you are deleting the first '<' of the '<' + tag + '>'. If you used the debugger you would see this. You actually need delete( mytext, 1, EndPos + 3 + Length(Tag)). – Dsm Nov 15 '16 at 09:14
  • Hi guys, thanks for the support. Yes, I now see the benefits of indentation! (@LU RD thanks for that editing) I tried out the breakpoints and then could hover over the variables and see the values assigned to them, very helpful. I take your point about the opening and closing in the loop, KW, thanks. Yeah, the XML Parser that is an approach I hadn't considered, thanks DH, will have to read into that! Ah, yes, spotted the infinite loop now too, thanks Dsm! Thanks for the note about PosEx MBo! – Polyglotpatrick Nov 15 '16 at 21:00
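
The delete-based idea from the question can be made to terminate by removing everything up to and including the closing tag on each pass, as the comment above suggests. What follows is only a minimal sketch under stated assumptions: the function name `ExtractAllByDelete` and its local variables are hypothetical, tags are assumed not to nest, and the delete count is `ClosePos + Length(CloseTag) - 1`, which is one less than the expression in the comment so that the character right after the closing tag is kept.

```pascal
program DeleteBasedExtract;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: collects the text of every <Tag ...>...</Tag> pair,
// one match per line, by consuming the working copy of the input.
function ExtractAllByDelete(const Tag, Text: string): string;
var
  Work, CloseTag: string;
  OpenPos, ContentPos, ClosePos, i: Integer;
begin
  Result := '';
  Work := Text;
  CloseTag := '</' + Tag + '>';
  while True do
  begin
    OpenPos := Pos('<' + Tag, Work);
    if OpenPos = 0 then
      Break;
    // Drop everything before the opening tag, so the opening tag now
    // starts at index 1 and an earlier stray closing tag cannot match.
    Delete(Work, 1, OpenPos - 1);

    ClosePos := Pos(CloseTag, Work);
    if ClosePos = 0 then
      Break;

    // Find the '>' that ends the opening tag (it may carry attributes).
    ContentPos := 0;
    for i := Length(Tag) + 2 to ClosePos - 1 do
      if Work[i] = '>' then
      begin
        ContentPos := i + 1;
        Break;
      end;

    if ContentPos > 0 then
      Result := Result + Copy(Work, ContentPos, ClosePos - ContentPos) + sLineBreak;

    // Remove the whole element, closing tag included, so that every pass
    // shortens Work and the loop must end.
    Delete(Work, 1, ClosePos + Length(CloseTag) - 1);
  end;
end;

begin
  Write(ExtractAllByDelete('screen',
    '<screen> xyz </screen> blah blah <screen> abc </screen>'));
end.
```

Each pass copies and shortens the working string, so this approach does more work than a search that resumes from an offset; it is shown only to illustrate why the original loop never ended.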

1 Answer


First you need to get a grip on debugging

Look at this post for a briefing on how to pause and debug a program gone wild.

Also read Setting and modifying breakpoints to learn how to use breakpoints. If you had stepped through your code, you would soon have seen where it goes wrong.

Then to your problem:

In older Delphi versions (up to Delphi XE2) you could use the PosEx() function (as suggested in comments), which would simplify the code in the ExtractText() function significantly. From Delphi XE3 on, the System.Pos() function has been expanded with the same functionality as PosEx(), that is, a third parameter Offset: integer.

Since you are on Delphi 10 Seattle you can use interchangeably either System.StrUtils.PosEx() or System.Pos().

System.StrUtils.PosEx

PosEx() returns the index of SubStr in S, beginning the search at Offset

function PosEx(const SubStr, S: string; Offset: Integer = 1): Integer; inline; overload;
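
To illustrate how the Offset parameter lets a search resume after the previous hit, here is a small sketch. The helper `CountOccurrences` is hypothetical and not part of the RTL, and the `uses StrUtils` clause assumes the default unit scope names (otherwise write `System.StrUtils`).

```pascal
program PosExDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  StrUtils; // System.StrUtils when unit scope names are not configured

// Hypothetical helper: counts non-overlapping occurrences of SubStr in S
// by restarting each search just past the previous match.
function CountOccurrences(const SubStr, S: string): Integer;
var
  P: Integer;
begin
  Result := 0;
  if SubStr = '' then
    Exit;
  P := PosEx(SubStr, S, 1);
  while P > 0 do
  begin
    Inc(Result);
    // Resume after the match that was just found.
    P := PosEx(SubStr, S, P + Length(SubStr));
  end;
end;

begin
  WriteLn(CountOccurrences('<screen>',
    '<screen> xyz </screen> blah blah <screen> abc </screen>'));
end.
```

Because the input string is never modified, no `Delete` call is needed and the positions stay valid for the whole scan.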

The implementation of ExtractText() could look like this (with PosEx()):

function ExtractText(const tag, text: string): string;
var
  startPos, endPos: integer;
begin
  result := '';
  startPos := 1;

  repeat
    startPos := PosEx('<'+tag, text, startpos);
    if startPos = 0 then exit;
    startPos := PosEx('>', text, startPos)+1;
    if startPos = 1 then exit;

    endPos := PosEx('</'+tag+'>', text, startPos);
    if endPos = 0 then exit;

    result := result + Copy(text, startPos, endPos - startPos) + sLineBreak;
  until false;
end;

I added sLineBreak (declared in the System unit) after each found text; otherwise it should work as you intended (I believe).
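
For comparison, the same loop could be written with the three-parameter System.Pos() available from XE3 on. This is only a sketch under assumptions: the name `ExtractTrimmed` is hypothetical, `Trim` is added so that ` xyz ` is logged as `xyz`, tags are assumed not to nest, and the search resumes after the closing tag rather than after the opening one.

```pascal
program PosOffsetExtract;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils; // for Trim

// Hypothetical variant: returns the trimmed text of every <Tag ...>...</Tag>
// pair, one per line, using Pos with an Offset argument (Delphi XE3 or later).
function ExtractTrimmed(const Tag, Text: string): string;
var
  CloseTag: string;
  StartPos, EndPos: Integer;
begin
  Result := '';
  CloseTag := '</' + Tag + '>';
  StartPos := 1;
  while True do
  begin
    // Next opening tag at or after StartPos.
    StartPos := Pos('<' + Tag, Text, StartPos);
    if StartPos = 0 then
      Exit;
    // End of the opening tag; the content starts right after it.
    StartPos := Pos('>', Text, StartPos);
    if StartPos = 0 then
      Exit;
    Inc(StartPos);

    EndPos := Pos(CloseTag, Text, StartPos);
    if EndPos = 0 then
      Exit;

    Result := Result + Trim(Copy(Text, StartPos, EndPos - StartPos)) + sLineBreak;
    // Continue the search after the closing tag.
    StartPos := EndPos + Length(CloseTag);
  end;
end;

begin
  Write(ExtractTrimmed('screen',
    '<screen> xyz </screen> blah blah <screen> abc </screen>'));
end.
```

Since the result already ends each match with sLineBreak, the caller can use `Write` instead of `WriteLn` to avoid an extra blank line per file in the logfile.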

Tom Brunberg
  • In newer Delphi versions (Unicode), [System.Pos()](http://docwiki.embarcadero.com/Libraries/en/System.Pos) is equivalent to `System.StrUtils.PosEx()`. – LU RD Nov 15 '16 at 13:03
  • @LURD holy ... ! So it seems. Thanks, I did not know that. I need to rephrase my post. – Tom Brunberg Nov 15 '16 at 13:19
  • @LURD I have a gap in my installations. D2010 Pos() doesnt recognize the third parameter. DXE4 does. Do you have any of XE, XE2 or XE3 installed to check when the change happened? – Tom Brunberg Nov 15 '16 at 13:49
  • From [docs](http://docwiki.embarcadero.com/Libraries/XE3/en/System.Pos), it seems as XE3 is the first to introduce the offset. I don't have XE2 at hand to check code. I'll look later today. – LU RD Nov 15 '16 at 14:03
  • It was in XE3, checked. – LU RD Nov 15 '16 at 14:19
  • Thanks again @LURD – Tom Brunberg Nov 15 '16 at 14:21
  • Thanks very much for your explanation of PosEx and this function @TomBrunberg - I was originally using #13#10 for the linebreak, didn't know about that sLineBreak. I tried out the function that you wrote and it worked a treat, writing all the values as I had intended. I did get my version to work after getting rid of the infinite loop too, but it only worked for very small files for some reason (up to 5 kB) anything beyond that would leave it to crash. – Polyglotpatrick Nov 15 '16 at 21:04