I am trying to find the position of myKeyword and extract strings from a loaded file (up to 200 MB).

procedure TForm5.Button4Click(Sender: TObject);
var
  Stream: TFileStream;
  Buffer: array [0 .. 1023] of AnsiChar;
  i: Integer;
  myKeyword: string;
  pullStr: AnsiString;
begin
  myKeyword := 'anything';
  Stream := TFileStream.Create(edtTarget.Text, fmOpenRead);
  while Stream.Position < Stream.Size do
  begin
    Stream.Read(Buffer, 1024);
    m1.Lines.Add(Buffer); // not needed, just displayed for evaluation
    (* 1. Get address of given keyword *)
    // i := Stream.PositionOf(myKeyword);   < how to do this?
    (* 2. Stream extract *)
    // pullStr := Stream.copy(i,1000); < how to do this?
  end;
end;

I have read other topics about files and strings, and I found a very good answer here. I think I want to extend those features with something like:

TFileSearchReplace.GetStrPos(const KeyWord: string): Integer;
TFileSearchReplace.ExtractStr(const KeyWord: string; Len: Integer);
Bianca
  • Since you already read the data into a buffer, you should not search for myKeyword in the stream but in the buffer. If you find it, just add the offset within the buffer to the stream position at which that buffer was read to get the absolute position of myKeyword in the stream. BUT: You also have to handle the case where myKeyword starts at the end of one 1024-byte block and continues into the next 1024-byte block. – dummzeuch Feb 14 '18 at 15:21
  • @dummzeuch This is my first time working with a buffer and `RawByteString`. I will try it. – Bianca Feb 14 '18 at 15:52
  • Lots of problems here. Starting with your code ignoring the return value of the call to Read. You also need to think about what happens when the target text straddles your buffer reads. – David Heffernan Feb 14 '18 at 17:21
  • If you read data into the buffer like that, you might cut up words (add additional CR+LFs) if they stretch across a 1024-byte boundary. It is quite possible that you cannot find the word that way. – Rudy Velthuis Feb 14 '18 at 18:32
  • Better use a huge RawByteString with a preset length of the entire file and read it in one fell swoop. Then you can search the buffer. Or read the text line by line and search those. The latter is easy with a TStreamReader and doesn't use that much memory (no 200 MB), but is probably slower. – Rudy Velthuis Feb 14 '18 at 18:34
  • @Rudy Not so. That's going to lead to exhaustion of address space. – David Heffernan Feb 14 '18 at 21:15
  • @David: if you do it frequently, then perhaps. Otherwise, it should work. Of course one should always test such code. – Rudy Velthuis Feb 15 '18 at 07:12
  • What would be wrong with using memory mapped files here? It doesn't hurt that much, and for huge files or long search strings it can be _just_ efficient. – Victoria Feb 20 '18 at 04:03
  • I had an answer from Mr. Rudy some days back, but it has been removed. I wish it would be put back, so readers have more options. – Bianca Feb 20 '18 at 06:06
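
The approach suggested in the comments above (search the buffer rather than the stream, use the return value of Read, and handle a keyword that straddles two reads) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: `FindInStream` and `FindInBytes` are hypothetical names that do not exist in the RTL, the file is treated as raw bytes, and the keyword is assumed to be an AnsiString in the same encoding as the file. After each read only the last Length(Keyword) - 1 bytes are kept, so a match split across the 1024-byte boundary is still seen in the next pass.

```pascal
program FindKeywordDemo;

{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

// Hypothetical helper (not an RTL routine): returns the 0-based byte offset
// of the first occurrence of Keyword at or after the current stream position,
// or -1 if it does not occur.
function FindInStream(Stream: TStream; const Keyword: AnsiString): Int64;
const
  ChunkSize = 1024;
var
  Window, Chunk: AnsiString;
  WindowStart: Int64; // stream offset of Window[1]
  BytesRead, Hit, Keep: Integer;
begin
  Result := -1;
  if Keyword = '' then
    Exit;
  Window := '';
  WindowStart := Stream.Position;
  SetLength(Chunk, ChunkSize);
  repeat
    // Read may return fewer bytes than requested, so use its result
    BytesRead := Stream.Read(Chunk[1], ChunkSize);
    if BytesRead <= 0 then
      Break;
    Window := Window + Copy(Chunk, 1, BytesRead);
    Hit := Pos(Keyword, Window);
    if Hit > 0 then
    begin
      Result := WindowStart + Hit - 1;
      Exit;
    end;
    // Keep only the tail that could still be the start of a match
    Keep := Length(Keyword) - 1;
    if Length(Window) > Keep then
    begin
      Inc(WindowStart, Length(Window) - Keep);
      Window := Copy(Window, Length(Window) - Keep + 1, Keep);
    end;
  until False;
end;

// Hypothetical demo helper: copies Data into a memory stream and searches it.
function FindInBytes(const Data, Keyword: AnsiString): Int64;
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    if Data <> '' then
      MS.WriteBuffer(Data[1], Length(Data));
    MS.Position := 0;
    Result := FindInStream(MS, Keyword);
  finally
    MS.Free;
  end;
end;

var
  Data: AnsiString;
begin
  // 1020 filler bytes place 'anything' across the first 1024-byte boundary
  SetLength(Data, 1020);
  FillChar(Data[1], 1020, Ord('x'));
  Data := Data + 'anything' + 'tail';
  Writeln(FindInBytes(Data, 'anything')); // prints 1020
  Writeln(FindInBytes(Data, 'missing'));  // prints -1
end.
```

With a real file, a TFileStream would be passed to `FindInStream` instead of the memory stream used in this demo. Memory use stays near the chunk size regardless of file size, which sidesteps the address-space concern raised about loading all 200 MB at once.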

1 Answer

procedure TForm5.Button4Click(Sender: TObject);
var
  Stream: TFileStream;
  Buffer: AnsiString;
  BytesRead, SearchPos: Integer;
  myKeyword: AnsiString; // same string type as the data, so Pos compares bytes
  pullStr: AnsiString;
  Found: Boolean;
begin
  myKeyword := 'anything';
  Found := False;
  pullStr := '';
  SetLength(Buffer, 1024);
  Stream := TFileStream.Create(edtTarget.Text, fmOpenRead or fmShareDenyWrite);
  try
    while Stream.Position < Stream.Size do
    begin
      // read some bytes and remember how many bytes were actually read
      BytesRead := Stream.Read(Buffer[1], Length(Buffer));
      if BytesRead <= 0 then
        Break;
      // glue the new bytes to the end of pullStr
      pullStr := pullStr + Copy(Buffer, 1, BytesRead);
      // the file is divided into two parts: before myKeyword and after it
      // if myKeyword was already found, just keep appending to pullStr
      if Found then
        Continue;
      // if myKeyword is not found yet, pullStr acts as a temporary buffer
      // search for myKeyword in the buffer
      SearchPos := Pos(myKeyword, pullStr);
      if SearchPos > 0 then
      begin // keyword was found, delete from the beginning up to and including myKeyword
        // from now on, pullStr is not a temporary buffer, but the result
        Found := True;
        Delete(pullStr, 1, SearchPos + Length(myKeyword) - 1);
        Continue;
      end;
      // myKeyword still not found: find the last line end in the buffer
      // (assumes myKeyword contains no line break and the file has line breaks)
      SearchPos := LastDelimiter(#13#10, pullStr);
      // and delete everything up to and including it
      if SearchPos > 0 then
        Delete(pullStr, 1, SearchPos);
      // so if myKeyword spans two reads, it will still be found in the next iteration
    end;
  finally
    Stream.Free; // release the file handle even if reading raises an exception
  end;
  // if there is no myKeyword in the file, clear the buffer
  if not Found then
    pullStr := '';
end;
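
The code above keeps everything that follows the keyword, while the question asked for a fixed-length piece (something like copy(i, 1000)). A minimal sketch of that step, assuming the byte offset is already known, might look like this. `ExtractAt` and `ExtractFromBytes` are hypothetical names used only for illustration; the sketch seeks to the offset, reads at most Len bytes, and trims the result to the number of bytes actually read.

```pascal
program ExtractAtDemo;

{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

// Hypothetical helper (not an RTL routine): returns up to Len bytes that
// start at the 0-based byte Offset of Stream; shorter near the end of data.
function ExtractAt(Stream: TStream; Offset: Int64; Len: Integer): AnsiString;
var
  BytesRead: Integer;
begin
  Result := '';
  if (Offset < 0) or (Len <= 0) then
    Exit;
  Stream.Position := Offset;
  SetLength(Result, Len);
  BytesRead := Stream.Read(Result[1], Len);
  if BytesRead < 0 then
    BytesRead := 0;
  // Trim to what was really read
  SetLength(Result, BytesRead);
end;

// Hypothetical demo helper: copies Data into a memory stream and extracts from it.
function ExtractFromBytes(const Data: AnsiString; Offset: Int64;
  Len: Integer): AnsiString;
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    if Data <> '' then
      MS.WriteBuffer(Data[1], Length(Data));
    Result := ExtractAt(MS, Offset, Len);
  finally
    MS.Free;
  end;
end;

begin
  Writeln(ExtractFromBytes('0123456789abcdef', 10, 4));  // prints abcd
  Writeln(ExtractFromBytes('0123456789abcdef', 14, 10)); // prints ef
end.
```

To take the bytes after the keyword rather than from its start, the offset passed in would be the keyword position plus Length(myKeyword).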
WhiteWind