0

I am having issues with loading a CSV into a StringGrid. Occasionally, it runs out of memory, but also it seems to have blank columns after each value. I've not really read from a CSV as opposed to output to one, so I took a stock example online and modified it for my needs.

This is what I've currently got:

procedure x.LoadCSVtoGrid(ACSVFile : String; AStringGrid: TStringGrid)
var
  LRowIndex, LColIndex: Integer;
  LStrLine: string;
  LFile: TStringList;
begin
  AStringGrid.RowCount := 0;
  AStringGrid.ColCount := 0;
  if not FileExists(ACSVFile) then
    exit;
  LFile := TStringList.Create;
  try
    LFile.LoadFromFile(ACSVFile);
      if LFile.Count = 0 then
        exit;
      AStringGrid.ColCount := Max(AStringGrid.ColCount, WordCount(LFile[0], [',', '"'], '"'));
      AStringGrid.RowCount := LFile.Count;
      for LRowIndex := 0 to LFile.Count - 1 do
      begin
        LStrLine := LFile[LRowIndex];
        LColIndex := 0;
        while LStrLine <> '' do
        begin
          if Pos('"', LStrLine) = 1 then
          begin
            Delete(LStrLine, 1, 1);
            AStringGrid.Cells[LColIndex, LRowIndex] := Copy(LStrLine, 1, Pos('"', LStrLine) - 1);
            Delete(LStrLine, 1, Pos('"', LStrLine));
          end
          else
          begin
            AStringGrid.Cells[LColIndex, LRowIndex] := Copy(LStrLine, 1, Pos(',', LStrLine) - 1);
            Delete(LStrLine, 1, Pos(',', LStrLine));
          end;
          Inc(LColIndex);
        end;
      end;
  finally
    LFile.Free;
  end;

For smaller CSV files, it does fine. I think it's reading up to 250-300 lines before. Some of the files it has to deal with now are 500+.

To be honest, I don't do much handling of the data of the CSV until it's been imported into the StringGrid, but once it's in the StringGrid, it's validated. I've got to make sure that commas within speech marks, ie "text, here", are ignored, as it's part of the value. Again, this appears to handle the reading fine.

Another issue I think I might run into is AStringGrid.RowCount := LFile.Count;, as some of the CSV files have blank lines. If there is a way to deal with this, I am happy to take suggestions.

There are a few versions of CSV files it should be able to read, ie the calculation of column counts and such. Code for WordCount:

function x.WordCount(const S: string; const WordDelims: TSysCharSet; const QuoteChar: Char) : Integer;
var
  LInWord: Boolean;
  LQuoteOpen: Boolean;
  i: Integer;
begin
  Result := 0;
  LInWord := False;
  LQuoteOpen := False;
  for i := 1 to Length(S) do
  begin
    if S[i] in WordDelims then
    begin
      if not LInWord or LQuoteOpen then
        LInWord := False
      else
      begin
        LInWord := True;
          Inc(Result);
      end;
    end
    else
    begin
      if S[i] = QuoteChar then
        LQuoteOpen := not LQuoteOpen;
      LInWord := True;
    end;
  end;
  if LInWord and (not LQuoteOpen) then
    Inc(Result);

I've tried multiple files, for the most part this issue only happens with larger CSV files with more content. I've tried various versions of CSV-to-StringGrid procedures to see if there is something innately wrong with the example I took above. The example works, but only on smaller files.

Let me know if you need more information.

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
Eddy
  • 13
  • 3
  • 1
    `TStringList` is not well-designed for CSV files. At the very least, perhaps try using `TStringReader` instead, which has a `ReadLine()` method you can call in a loop so you are not loading the entire file into memory at one time. However, you really should use a proper CSV parser/library instead. There are plenty of them for Delphi floating around if you search for them. – Remy Lebeau Feb 09 '23 at 19:01
  • Well that seems to have solved the issue, moving to something a bit better. I didn't realise TStringList was that bad, but I've only used it to deal with small files. Thank you very much for your help, much appreciated. – Eddy Feb 10 '23 at 15:52

1 Answers1

0
  1. Memory issue

First you create a TStringList and then load it with data

LFile := TStringList.Create; LFile.LoadFromFile(ACSVFile);

Because you load the whole file into the string list, you need that much of memory, plus equally much to hold the data in the TStringGrid.

Reduce memory requirement by reading the file in chunks of, say, 1000 lines at the time, which you then can throw away after they are moved to the string grid.

OTOH, your "Out of memory" problem might also be caused by the errors in your code. I experienced an "Out of memory" error with my very small test file when run with your unaltered code.

  1. Issues with code

In my tests I used a simple file with a few records and a quoted field in different locations. The file content is:

one,two,"text, including comma",four,five
one,two,three,four,five
"text, including comma",two,three,four,five
one,two,three,four,"text, including comma"

You determine required number of columns in the TStringGrid, by calling the WordCount() function, to which you pass the first string from the string list.

  WordCount(const S: string; const WordDelims: TSysCharSet; const QuoteChar: Char) : Integer;

When I pass in the first test string,

'one,two,three,four,five',

WordCount returns correctly 5

Then, control returns to LoadCSVtoGrid(), and after assigning AStringGrid.ColCount and RowCount the for LRowIndex loop starts to fill the grid with data for the current row. Pay attention to the second part, after else:

        AStringGrid.Cells[LColIndex, LRowIndex] := Copy(LStrLine, 1, Pos(',', LStrLine) - 1);
        Delete(LStrLine, 1, Pos(',', LStrLine));

The Delete() deletes from beginning of LStrLine to Pos(',', LStrLine). This works ok for items "one,", "two,", "three," and "four,", but not for "five" as there is no comma after the last item.

This is the major flaw in the code as it never deletes the last item. Instead, since the loop runs while LString <> '' it just continues incrementing LColIndex

On my machine it stops after a couple of minutes with an out-of-memory error.

Here is my take on WordCount (renamed WordCountNew) function:

function TForm50.WordCountNew(const s: string; const Delimiter: Char;
  const QuoteChar: Char): Integer;
var
  InWord, InQuote: boolean;
  i: integer;
begin
  if s = '' then              // Just in case we are fed an empty string
    Exit(0);

  Result := 1;                // Init, at least one data item
  InWord := False;            // Init
  InQuote:= False;            // Init

  for i := 1 to Length(s) do
  begin

    if s[i] = QuoteChar then  // The field is quoted
      InQuote := not InQuote; // make note about it

    if s[i] = Delimiter then  // Delimiter found
    begin
      if not InQuote then     // ... but only count it,
        inc(Result);          // if not within a quote
    end;

  end;
end;

Then the LoadCSVtoGrid procedure:

procedure TForm50.LoadCSVtoGrid(ACSVFile: String; AStringGrid: TStringGrid);
var
  LRowIndex, LColIndex: Integer;
  LStrLine: string;
  LFile: TStringList;
  CommaPos: integer;  // added 
begin
  AStringGrid.RowCount := 0;
  AStringGrid.ColCount := 0;
  if not FileExists(ACSVFile) then
    exit;
  LFile := TStringList.Create;
  try
    LFile.LoadFromFile(ACSVFile);
      if LFile.Count = 0 then
        exit;
//  When determining column count we should ONLY count the field separator, comma.
//  A quote character is not an indication of a new column / field.
//  Therefore we remove the array of chars, `[',', '"']` and replace with `','`
//      AStringGrid.ColCount := Max(AStringGrid.ColCount, WordCount(LFile[0], [',', '"'], '"'));
      AStringGrid.ColCount := Max(AStringGrid.ColCount, WordCountNew(LFile[0], ',', '"'));
      AStringGrid.RowCount := LFile.Count;
      for LRowIndex := 0 to LFile.Count - 1 do
      begin
        LStrLine := LFile[LRowIndex];
        LColIndex := 0;

        while LStrLine <> '' do
        begin
          if Pos('"', LStrLine) = 1 then
          begin
            Delete(LStrLine, 1, 1);
            AStringGrid.Cells[LColIndex, LRowIndex] := Copy(LStrLine, 1, Pos('"', LStrLine) - 1);
            AStringGrid.UpdateControlState;
            Delete(LStrLine, 1, Pos('"', LStrLine));
            Delete(LStrLine, 1, Pos(',', LStrLine));
          end
          else
          begin
            CommaPos := Pos(',', LStrLine);
            if CommaPos = 0 then CommaPos := Length(LStrLine)+1;
            AStringGrid.Cells[LColIndex, LRowIndex] := Copy(LStrLine, 1, CommaPos-1); //Pos(',', LStrLine) - 1);
            AStringGrid.UpdateControlState;
            Delete(LStrLine, 1, CommaPos); // Pos(',', LStrLine));
          end;
          Inc(LColIndex);
        end;
      end;
  finally
    LFile.Free;
  end;
end;

I added the CommaPos variable, to make it easier to artificially simulate a comma at the end of the string.

With these changes the test file is properly read into the grid.

enter image description here

Tom Brunberg
  • 20,312
  • 8
  • 37
  • 54
  • I ended up going a predone parser route, however I will give this a go as well as that parser is also giving me a headache now and then but I've not had much of a chance to look at that side of the program for a little bit. I'll come back once I've given this a test. Thank you for the detailed explanation! Much appreciated to understand the why. – Eddy Feb 24 '23 at 09:16