Academic problem: Remove leading and trailing spaces from a string using a while loop.
How do we approach this problem?
Well, we certainly would like to create a function that trims a string. This way, we can simply call this function every time we need to perform such an operation. This will make the code much more readable and easier to maintain.
Clearly, this function accepts a string and returns a string. Hence its declaration should be
function Trim(const AText: string): string;
Here I follow the convention of prefixing arguments with "A". I also use the const prefix to tell the compiler that I will not modify the argument within the function; this can improve performance (albeit very slightly).
The definition will look like this:
function Trim(const AText: string): string;
begin
  // Compute the trimmed string and save it in the result variable.
end;
A first attempt
Now, let's attempt to implement this algorithm using a while loop. Our first attempt will be very slow, but fairly easy to follow.
First, let us copy the argument string AText to the result variable; when the function returns, the value of result will be its returned value:
result := AText;
Now, let us try to remove leading space characters.
while result[1] = ' ' do
  Delete(result, 1, 1);
We test if the first character, result[1], is a space character and if it is, we use the Delete procedure to remove it from the string (specifically, Delete(result, 1, 1) removes 1 character from the string starting at the character with index 1). Then we do this again and again, until the first character is something other than a space.
For example, if result initially is ' Hello, World!', this will make it equal to 'Hello, World!'.
Full code, so far:
function Trim(const AText: string): string;
begin
  result := AText;
  while result[1] = ' ' do
    Delete(result, 1, 1);
end;
Now try this with a string that consists only of space characters, such as ' ', or the empty string, ''. What happens? Why?
Think about it.
Clearly, in such a case, result will sooner or later be the empty string, and then the character result[1] doesn't exist. (Indeed, if the first character of result existed, result would be of length at least 1, and so it wouldn't be the empty string, which consists of precisely zero characters.)
Accessing a character that doesn't exist will typically make the program crash (for example, with a range-check error or an access violation).
To fix this bug, we change the loop to this:
while (Length(result) >= 1) and (result[1] = ' ') do
  Delete(result, 1, 1);
Due to a technique known as 'lazy boolean evaluation' (or 'short-circuit evaluation'), the second operand of the and operator, that is, result[1] = ' ', will not even be evaluated if the first operand, in this case Length(result) >= 1, evaluates to false. Indeed, false and <anything> equals false, so we already know the value of the conjunction in this case. (Short-circuit evaluation is the compiler's default; it corresponds to the {$B-} setting.)
In other words, result[1] = ' ' will only be evaluated if Length(result) >= 1, in which case there will be no bug. In addition, the algorithm produces the right answer, because if we eventually find that Length(result) = 0, clearly we are done and should return the empty string.
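To see short-circuit evaluation in action, here is a minimal sketch. The helper Probe is a hypothetical function invented for this illustration; it announces that it was called and then returns its argument. The program assumes the default {$B-} setting.

```pascal
program ShortCircuitDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$B-} // short-circuit boolean evaluation (this is the default setting)

// Hypothetical helper: reports that it was called, then returns its argument.
function Probe(const AName: string; AValue: Boolean): Boolean;
begin
  Writeln('evaluating ', AName);
  Result := AValue;
end;

begin
  // The left operand is false, so the right operand is never evaluated:
  // only "evaluating left" is printed before the else branch runs.
  if Probe('left', False) and Probe('right', True) then
    Writeln('both true')
  else
    Writeln('conjunction is false');
end.
```

This is exactly why the guarded loop is safe: the index expression plays the role of the right operand and is skipped whenever the length test fails.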
Removing trailing spaces in a similar fashion, we end up with
function Trim(const AText: string): string;
begin
  result := AText;
  while (Length(result) >= 1) and (result[1] = ' ') do
    Delete(result, 1, 1);
  while (Length(result) >= 1) and (result[Length(result)] = ' ') do
    Delete(result, Length(result), 1);
end;
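As a quick check, the function above can be dropped into a small console program and tried on the problematic inputs discussed earlier. This is only a sketch; the square brackets in the output are an assumption of mine, added to make any remaining spaces visible. The program deliberately does not use SysUtils, so the name Trim does not clash with the library routine of the same name.

```pascal
program TrimDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

// The first (slow but simple) version of Trim from the text.
function Trim(const AText: string): string;
begin
  Result := AText;
  // Remove leading spaces, one character at a time.
  while (Length(Result) >= 1) and (Result[1] = ' ') do
    Delete(Result, 1, 1);
  // Remove trailing spaces, one character at a time.
  while (Length(Result) >= 1) and (Result[Length(Result)] = ' ') do
    Delete(Result, Length(Result), 1);
end;

begin
  Writeln('[', Trim('  Hello, World!  '), ']'); // [Hello, World!]
  Writeln('[', Trim('   '), ']');               // []
  Writeln('[', Trim(''), ']');                  // []
  Writeln('[', Trim('a b'), ']');               // [a b]
end.
```

Note that the space inside 'a b' is kept: only leading and trailing spaces are removed.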
A tiny improvement
I don't quite like the space character literals ' ', because it is somewhat difficult to tell visually how many spaces there are. Indeed, the literal might even contain a different whitespace character than a simple space. Hence, I would write #32 or #$20 instead. 32 (decimal), or $20 (hexadecimal), is the character code of the ordinary space character.
A (much) better solution
If you try to trim a string containing many millions of characters (including a few million leading and trailing spaces) using the above algorithm, you'll notice that it is surprisingly slow. This is because every iteration removes just one character: each Delete call at the front of the string has to move all the remaining characters, and the string's memory may need to be reallocated as well.
A much better algorithm would simply determine the number of leading and trailing spaces by reading characters in the string, and then in a single step perform a memory allocation for the new string.
In the following code, I determine the index FirstPos of the first non-space character in the string and the index LastPos of the last non-space character in the string:
function Trim2(const AText: string): string;
var
  FirstPos, LastPos: integer;
begin
  FirstPos := 1;
  while (FirstPos <= Length(AText)) and (AText[FirstPos] = #32) do
    Inc(FirstPos);
  LastPos := Length(AText);
  while (LastPos >= 1) and (AText[LastPos] = #32) do
    Dec(LastPos);
  result := Copy(AText, FirstPos, LastPos - FirstPos + 1);
end;
I'll leave it as an exercise for the reader to figure out the precise workings of the algorithm. As a bonus exercise, try to benchmark the two algorithms: how much faster is the last one? (Hint: we are talking about orders of magnitude!)
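One way to get started on the first exercise is to print the two indices for a few inputs. The sketch below repeats the two scanning loops of Trim2 inside a hypothetical helper, ShowBounds (a name invented for this illustration), and prints FirstPos, LastPos, and the count that would be passed to Copy.

```pascal
program Trim2Bounds;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: runs the two scanning loops of Trim2 and prints
// the indices they end on, together with the count passed to Copy.
procedure ShowBounds(const AText: string);
var
  FirstPos, LastPos: Integer;
begin
  FirstPos := 1;
  while (FirstPos <= Length(AText)) and (AText[FirstPos] = #32) do
    Inc(FirstPos);
  LastPos := Length(AText);
  while (LastPos >= 1) and (AText[LastPos] = #32) do
    Dec(LastPos);
  Writeln('FirstPos=', FirstPos, ' LastPos=', LastPos,
    ' Count=', LastPos - FirstPos + 1);
end;

begin
  ShowBounds('  ab ');  // FirstPos=3 LastPos=4 Count=2
  ShowBounds('   ');    // FirstPos=4 LastPos=0 Count=-3
  ShowBounds('');       // FirstPos=1 LastPos=0 Count=0
end.
```

In the last two cases the count is zero or negative. As far as I know, Copy returns the empty string for such a count, which is exactly the result we want for an all-space or empty input.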
A simple benchmark
For the sake of completeness, I wrote the following very simple test:
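// Assumes the Windows unit is in the uses clause (for GetTickCount) and that
// the Trim and Trim2 functions defined above are declared before this code.
const
  N = 10000;
var
  t: cardinal;
  dur1, dur2: cardinal;
  S: array[1..N] of string;
  S1: array[1..N] of string;
  S2: array[1..N] of string;
  i: Integer;
begin
  Randomize;
  // Build N test strings: random runs of spaces, then 'a's, then spaces.
  for i := 1 to N do
    S[i] := StringOfChar(#32, Random(10000)) + StringOfChar('a', Random(10000)) + StringOfChar(#32, Random(10000));
  // Time the first algorithm.
  t := GetTickCount;
  for i := 1 to N do
    S1[i] := Trim(S[i]);
  dur1 := GetTickCount - t;
  // Time the second algorithm on the same input.
  t := GetTickCount;
  for i := 1 to N do
    S2[i] := Trim2(S[i]);
  dur2 := GetTickCount - t;
  Writeln('trim1: ', dur1, ' ms');
  Writeln('trim2: ', dur2, ' ms');
end.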
I got the following output:
trim1: 159573 ms
trim2: 484 ms