I need to strip all non-standard text characters from a string: remove every non-ASCII character and every control character, except line feeds and carriage returns.
6 Answers
And here's a variant of Cosmin's that only walks the string once, but uses an efficient allocation pattern:
function StrippedOfNonAscii(const s: string): string;
var
  i, Count: Integer;
begin
  // Allocate for the worst case (nothing removed), then trim once at the end.
  SetLength(Result, Length(s));
  Count := 0;
  for i := 1 to Length(s) do begin
    // Keep #32..#127 plus LF (#10) and CR (#13). Note that #127 is DEL.
    if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then begin
      inc(Count);
      Result[Count] := s[i];
    end;
  end;
  SetLength(Result, Count);
end;

- Very good variant: only one reallocation, and possibly none if the string doesn't contain any non-ASCII chars. – Cosmin Prund Apr 13 '11 at 18:46
- var l, i, Count: Integer; begin l := Length(s); SetLength(Result, l); if l = 0 then Exit; Count := 0; for i := 1 to l do begin if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then begin inc(Count); Result[Count] := s[i]; end; end; if l <> Count then SetLength(Result, Count); end; – Zam Feb 21 '20 at 17:53
Something like this should do:
// For those who need a disclaimer:
// This code is meant as a sample to show how the basic check for non-ASCII characters works.
// It will perform poorly when called often with long strings.
// Use a TStringBuilder, or SetLength and an Integer loop index, to optimize.
// If you need really optimized code, pass this on to the FastCode people.
function StripNonAsciiExceptCRLF(const Value: AnsiString): AnsiString;
var
  AnsiCh: AnsiChar;
begin
  Result := ''; // a string Result is not guaranteed to be empty on entry
  for AnsiCh in Value do
    // CR (#13) and LF (#10) must be OR-ed in; AND-ing them out would strip them.
    if ((AnsiCh >= #32) and (AnsiCh <= #127)) or (AnsiCh = #13) or (AnsiCh = #10) then
      Result := Result + AnsiCh;
end;
For UnicodeString you can do something similar.
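A minimal sketch of what a UnicodeString counterpart might look like. The name StripNonAsciiExceptCRLFW is hypothetical, and the sketch makes two choices that the sample above does not: it preallocates the result instead of appending, and it stops at #126 so that DEL (#127) is treated as a control character.

```pascal
program StripUnicodeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical UnicodeString variant; the name and the #126 upper bound are assumptions.
function StripNonAsciiExceptCRLFW(const Value: UnicodeString): UnicodeString;
var
  i, Count: Integer;
  Code: Word;
begin
  // Worst case: nothing is removed, so reserve the full length up front.
  SetLength(Result, Length(Value));
  Count := 0;
  for i := 1 to Length(Value) do
  begin
    Code := Ord(Value[i]);
    // Keep printable ASCII (32..126) plus LF (10) and CR (13).
    if ((Code >= 32) and (Code <= 126)) or (Code = 10) or (Code = 13) then
    begin
      Inc(Count);
      Result[Count] := Value[i];
    end;
  end;
  // Trim the unused tail.
  SetLength(Result, Count);
end;

var
  Cleaned: UnicodeString;
begin
  // The euro sign (#$20AC) and BEL (#7) are dropped; CR and LF survive.
  Cleaned := StripNonAsciiExceptCRLFW('caf'#$20AC#13#10'ok'#7);
  WriteLn(Length(Cleaned)); // prints 7
end.
```

Comparing the ordinal value avoids set membership tests on wide characters altogether, which is why no CharInSet call is needed here.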

- There are two potential problems: 1) speed and 2) memory fragmentation. Neither may be an issue if the function is called occasionally and with small strings; both could become one if it is called often with large strings. As usual, optimizing requires understanding where the code is expected to run. – Apr 13 '11 at 18:17
- This will probably work well with small strings because the memory manager is optimised to deal with this pattern of allocation and because the small blocks make the required mem copy operation fairly fast. But given a reallocation-free drop-in alternative was offered (David's code, not mine) I'd never use this. – Cosmin Prund Apr 13 '11 at 18:57
- @David: wow, you are harsh on me today. First of all, this is a code sample showing how to do the proper comparisons. Optimizing it distracts from that point. Furthermore, premature optimization causes a lot of evil code. That's why I optimize code when performance is indeed an issue. I've added some comments in the code to warn, but for me those warnings would go with most sample code I encounter that proves a basic algorithm. – Jeroen Wiert Pluimers Apr 13 '11 at 19:07
- @Jeroen This is pretty trivial stuff and to do it right isn't hard or particularly long-winded. It's a very common pattern. I wouldn't class this as an optimisation. I'd regard it as the baseline for reasonable code. Any optimised version would involve unrolling the loop. – David Heffernan Apr 13 '11 at 19:17
- @David: for you this is trivial, for me this is trivial, but for a lot of SO readers this is not trivial. It's the classic example of the Pareto Principle. I teach software developers for a part of my living and see that 80/20 rule on a very regular basis. Hence my samples are meant to be understood by lots of people, and the people that need optimization will figure that out themselves. I can understand you see that in a different way, but I think commenting 'sloppy programmer' based on one code sample is way too harsh, especially since there is no secondary communication involved. – Jeroen Wiert Pluimers Apr 13 '11 at 19:35
- @Jeroen You contradict yourself. In an earlier comment you stated, "I would fix it if speed became a problem." – David Heffernan Apr 13 '11 at 19:40
- @David: I didn't see Shane indicate that speed is a problem here. If he does, I can now point him to your optimized code (I upvoted it). If you hadn't posted it, I would optimize the code myself, and split the code into two methods: the regular one to show the basics, and the optimized one. That way anyone can make a comparison and see why things were optimized in a certain way. – Jeroen Wiert Pluimers Apr 13 '11 at 20:00
- Wow, #13 and #10 will always be stripped as the code stands, how could this be the accepted answer? – LU RD Oct 10 '13 at 18:27
- @JeroenWiertPluimers Premature micro-optimization and worrying about technical details below the abstraction of the language appear to be unfortunate traits of many Delphi developers (although I have no idea where or why it became part of the culture). Thus, I feel that your lesson about writing clean, clear code first and only optimizing if necessary (and normally after profiling) is even more important than your instruction about stripping characters from strings! – alcalde Feb 02 '14 at 00:08
If you don't need to do it in place and are happy to generate a copy of the string, try this code:
type CharSet = set of Char;
function StripCharsInSet(s: string; c: CharSet): string;
var i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    if not (s[i] in c) then
      Result := Result + s[i];
end;
and use it like this:
s := StripCharsInSet(s, [#0..#9, #11, #12, #14..#31, #127]);
EDIT: added #127 for the DEL control character.
EDIT2: this is a faster version, thanks ldsandon:
function StripCharsInSet(s: string; c: CharSet): string;
var i, j: Integer;
begin
  SetLength(Result, Length(s));
  j := 0;
  for i := 1 to Length(s) do
    if not (s[i] in c) then
    begin
      Inc(j);
      Result[j] := s[i];
    end;
  SetLength(Result, j);
end;

- For Delphi 2010, use the `CharInSet` function instead of the `Ch in ...` construct. – Jeroen Wiert Pluimers Apr 13 '11 at 14:17
- Don't worry; your solution will work correctly. For non-ASCII characters the `CharInSet` function is required though. – Jeroen Wiert Pluimers Apr 13 '11 at 14:20
- Very slow: it will reallocate Result over and over. I'd set Result to the same length as the original string, then set the actual length after it has been processed. – Apr 13 '11 at 14:27
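The `CharInSet` suggestion from the comments could be applied roughly as follows. This is only a sketch: the name StripCharsInSetEx is hypothetical, and it assumes the set is passed as `TSysCharSet` from `SysUtils` rather than as the `CharSet` type declared in the answer.

```pascal
program StripCharInSetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils; // provides CharInSet and TSysCharSet

// Hypothetical variant of StripCharsInSet that tests membership with CharInSet.
function StripCharsInSetEx(const s: string; const c: TSysCharSet): string;
var
  i, j: Integer;
begin
  // Reserve the worst-case length once, then trim at the end.
  SetLength(Result, Length(s));
  j := 0;
  for i := 1 to Length(s) do
    if not CharInSet(s[i], c) then
    begin
      Inc(j);
      Result[j] := s[i];
    end;
  SetLength(Result, j);
end;

begin
  // TAB (#9) and NUL (#0) are in the set, so they are removed.
  WriteLn(StripCharsInSetEx('a'#9'b'#0'c', [#0..#9, #11, #12, #14..#31, #127])); // prints abc
end.
```

Like the answer it is based on, this only removes the characters named in the set; characters above #127 pass through unchanged.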
Here's a version that doesn't build the string by appending char-by-char, but allocates the whole string in one go. It requires going over the string twice, once to count the "good" chars and once to actually copy them, but it's worth it because it doesn't do multiple reallocations:
function StripNonAscii(s: string): string;
var Count, i: Integer;
begin
  Count := 0;
  for i := 1 to Length(s) do
    if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then
      Inc(Count);
  if Count = Length(s) then
    Result := s // No characters need to be removed, return the original string (no mem allocation!)
  else
    begin
      SetLength(Result, Count);
      Count := 1;
      for i := 1 to Length(s) do
        if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then
        begin
          Result[Count] := s[i];
          Inc(Count);
        end;
    end;
end;

- Why would anyone downvote this? Not that it matters much, just curious. – Cosmin Prund Apr 14 '11 at 06:42
- I would not have used StringOfChar, just SetLength(). Anyway, that's no reason to downvote, although it does require walking the string twice. – Apr 14 '11 at 07:31
- It does require walking the string twice, but it *guarantees* optimal allocation. If this is done for many, many strings, optimal allocation is going to matter a lot more than walking the string only once. – Cosmin Prund Apr 14 '11 at 08:16
- Edited the answer to use `SetLength` and to implement a tiny optimization that allows the routine to do its job with ZERO or 1 string allocations. – Cosmin Prund Apr 14 '11 at 08:19
- @Cosmin one downside of multiple walks is that this code has two identical if statements, which violates DRY – David Heffernan Apr 14 '11 at 08:44
- @David, that's true. To be honest I value DRY a lot more than runtime performance. I don't write speed-critical applications. – Cosmin Prund Apr 14 '11 at 08:51
- @Cosmin As a maintainer of a 25 year old codebase, I agree, DRY comes first – David Heffernan Apr 14 '11 at 09:01
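One way the DRY concern raised in the comments might be addressed is to move the test into a small helper that both passes call. This is a sketch only; IsKeptChar and StripNonAsciiDry are hypothetical names, and the range (including #127) simply mirrors the answer's condition.

```pascal
program StripDryDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: the single place where "keep this char?" is decided.
function IsKeptChar(const Ch: Char): Boolean;
begin
  Result := ((Ch >= #32) and (Ch <= #127)) or (Ch = #10) or (Ch = #13);
end;

// Two-pass strip, as in the answer, with both passes sharing IsKeptChar.
function StripNonAsciiDry(const s: string): string;
var
  Count, i: Integer;
begin
  // Pass 1: count the characters that survive.
  Count := 0;
  for i := 1 to Length(s) do
    if IsKeptChar(s[i]) then
      Inc(Count);
  if Count = Length(s) then
  begin
    Result := s; // nothing to remove, so no new allocation is needed
    Exit;
  end;
  // Pass 2: allocate exactly once and copy the survivors.
  SetLength(Result, Count);
  Count := 0;
  for i := 1 to Length(s) do
    if IsKeptChar(s[i]) then
    begin
      Inc(Count);
      Result[Count] := s[i];
    end;
end;

begin
  WriteLn(StripNonAsciiDry('a'#9'b')); // prints ab
end.
```

The helper costs a function call per character, which is the usual trade-off between keeping the condition in one place and raw speed.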
My performance-oriented solution:
function StripNonAnsiChars(const AStr: String; const AIgnoreChars: TSysCharSet): string;
var
  lBuilder: TStringBuilder;
  I: Integer;
begin
  lBuilder := TStringBuilder.Create;
  try
    // AIgnoreChars are extra characters to keep in addition to #32..#127.
    for I := 1 to AStr.Length do
      if CharInSet(AStr[I], [#32..#127] + AIgnoreChars) then
        lBuilder.Append(AStr[I]);
    Result := lBuilder.ToString;
  finally
    FreeAndNil(lBuilder);
  end;
end;
Written in Delphi XE7.
My version, which returns an array of bytes:
interface
type
  TSBox = array of byte;
and the function:
function StripNonAscii(buf: array of byte): TSBox;
var temp: TSBox;
  countr, countr2: integer;
const validchars : TSysCharSet = [#32..#127];
begin
  Result := nil; // return an empty array for empty input
  if Length(buf) = 0 then exit;
  countr2 := 0;
  SetLength(temp, Length(buf)); // set temp to the length of buf
  // Open arrays are zero-based, so the last valid index is High(buf).
  for countr := 0 to High(buf) do
    if CharInSet(chr(buf[countr]), validchars) then
    begin
      temp[countr2] := buf[countr];
      inc(countr2); // count valid chars
    end;
  SetLength(temp, countr2);
  Result := temp;
end;