9

I am having trouble writing a regexp that will match all the whitespace in my input.

I have tried \s+ and [ \t\t\r]+ but neither works.

I need this because I am writing a scanner using flex, and I am stuck at matching whitespace. The whitespace should just be matched and not removed.

Example input:

program 
3.3 5 7 
{ comment }
string
panic: cant happen
Brian Tompsett - 汤莱恩
mrjasmin

2 Answers

15
  1. flex uses (approximately) the POSIX "Extended Regular Expression" syntax -- \s doesn't work, because it's a Perl extension.

  2. Is [ \t\t\r]+ a typo? I think you'll want a \n in there.

Something like [ \n\t\r]+ certainly should work. For example, this lexer (which I've saved as lexer.l):

%{

#include <stdio.h>

%}

%option noyywrap

%%

[ \n\t\r]+  { printf("Whitespace: '%s'\n", yytext); }
[^ \n\t\r]+ { printf("Non-whitespace: '%s'\n", yytext); }

%%

int main(void)
{
    yylex();
    return 0;
}

...successfully matches the whitespace in your example input (which I've saved as input.txt):

$ flex lexer.l
$ gcc -o test lex.yy.c
$ ./test < input.txt
Non-whitespace: 'program'
Whitespace: ' 
'
Non-whitespace: '3.3'
Whitespace: ' '
Non-whitespace: '5'
Whitespace: ' '
Non-whitespace: '7'
Whitespace: ' 
'
Non-whitespace: '{'
Whitespace: ' '
Non-whitespace: 'comment'
Whitespace: ' '
Non-whitespace: '}'
Whitespace: '
'
Non-whitespace: 'string'
Whitespace: '
'
Non-whitespace: 'panic:'
Whitespace: ' '
Non-whitespace: 'cant'
Whitespace: ' '
Non-whitespace: 'happen'
Whitespace: '
'
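As an aside on point 1: POSIX bracket expressions do provide a portable whitespace class, [[:space:]], and flex accepts it too, so a rule such as [[:space:]]+ should behave like the explicit list above (it additionally covers \f and \v). The sketch below is not part of the lexer; it only illustrates the same ERE with the C library's regcomp/regexec. The helper name count_whitespace_runs is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of flex): counts the maximal runs of
 * whitespace in `s` using the POSIX ERE "[[:space:]]+".
 * Returns -1 if the pattern fails to compile.
 */
int count_whitespace_runs(const char *s)
{
    regex_t re;
    regmatch_t m;
    int count = 0;

    if (regcomp(&re, "[[:space:]]+", REG_EXTENDED) != 0)
        return -1;

    /* Each match is at least one character long, so `s` always advances. */
    while (*s != '\0' && regexec(&re, s, 1, &m, 0) == 0) {
        count++;
        s += m.rm_eo;   /* continue scanning right after this run */
    }

    regfree(&re);
    return count;
}
```

Called on the first two lines of the example input, "program \n3.3 5 7 \n", this should report four runs, matching the first four "Whitespace" lines printed by the lexer.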
Matthew Slattery
  • Yes, I meant \n instead of the two \t's in [ \t\t\r]+. Thanks for your answer, it's correct :) – mrjasmin Nov 11 '12 at 17:02
  • Use `[ \n\t\r\f]+` to also match form feeds. Note that Windows/DOS line endings are `\r\n`, so they are covered by `\r` and `\n`, not by `\f`. Source: http://web.eecs.utk.edu/~bvz/cs461/notes/flex/ – ribamar Dec 10 '15 at 17:29
-1

I'm not a specialist in flex, but you should use the /g and /m flags in your regular expression to work with multiline strings.

Vyacheslav Voronchuk