Regular expression to consider special characters in a string

Question

I need to split data into tokens on spaces only, without also splitting on special characters. The regex I have right now is

       (\w*[-*#+=;:\/,~_ ]*\w+)

When I process the following string with it

    1-CHECK ON BLOCKS BELOW IF MARKET CORRECTION ARE LOADED: PCORP:BLOCK=ANCTRLG&V5PTCLG;   AF55722  BRTBMWA-3289 (AF55722) in block ANCTRLG (Product ID: CAAZ 107 4493 R1A10 )  AF55736  BRTBMWA-3290 (AF55726)in block V5PTCLG  (Product ID: CAAZ 107 4260 R2A08 )  IF MARKET CORRECTIONS ARE LOADED THEN V5 INTERFACE PROPERTY MUST BE DEFINED AS FOLLOW : MUXFIM : ACC-OFF (Accelerate Alligment is not active) WLL    : ACC-ON  (Accelerate Alligment is active ) :  EXAPC:V5ID=v5id,PROP=ACC-OFF;

it splits the string on spaces, but it also splits on special characters. For example,

    :  EXAPC:V5ID=v5id

is tokenized into `:  EXAPC`, `:V5ID` and `=v5id`, whereas I want it split into `:` and `EXAPC:V5ID=v5id`.

I want to avoid this. Any ideas or help would be appreciated.

Why not just use .* for the whole thing as that will match any character you like to throw at it? — grail, Feb 19 '17 at 10:23
Move the `\w` inside the character class, `[-*#+=;:\/,~_ \w]+` — Toto, Feb 19 '17 at 11:23
In your special case [**`[-\w\ ]+`**](https://regex101.com/r/diWaWJ/1) will do what you want. — Jan, Feb 19 '17 at 11:53

score 1 · Accepted Answer · answered Feb 19 '17 at 13:20

Your regex matches "an optional word, then an optional run of special characters, then another word". So when a match contains two words, a special character can only sit between them, never before the first word, and every match has to end in a word. What you're probably looking for is `([-*#+=;:\/,~_ \w]+)`.
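
To make the difference concrete, here is a minimal sketch in Python (an assumption, since the question does not name a language) that runs both patterns over the tail of the sample string. The names `ORIGINAL`, `SUGGESTED` and `sample` are only illustrative.

```python
import re

# Pattern from the question: optional word, optional specials, then a word.
ORIGINAL = re.compile(r"(\w*[-*#+=;:\/,~_ ]*\w+)")

# Pattern suggested above: one character class holding specials, space and \w.
SUGGESTED = re.compile(r"([-*#+=;:\/,~_ \w]+)")

# Shortened sample: the last part of the string from the question.
sample = ":  EXAPC:V5ID=v5id,PROP=ACC-OFF;"

# Each match must end in a word, so every special character starts a new token
# and the trailing ";" is never matched at all.
original_tokens = ORIGINAL.findall(sample)
print(original_tokens)
# [':  EXAPC', ':V5ID', '=v5id', ',PROP', '=ACC', '-OFF']

# The single class keeps the special characters attached to their words, but
# because the space is in the class too, the match also runs across spaces.
suggested_tokens = SUGGESTED.findall(sample)
print(suggested_tokens)
# [':  EXAPC:V5ID=v5id,PROP=ACC-OFF;']
```

Whether the space belongs in the class depends on whether a token may contain spaces.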

Hetzroni


I have tried the provided regex, but it does not fulfill my requirement. I have updated the requirement accordingly. Please have a look into it. – Ashit_Kumar Feb 21 '17 at 16:06
Simply remove the space, leaving you with `([-*#+=;:\/,~_\w]+)`. – Hetzroni Feb 22 '17 at 21:19
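
A quick way to check the pattern without the space is a sketch like the following. It assumes Python, and `TOKEN` and `tokenize` are hypothetical names, not something from the question.

```python
import re

# Character class from the comment above, with the space removed, so a run of
# word characters and the listed special characters forms one token.
TOKEN = re.compile(r"[-*#+=;:\/,~_\w]+")


def tokenize(text):
    """Return every maximal run of word and listed special characters."""
    return TOKEN.findall(text)


print(tokenize(":  EXAPC:V5ID=v5id,PROP=ACC-OFF;"))
# [':', 'EXAPC:V5ID=v5id,PROP=ACC-OFF;']

# Characters that are not in the class, such as parentheses, act as
# separators and are dropped from the output.
print(tokenize("BRTBMWA-3290 (AF55726)in block V5PTCLG"))
# ['BRTBMWA-3290', 'AF55726', 'in', 'block', 'V5PTCLG']
```

Note that anything missing from the class (parentheses, `&`, and so on) behaves like a space here, so any character that should stay inside a token has to be added to the class.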
