1

I have to clean a string passed as a parameter by removing all lowercase letters and all special characters except:

  • +
  • |
  • ^
  • space
  • =>
  • <=>

So, given this string as input:

aA azee + B => C=

I need to clean it to get this result:

A + B => C

This is what I tried:

string.gsub(/[^[:upper:][+|^ ]]/, "")

output: "A  + B  C"

I don't know how to keep the => (and <=>) strings with a regex in Ruby.

I know that if I add = and > to the character class, as in string.gsub(/[^[:upper:][+|^ =>]]/, ""), the trailing = in my input string will be kept too.
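To make the problem concrete, here is a quick check of both character classes against the sample input. The variable names are only for illustration; the two patterns are the ones quoted above.

```ruby
input = "aA azee + B => C="

# Original attempt: "=" and ">" are not in the class, so "=>" is stripped.
without_arrow = input.gsub(/[^[:upper:][+|^ ]]/, "")

# With "=" and ">" added, every "=" and ">" survives, including the
# trailing "=", because a character class only looks at one character
# at a time and cannot tell "=>" apart from a lone "=".
with_arrow_chars = input.gsub(/[^[:upper:][+|^ =>]]/, "")

p without_arrow
p with_arrow_chars
```

This is why the arrows need to be matched as whole tokens rather than as members of the character class.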

3 Answers

5

You can try an alternative approach: match everything you want to keep, then join the results.

You can use this regex to match everything you want to keep:

[A-Z\d+| ^]|<?=>

As you can see, this just uses | and [] to build a list of the strings you want to keep: uppercase letters, digits, +, |, space, ^, => and <=>.

Example:

"aA azee + B => C=".scan(/[A-Z\d+| ^]|<?=>/).join()

Output:

"A  + B => C"

Note that there are two consecutive spaces between "A" and "+". If you don't want that, you can call String#squeeze on the result.
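As a minimal sketch of that last step, the scan/join result can be passed through String#squeeze with a space argument, so that only runs of spaces are collapsed. The variable names here are illustrative.

```ruby
input = "aA azee + B => C="

# Keep single allowed characters, or a whole "=>" / "<=>" token.
kept = input.scan(/[A-Z\d+| ^]|<?=>/).join

# Collapse runs of spaces only; other repeated characters are untouched.
cleaned = kept.squeeze(" ")

p kept
p cleaned
```

Passing " " to squeeze matters here: without an argument, squeeze would collapse any repeated character, not just spaces.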

Sweeper
1

See regex in use here

(<?=>)|[^[:upper:]+|^ ]
  • (<?=>) Captures <=> or => into capture group 1
  • [^[:upper:]+|^ ] Matches any character that is not an uppercase letter, +, |, ^ or a space. [:upper:] is like A-Z, except that in Ruby it also matches non-ASCII uppercase letters

See code in use here

p "aA azee + B => C=".gsub(/(<?=>)|[^[:upper:]+|^ ]/, '\1')

Result: "A  + B => C"
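The same substitution can also be written with the block form of gsub, which some readers find easier to follow than the '\1' backreference. This is only a sketch: the method name clean_formula and the second test string are made up for illustration, while the pattern is the one from this answer.

```ruby
# Hypothetical helper; wraps the pattern from this answer.
def clean_formula(input)
  input.gsub(/(<?=>)|[^[:upper:]+|^ ]/) do
    # $1 is "=>" or "<=>" when the first alternative matched and nil
    # otherwise, so to_s turns every other match into an empty string.
    $1.to_s
  end
end

p clean_formula("aA azee + B => C=")
p clean_formula("a <=> bB ^ C>")
```

The arrow alternative comes first in the pattern, so "=>" and "<=>" are consumed as whole tokens before the character class gets a chance to remove their individual characters.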

ctwheels
  • I prefer this solution because it explicitly excludes characters, as opposed to including what is inferred to be the strings to be kept. Also, the POSIX expression for uppercase letters has wider applicability than `A-Z`. – Cary Swoveland Apr 09 '18 at 20:31
0
r = /[a-z\s[:punct:]&&[^+ |^]]/

"The cat, 'Boots', had 9+5=4 ^lIVEs^ leF|t.".gsub(r,'')
  #=> "T  B  9+54 ^IVE^ F|"

The regular expression reads, "Match lowercase letters, whitespace and punctuation that are not the characters '+', ' ', '|' and '^'." Within a character class, && is the set intersection operator. Here it intersects the set of characters that match a-z\s[:punct:] with the set that matches [^+ |^]. (Note that \s covers whitespace other than spaces, such as tabs and newlines, so those are removed as well.) For more information, search for "character classes also support the && operator" in the Regexp documentation.
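For readers who have not met && in a character class before, here is a smaller, unrelated illustration of the same operator. The pattern and sample string are invented for this sketch.

```ruby
# [a-z] intersected with "anything except a vowel" leaves the
# lowercase consonants.
consonants = /[a-z&&[^aeiou]]/

without_consonants = "hello world".gsub(consonants, "")
p without_consonants
```

The regular expression above has the same shape: a broad set on the left of &&, and a negated set on the right that carves out the exceptions.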

I have not included '=>' and '<=>' as those, unlike '+', ' ', '|' and '^', are multi-character strings and therefore require a different approach than simply removing certain characters.
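One possible way to extend this approach to the arrows, borrowing the capture-group idea from the answer above, is to try "=>" / "<=>" first and fall back to the intersection class. This is a sketch rather than part of the original answer: the constant and method names are hypothetical, and "=", "<" and ">" are listed explicitly in the class on the assumption that one should not rely on [[:punct:]] covering them in every Ruby version.

```ruby
# Group 1 keeps "=>" and "<=>" as whole tokens. The character class
# removes lowercase letters, whitespace other than a plain space, and
# punctuation other than + | ^. "=", "<" and ">" are added explicitly
# (an assumption for portability, see the note above).
ARROW_OR_JUNK = /(<?=>)|[a-z\s[:punct:]=<>&&[^+ |^]]/

def strip_junk(input)
  # '\1' re-inserts the arrow when group 1 matched and is empty otherwise.
  input.gsub(ARROW_OR_JUNK, '\1')
end

p strip_junk("aA azee + B => C=")
p strip_junk("a <=> B|c")
```

Unlike the negated-class solution, this version leaves digits in place, since digits are neither lowercase letters nor punctuation.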

Cary Swoveland