I have a UTF-8 string, which might be in any language.
How do I check that it contains only alphanumeric characters?
I could not find such a method in the UnicodeUtils Ruby gem.
Examples:
- ėččę91 - valid
- $120D - invalid
You can use the POSIX bracket notation for alphanumerics:
#!/usr/bin/env ruby -w
# encoding: UTF-8
puts RUBY_VERSION
valid = "ėččę91"
invalid = "$120D"
puts valid[/[[:alnum:]]+/]
puts invalid[/[^[:alnum:]]+/]
Which outputs:
1.9.2
ėččę91
$
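The snippet above only prints the first matching run. To answer the yes/no question for the whole string, the same character class can be anchored at both ends. This is a minimal sketch; the helper name alnum_only? is hypothetical, not part of any library:

```ruby
# encoding: UTF-8

# Hypothetical helper: true only when the string is non-empty and
# every character in it is alphanumeric according to [[:alnum:]].
def alnum_only?(str)
  # \A and \z anchor to the start and end of the whole string.
  !!(str =~ /\A[[:alnum:]]+\z/)
end

puts alnum_only?("ėččę91")  # true
puts alnum_only?("$120D")   # false
```

Because of the + quantifier, an empty string is reported as invalid; use * instead if an empty string should pass.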
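In a Ruby regex, \p{L} matches any letter (in any script) and \p{N} matches any number,
so if s represents your string:
s.match /^[\p{L}\p{N}]+$/
This matches only when a line of s consists entirely of letters and numbers. For multi-line input, anchor with \A and \z instead of ^ and $.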
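If it helps to see which characters make a string fail, the negated form of the same class can be passed to String#scan. A small sketch, with non_alnum_chars as a hypothetical helper name:

```ruby
# encoding: UTF-8

# Hypothetical helper: returns every character that is neither
# a letter (\p{L}) nor a number (\p{N}).
def non_alnum_chars(str)
  str.scan(/[^\p{L}\p{N}]/)
end

p non_alnum_chars("ėččę91")  # []
p non_alnum_chars("$120D")   # ["$"]
```

An empty result means the string contains no offending characters.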
The pattern for one alphanumeric code point is
/[\p{Alphabetic}\p{Number}]/
From there it’s easy to extrapolate something like this for "has a non-alphanumeric character":
/[^\p{Alphabetic}\p{Number}]/
or this for "is all alphanumeric", anchored per line:
/^[\p{Alphabetic}\p{Number}]+$/
or this, which anchors to the whole string rather than to a line:
/\A[\p{Alphabetic}\p{Number}]+\z/
Pick the one that best suits your needs.
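The difference between the last two patterns shows up on multi-line input, because ^ and $ match at line boundaries in Ruby while \A and \z match only at the ends of the string. A short sketch with hypothetical helper names:

```ruby
# encoding: UTF-8

# Hypothetical helpers wrapping the two anchored patterns.
def line_anchored?(str)
  !!(str =~ /^[\p{Alphabetic}\p{Number}]+$/)
end

def string_anchored?(str)
  !!(str =~ /\A[\p{Alphabetic}\p{Number}]+\z/)
end

mixed = "abc\n$$$"  # first line is clean, second line is not

puts line_anchored?(mixed)    # true  (the line "abc" matches on its own)
puts string_anchored?(mixed)  # false (the whole string must match)
```

For validating user input, the \A ... \z form is usually the safer choice.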