Questions tagged [character-properties]

Character properties are a set of attributes supplied by the Unicode Standard for every character it encodes. Many of these properties are specified in relation to the processes or algorithms that interpret them (for example, normalization or bidirectional text layout), so that implementations reproduce the intended character behavior.
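A minimal sketch of a property being consumed by an algorithm, assuming Python's standard-library `unicodedata` module: when a string is normalized to NFD, adjacent combining marks are reordered by their Canonical_Combining_Class property. The helper `combining_classes` is a hypothetical name used only for this illustration.

```python
import unicodedata


def combining_classes(s: str) -> list[tuple[str, int]]:
    """Return (code point, Canonical_Combining_Class) pairs for each character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in s]


# 'a' followed by COMBINING ACUTE ACCENT (class 230), then COMBINING DOT BELOW (class 220).
s = "a\u0301\u0323"

# NFD normalization applies canonical ordering, which sorts runs of
# non-zero-class combining marks by ascending combining class.
nfd = unicodedata.normalize("NFD", s)

print(combining_classes(s))    # [('U+0061', 0), ('U+0301', 230), ('U+0323', 220)]
print(combining_classes(nfd))  # [('U+0061', 0), ('U+0323', 220), ('U+0301', 230)]
```

Because COMBINING DOT BELOW has the lower class (220), NFD moves it in front of COMBINING ACUTE ACCENT (230). Characters with class 0, such as the base letter, are never reordered.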

The Unicode Standard, on top of defining the encoding of characters, also associates a rich set of semantics with each encoded character: properties that are required for interoperability and correct behavior in implementations, as well as for Unicode conformance. These semantics are cataloged in the Unicode Character Database (UCD), a collection of data files that list each code point together with its character name and property values.
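A minimal sketch of reading UCD properties, assuming Python's standard-library `unicodedata` module, which bundles a subset of the UCD for the version reported by `unicodedata.unidata_version`. The helper `describe` is a hypothetical name for illustration. Note that `unicodedata` does not expose every UCD property; the Script property, for instance, is not available there.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few UCD properties for a single character."""
    category = unicodedata.category(ch)
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Control characters have no Name property, so fall back to a placeholder.
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": category,
        # The first letter is the major class: L, M, N, P, S, Z or C.
        "major_class": category[0],
        "bidi_class": unicodedata.bidirectional(ch),
        # Only characters with a decimal digit value return a number here.
        "decimal": unicodedata.decimal(ch, None),
    }


# A letter, a digit, a combining mark, a quotation mark and a control character.
for ch in ["A", "7", "\u0301", "\u201c", "\t"]:
    info = describe(ch)
    print(info["code_point"], info["general_category"], info["bidi_class"], info["name"])
```

The major class alone is often enough to tell letters, marks, numbers, punctuation, symbols, separators and control or other characters apart.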

More information can be found on Wikipedia, in the official Unicode Standard as well as in this Unicode Technical Report.

92 questions
3 votes, 1 answer

Substitution: "\p{Cntrl}" - "\P{Print}"

Until now I have used these two substitutions before printing "$string" to the terminal: $string =~ s/\p{Space}/ /g; $string =~ s/\p{Cntrl}//g; Is there anything I should consider when I replace the first two substitutions with the following…
sid_com
3 votes, 1 answer

What's the difference between GC=Mark and GC=Punctuation in Unicode general categories?

I'm having trouble understanding some concepts. In the Unicode spec, there's a property called general category. OK, I understand what each of these is: letters (usual characters; GC=L), numbers (like digits 0–9 and other characters that have numeric…
eonil
2 votes, 2 answers

python unicode regex

I'd like to replace the below regex with a unicode-friendly version that will catch things like http://➡.ws and other non-ascii IRIs. The purpose is to grab these out of users' text and encode and html-ize them into real links. Python provides a…
bukzor
2 votes, 1 answer

Perl script stops. Error: Can't find unicode property definition ASCII

I've inherited some Perl scripts (I'm not a Perl programmer). I'm seeing the error "can't find unicode property definition ascii" on the line below: $value =~ s/[^[:\p{ascii}]]//g Would this error cause program execution to stop? As it's the…
Andi McLean
2 votes, 1 answer

Regex for: I have a requirement of matching the value of a request parameter with Unicode characters, but it should not allow spaces

Regex for Java: I have a requirement to match the value of a request parameter containing Unicode characters, but it should not allow spaces. Basically, a regex that allows all Unicode characters except spaces. I tried my best, but in vain…
Suraj
2 votes, 2 answers

Unicode scripts in Regular Expressions

I want to guess the human language of a string. I found that Unicode scripts in regular expressions could do the trick, but I don't know what the script names stand for. As far as I know, Han stands for Chinese, but what about the others?
Shisoft
2 votes, 3 answers

How do I create a Perl regex that matches non-alphanumeric characters except spaces?

I have a Perl regex /\W/i which matches all non-alphanumeric characters, but it also matches spaces which I want to ignore. How do I get it to match non-alphanumeric characters except spaces?
Joe Schmoe
2 votes, 1 answer

Python regex with unicode characters bug?

Long story short:
>>> re.compile(r"\w*").match(u"Français")
<_sre.SRE_Match object at 0x1004246b0>
>>> re.compile(r"^\w*$").match(u"Français")
>>> re.compile(r"^\w*$").match(u"Franais")
<_sre.SRE_Match object at 0x100424780>
>>>
Why doesn't it…
ak.
2 votes, 2 answers

Regex to match all unicode quotation marks

Is there a simple regular expression to match all unicode quotes? Or does one have to hand-code it like this: quotes = ur"[\"'\u2018\u2019\u201c\u201d]" Thank you for reading. Brian
Brian M. Hunt
2 votes, 2 answers

Regex Not Matching Unicode

How would I go about using Regex to match Unicode strings? I'm loading a couple of keywords from a text file and using them with Regex on another file. Both keywords contain Unicode characters (such as á). I'm not sure where the problem is. Is there…
cam
2 votes, 3 answers

Is there a way to tell if a unicode character is a control, alpha, numeric or symbolic?

Assuming all you have is the binary data and no pre-canned functions, is there a pattern or algorithm to categorize the type of character?
Oorang
2 votes, 1 answer

How can I retrieve the character position of a specific character in a file using VI?

I need to retrieve the character position of a character in a file. How can I do this, using Vi?
ZakTaccardi
2 votes, 1 answer

How to test if the first character in a symbol is a letter in Lisp?

How do I test whether the first character in a symbol is a letter in Lisp? I know it has something to do with the alpha-char-p function.
Ester
1 vote, 2 answers

match names with unicode chars

Can somebody help me match strings like "BEREŽALINS" and "GŽIBOVSKIS" in C# and JS? I've tried \A\w+\z, (?>\P{M}\p{M}*)+, ^[-a-zA-Z\p{L}']{2,50}$, and so on, but nothing works. Thanks
user872761
1 vote, 3 answers

Laundering tainted data

When I launder tainted data by checking whether it contains any bad characters, are there Unicode properties that will filter out the bad characters?
sid_com