Questions tagged [character-properties]

Character properties are attributes that the Unicode Standard assigns to each character it encodes. Many of them are defined in relation to the processes or algorithms that interpret them in order to implement correct character behavior.

In addition to defining the encoding of characters, the Unicode Standard associates a rich set of semantics with each encoded character: properties that are required for interoperability and correct behavior in implementations, as well as for Unicode conformance. These semantics are cataloged in the Unicode Character Database (UCD), a collection of data files that list the code points, character names, and property values.
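
As a rough illustration of what these properties look like in practice, the sketch below uses Python's standard `unicodedata` module to look up a few UCD properties for a single character. The helper names `describe` and `is_letter` are hypothetical, chosen only for this example, and the letter test merely approximates what `\p{L}` matches in regex engines that support Unicode property escapes.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return a few UCD properties for a single character.

    `describe` is a hypothetical helper name used for illustration.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Some code points have no name; fall back to a placeholder.
        "name": unicodedata.name(ch, "<unnamed>"),
        # Two-letter General_Category value, e.g. "Lu", "Nd", "Mn".
        "general_category": unicodedata.category(ch),
        # Bidirectional class, e.g. "L" (left-to-right) or "NSM".
        "bidi_class": unicodedata.bidirectional(ch),
        # Canonical combining class; 0 for non-combining characters.
        "combining_class": unicodedata.combining(ch),
    }


def is_letter(ch: str) -> bool:
    """Approximate \\p{L}: any General_Category starting with 'L'."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    for ch in ("A", "あ", "1", "\u0301"):
        print(describe(ch), is_letter(ch))
```

The values returned depend on the version of the Unicode Character Database bundled with the Python interpreter, which is exposed as `unicodedata.unidata_version`.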

More information can be found on Wikipedia, in the official Unicode Standard as well as in this Unicode Technical Report.

92 questions

2 answers

What is the {L} Unicode category?

I came across some regular expressions that contain [^\\p{L}]. I understand that this is using some form of a Unicode category, but when I checked the documentation, I found only the following "L" categories: Lu Uppercase letter …

asked May 11 '11 at 19:20

uTubeFan

7 answers

Regex for names with special characters (Unicode)

Okay, I have read about regex all day now, and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z], leaving out characters that I need to accept too. I…

php javascript regex character-properties

asked May 11 '11 at 11:08

Kristoffer la Cour

2 answers

Javascript unicode (greek) regular expressions

I would like to use the regular expression new RegExp("\b"+pat+"\b") on Greek text, but the "\b" metacharacter supports only ASCII characters. I tried the XRegExp library but didn't manage to solve the issue. Any suggestions would be greatly…

javascript regex unicode character-properties xregexp

asked Apr 13 '11 at 13:33

kylito

1 answer

Efficiently list all characters in a given Unicode category

Often one wants to list all characters in a given Unicode category. For example: "List all Unicode whitespace"; "How can I get all whitespaces in UTF-8 in Python?"; "Characters with the property Alphabetic". It is possible to produce this list by…

python unicode character-properties

asked Jan 09 '13 at 20:30

Mechanical snail

5 answers

How to validate both Chinese (unicode) and English name?

I have a multilingual website (Chinese and English). I would like to validate a text field (name field) in JavaScript. I have the following code so far: var chkName = /^[characters]{1,20}$/; if( chkName.test("[name value goes here]") ){ …

javascript regex unicode character-properties

asked Jun 16 '11 at 19:25

Moon

2 answers

How to determine if a character is a Chinese character

How can I determine whether a character is a Chinese character using Ruby?

ruby unicode encoding cjk character-properties

asked Apr 28 '10 at 08:22

HelloWorld

9 answers

Python: Split unicode string on word boundaries

I need to take a string, and shorten it to 140 characters. Currently I am doing: if len(tweet) > 140: tweet = re.sub(r"\s+", " ", tweet) #normalize space footer = "… " + utils.shorten_urls(post['url']) avail = 140 - len(footer) words…

python unicode internationalization character-properties

asked Nov 15 '09 at 20:53

Paul Tarjan

1 answer

Regular expression to match boundary between different Unicode scripts

Regular expression engines have a concept of "zero width" matches, some of which are useful for finding edges of words: \b - present in most engines to match any boundary between word and non-word characters \< and \> - present in Vim to match only…

regex unicode character-properties word-boundary word-boundaries

asked May 11 '13 at 01:39

hippietrail

3 answers

Latin Characters check

There are some similar questions out there, but none that are quite the same or that have an answer that works for me. I need a JavaScript function which validates whether a text field contains only valid Latin characters, so no Cyrillic or Chinese,…

javascript regex unicode character-properties

asked Apr 03 '13 at 10:59

CompanyDroneFromSector7G

3 answers

Scanning for Unicode Numbers in a string with \d

According to the Oniguruma documentation, the \d character type matches: "decimal digit char; Unicode: General_Category -- Decimal_Number". However, scanning for \d in a string with all the Decimal_Number characters results in only Latin 0-9 digits…

ruby regex unicode character-properties

asked Aug 09 '11 at 15:28

Phrogz

3 answers

POSIX character equivalents in Java regular expressions

I would like to use a regular expression like this in Java: [[=a=][=e=][=i=]]. But Java doesn't support the POSIX equivalence classes [=a=], [=e=], etc. How can I do this? More precisely, is there a way to not use US-ASCII?

java regex posix-ere character-properties

asked Jul 07 '11 at 15:12

Stephan

4 answers

regular expression containing unicode words

I'd like to match all strings containing a certain word. like: String regex = (?:\P{L}|\W|^)(ベスパ)(?:\b|$) however, the Pattern class doesn't compile it: java.util.regex.PatternSyntaxException: Unmatched closing ')' near index…

java regex unicode character-properties

asked Apr 12 '11 at 21:14

Frost

2 answers

Obtaining unicode characters of a language in Java

Is there any way in Java so that I can obtain all the Unicode characters of a particular language (for example Bengali or Arabic)?

java unicode character-properties

asked Nov 21 '10 at 10:59

Muhammad Asaduzzaman

1 answer

Replace Unicode Control Characters

I need to replace all special control characters in a string in Java. I want to query the Google Maps API v3, and Google doesn't seem to like these characters. Example:…

java regex google-maps unicode character-properties

asked Aug 09 '10 at 09:48

Cyril Gandon

5 answers

How do I match only fully-composed characters in a Unicode string in Perl?

I'm looking for a way to match only fully composed characters in a Unicode string. Is [:print:] dependent upon locale in any regular expression implementation that incorporates this character class? For example, will it match Japanese character 'あ',…

regex perl unicode locale character-properties

asked Oct 15 '08 at 03:10

dreamlax
