Questions tagged [character-properties]

Character properties are a set of attributes that the Unicode Standard defines for each character it encodes. Many of these properties are specified in relation to the processes or algorithms that interpret them in order to implement the character's behavior.

In addition to defining the encoding of characters, the Unicode Standard associates a rich set of semantics with each encoded character: properties that are required for interoperability and correct behavior in implementations, as well as for Unicode conformance. These semantics are cataloged in the Unicode Character Database (UCD), a collection of data files that list the Unicode code points, the character names, and the property values.
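As a rough illustration of what these properties look like in practice, the sketch below reads a handful of them through Python's standard `unicodedata` module, which ships a copy of the UCD. The helper name `describe` and the choice of properties are assumptions made for this example only; the UCD defines many more properties than the ones shown here.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return a few UCD properties for a single character (hypothetical helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Fall back to a placeholder for code points that have no name.
        "name": unicodedata.name(ch, "<unnamed>"),
        # General category, e.g. "Lu" (uppercase letter), "Mn" (non-spacing mark).
        "general_category": unicodedata.category(ch),
        # Canonical combining class; 0 for characters that do not combine.
        "combining_class": unicodedata.combining(ch),
        # Bidirectional class, e.g. "L" (left-to-right), "NSM" (non-spacing mark).
        "bidi_class": unicodedata.bidirectional(ch),
    }


if __name__ == "__main__":
    # A letter, a combining acute accent, and a decimal digit.
    for ch in ("A", "\u0301", "5"):
        print(describe(ch))
```

The general category is the property that regex classes such as `\p{L}` or `\p{P}` test against, which is why so many questions under this tag are also about regular expressions.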

More information can be found on Wikipedia, in the official Unicode Standard, and in this Unicode Technical Report.

92 questions

votes

2 answers

What are the `unicode groups` and `block ranges` that can be specified in `\p{name}`?

What are the Unicode groups and block ranges that can be specified in the character class \p{name}, e.g. \p{IsGreek}? Where is the list of names and descriptions available?

regex pcre character-properties

asked Jan 25 '12 at 12:28

ThinkingMonkey

12,539
13
57
81

votes

1 answer

Properties of combining diacritics

For combining diacritics, are they counted as letters? Since, as far as I know, they can only combine with other letters in well-formed Unicode. The ICU function to determine if a Unicode codepoint is a letter only takes one codepoint, so for any…

unicode character-properties

asked Nov 26 '11 at 20:38

Puppy

144,682
38
256
465

votes

2 answers

Enumerate a character's Unicode properties in Ruby?

Is there any way to enumerate all of a character's Unicode properties in Ruby? I can use Ruby 1.9's Regexp class to test whether a given character has a particular property (e.g., some_char =~ /\p{P}/ to test whether some_char is punctuation,…

ruby unicode character-properties

asked Apr 29 '11 at 16:54

Steven Bedrick

votes

1 answer

@Pattern with Unicode script \\p{L}* doesn't work

I have a problem with javax.validation.constraints.Pattern @Pattern validation. @Pattern(regexp = "\\p{L}*", message = "Msg") private String name; When I try to input any text, it doesn't work. When I used: @Pattern(regexp = "[a-zA-Z]*",…

java regex pattern-matching bean-validation character-properties

asked Nov 08 '16 at 10:06

Tomasz Gutkowski

1,388
4
20
28

votes

3 answers

Mathematica regular expressions on unicode strings

This was a fascinating debugging experience. Can you spot the difference between the following two lines? StringReplace["–", RegularExpression@"[\\s\\S]" -> "abc"] StringReplace["-", RegularExpression@"[\\s\\S]" -> "abc"] They do very different…

regex debugging wolfram-mathematica pcre character-properties

asked Mar 25 '10 at 02:32

dreeves

26,430
45
154
229

votes

5 answers

Match unicode in ply's regexes

I'm matching identifiers, but now I have a problem: my identifiers are allowed to contain unicode characters. Therefore the old way to do things is not enough: t_IDENTIFIER = r"[A-Za-z](\\.|[A-Za-z_0-9])*" In my markup language parser I match…

python regex unicode ply character-properties

asked Oct 26 '08 at 16:35

Cheery

24,645
16
59
83

votes

2 answers

Searching unicode text using regex

Searching a file written in Hindi (Devanagari) (UTF-16) gave rise to the following problem. The file contains: त्रास ततत जुग नींद ना हा बु Note that the first char 'त्र' consists of multiple code points: त + ् + र Now while searching for 'त'…

java unicode character-properties ligature

asked Aug 25 '09 at 13:09

user162703

votes

4 answers

How to use Unicode character groups in JavaScript's regexes?

Is there a way to use patterns like "\p{L}" in JavaScript natively? (I suppose that is Perl-compatible syntax.) I'm interested firstly in Firefox support, and possibly WebKit.

javascript regex unicode character-properties

asked Jan 21 '12 at 14:28

user652649

votes

3 answers

How can I find out what form a punctuation character takes in UTF-8?

I have a set of characters like ., !, ?, ;, (space) and a string, which may or may not be UTF-8 (any language). Is there an easy way to find out whether the string contains one of the characters in the set above? For example: 这是一个在中国的字符串。 which translates to This is…

php string unicode character-properties

asked Oct 05 '11 at 13:15

Alex

66,732
177
439
641

votes

5 answers

Validating a Unicode Name

In ASCII, validating a name isn't too difficult: just make sure all the characters are alphabetical. But what about in Unicode (utf-8) ? How can I make sure there are no commas or underscores (outside of ASCII scope) in a given string? (ideally in…

python unicode validation character-properties

asked Mar 09 '09 at 15:30

Gilbert

votes

2 answers

Matching a Unicode "name" with a JavaScript Regular Expression

In JavaScript we can match individual Unicode codepoints or codepoint ranges by using the Unicode escape sequences, e.g.: "A".match(/\u0041/) // => ["A"] "B".match(/[\u0041-\u007A]/) // => ["B"] But how could we create a regular expression to match…

javascript regex unicode character-properties

asked Apr 06 '11 at 18:18

maerics

151,642
46
269
291

votes

1 answer

Unicode regexp to match line-breaks?

I have this form from where I want to submit data to a database. The data is UTF8. I am having trouble with matching line breaks. The pattern I am using is something like this: ~^[\p{L}\p{M}\p{N} ]+$~u This pattern works fine until the user puts a…

regex unicode character-properties line-breaks

asked Dec 08 '10 at 14:35

Booya

votes

2 answers

Regex - Unicode Properties Reference and Examples

I feel lost with the Regex Unicode Properties presented by RegexBuddy, I cannot distinguish between any of the Number properties and the Math symbol property only seems to match + but not -, *, /, ^ for instance. Is there any documentation /…

php regex unicode pcre character-properties

asked Jan 14 '10 at 06:17

Alix Axel

151,645
95
393
500

votes

2 answers

Latin char in Javascript regexp

How can I include Latin chars like ČčĆćŠšĐđ in this JavaScript regexp: var regex = new RegExp('\\b' + this.value, "i"); UPDATE: I have this code for filtering checkbox labels, but it doesn't work well when there is an input with Č č…

javascript regex unicode boundary character-properties

asked Jul 30 '13 at 14:27

user2406735

votes

5 answers

Incrementing a character in Java explanation

I have a Java fragment that looks like this: char ch = 'A'; System.out.println("ch = " + ch); which prints: A then when I do this ch++; // increment ch System.out.println("ch =" + ch); it now prints: B I also tried it with Z and…

java character character-properties

asked Jun 25 '13 at 07:09

Þaw

2,047
4
22
39
