Highest Voted 'text-normalization' Questions

51

votes

2 answers

Programmatic Accent Reduction in JavaScript (aka text normalization or unaccenting)

I need to compare two strings as equal, such as these: Lubeck == Lübeck, in JavaScript. Why? Well, I have an auto-completion field that's going out to a Java service using Lucene, where place names are stored naturally (as Lübeck), but also indexed as…

asked Oct 22 '08 at 23:48

dlamblin

8

votes

2 answers

How do I properly implement Unicode passwords?

Adding support for Unicode passwords is an important feature that should not be ignored by developers. Still, adding support for Unicode in passwords is a tricky job because the same text can be encoded in different ways in Unicode and you don't…

unicode passwords normalization unicode-normalization text-normalization

asked May 09 '10 at 19:03

sorin

6

votes

1 answer

Which form of unicode normalization is appropriate for text mining?

I've been reading a lot on the subject of Unicode, but I remain very confused about normalization and its different forms. In short, I am working on a project that involves extracting text from PDF files and performing some semantic text…

python unicode normalization unicode-normalization text-normalization

asked Jun 27 '12 at 19:05

Louis Thibault

5

votes

1 answer

Tackle different types of UTF hyphens in Ruby 1.8.7

We have different types of hyphens/dashes (in some text) populated in the DB. Before comparing them with some user input text, I have to normalize any type of dash/hyphen to a simple hyphen/minus (ASCII 45). The possible dashes we have to convert are:…

ruby-on-rails ruby unicode text-normalization

asked Oct 01 '10 at 05:51

intellidiot

5

votes

0 answers

Unicode normalization in GWT

Possible Duplicate: Replace éàçè… with equivalent “eace” in GWT. Is there some library I can use to perform Unicode normalization operations in GWT? (to contextually guarantee that the Latin O is equal to the Cyrillic O, for instance)

gwt unicode normalization unicode-normalization text-normalization

asked Apr 26 '12 at 15:12

M. F.

3

votes

0 answers

Why does NFKC normalization lose superscript & subscript info?

I notice that when normalizing a Unicode string to NFKC form, superscript characters like ¹ (U+00B9), ² (U+00B2), ³ (U+00B3), etc. are converted to the corresponding ASCII digits (e.g. 1, 2, 3, etc.). Does anyone know the rationale for this behavior? …

unicode text-normalization

asked Apr 26 '18 at 21:09

codesniffer

3

votes

2 answers

How do I capture items from StringScanner?

I am using Ruby's StringScanner to normalize some English text. def normalize text s = '' ss = StringScanner.new text while ! ss.eos? do s += ' ' if ss.scan(/\s+/) # multiple whitespace => single space s += 'mice' if…

ruby normalization text-normalization

asked Nov 14 '13 at 22:11

zhon

2

votes

0 answers

QWebView::findText doesn't work with Unicode’s Combining Diacritical Marks

I’m using QtWebKit (QWebView) to display text, and I want to implement a search functionality in it via QWebView::findText. Problem is that the text that has to be displayed contains so-called Unicode’s Combining Diacritical Marks, and both…

unicode normalization qtwebkit unicode-normalization text-normalization

asked Aug 01 '12 at 10:20

Linas Valiukas

2

votes

2 answers

Normalizing text file from abnormal newlines?

I have several text files that have lots of newlines between texts that I would like to normalize, but there is no pattern to the number of newlines between the texts, for example: Text Some text More text More more So what I wanted to…

c# .net-4.0 newline normalization text-normalization

asked May 13 '12 at 13:00

Guapo

1

vote

1 answer

Expanding abbreviations using regex

I have a dictionary of abbreviations that I would like to expand. I would like to use these to go through a text and expand all abbreviations. The defined dictionary is as follows: contractions_dict = { "kl\.": "klokken", } The text I…

regex text-normalization

asked Feb 08 '23 at 15:36

Kiri

1

vote

1 answer

How to normalize text with regex?

How to normalize text with regex with some if statements? If we have a string like this One T933 two, three35.4. four 9,3 8.5 five M2x13 M4.3x2.1 And I want to normalize it like this one t 933 two three 35.4 four 9,3 8.5 five m2x13 m4.3x2.1 Remove all…

python regex text-normalization

asked Jul 26 '22 at 12:43

Dmiich

1

vote

1 answer

What is the best way to search for an exact match using Postgres full-text search?

I have a Postgres database with around 1.5 million records. In my Ruby on Rails app, I need to search the statement_text field (which can contain anywhere from 1 to hundreds of words). My problem: I know I can use the pgSearch gem to create scopes…

ruby-on-rails postgresql full-text-search text-normalization

asked Apr 11 '16 at 18:09

jayp

1

vote

1 answer

String normalization in Neo4j Cypher - how to?

Problem background: Chinese words consist of characters which are words themselves. I have 3 nodes representing Chinese words, each with the attribute word having the string values: node (1): "a" node (2): "b" node (3): "ab" Question 1: Using…

parsing neo4j cypher normalization text-normalization

asked Aug 21 '13 at 08:03

Mika

0

votes

0 answers

Text Normalization for abbreviations, acronyms and any other shortcuts written in English

I want to predict some typo shortcuts. For example: 8 in. micrometer has to be predicted as 8 inch micrometer 9 lbs Bag - 9 pounds bag 10" scale - 10 inch scale 10 no. - 10 numbers 77 mm length - 77 millimeter length and so on. I already created a…

machine-learning data-science random-forest prediction text-normalization

asked Apr 05 '23 at 07:10

SRI PRIYA

0

votes

2 answers

Normalize vector such that sum equals 1, while satisfying a lower bound

Given a lower bound of 0.025, I want a vector consisting of weights that sum up to 1 and satisfy this lower bound. Starting from a vector with an arbitrary length and the values ranging from 0.025 (lower bound) to 1. For example, [0.025, 0.8,…

python optimization lower-bound text-normalization

asked Sep 30 '21 at 20:02

Jaques duBalzac

