Highest Voted 'text-processing' Questions

12 votes, 1 answer

Negation handling in NLP

I'm currently working on a project where I want to extract emotion from text. As I'm using conceptnet5 (a semantic network), I can't, however, simply prefix words in a sentence that contains a negation word, as those words would simply not show up in…

python regex nlp nltk text-processing

asked Feb 25 '15 at 13:24 by Tim Daubenschütz

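Since prefixing words (e.g. NOT_happy) would stop them from matching ConceptNet entries, one alternative is to keep each word untouched and carry a separate "negated" flag alongside it. The sketch below assumes a small hypothetical cue-word list and assumes a negation's scope ends at the next punctuation mark; neither comes from the question, and real negation scope detection is considerably harder.

```python
import re

# Hypothetical negation cue list; extend it for your own data.
NEGATION_WORDS = {"not", "no", "never", "nor", "cannot", "without"}
PUNCTUATION = set(".,;:!?")

# Words (keeping contractions such as "don't" intact) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[.,;:!?]")


def mark_negation(sentence):
    """Return (word, negated) pairs; the words themselves are left untouched.

    Assumption: a negation's scope runs from the cue word to the next
    punctuation mark. The cue words themselves are not emitted.
    """
    pairs = []
    negated = False
    for tok in TOKEN_RE.findall(sentence.lower()):
        if tok in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
        elif tok in NEGATION_WORDS or tok.endswith("n't"):
            negated = True           # a cue word opens a new negation scope
        else:
            pairs.append((tok, negated))
    return pairs


print(mark_negation("I am not happy, but I don't care."))
# [('i', False), ('am', False), ('happy', True), ('but', False), ('i', False), ('care', True)]
```

Because the word stays in its plain form, it can still be looked up in the semantic network, and the flag can then be used to invert or discount whatever emotion the lookup returns.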

12 votes, 7 answers

Python: Best way to remove duplicate characters from a string

How can I remove duplicate characters from a string using Python? For example, let's say I have a string: foo = "SSYYNNOOPPSSIISS". How can I make the string: foo = "SYNOPSIS"? I'm new to Python; here is what I have tried, and it's working. I know there is…

python string text-processing

asked Sep 14 '13 at 06:30 by Rahul Patil

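In the example, the expected output "SYNOPSIS" still contains S three times, so what is being removed is each run of identical consecutive characters, not every repeated character. A minimal sketch with the standard library's itertools.groupby, which groups consecutive equal items:

```python
from itertools import groupby


def collapse_runs(s):
    """Collapse each run of identical consecutive characters to one character."""
    return "".join(char for char, _run in groupby(s))


print(collapse_runs("SSYYNNOOPPSSIISS"))  # SYNOPSIS
```

By contrast, order-preserving de-duplication such as "".join(dict.fromkeys(foo)) keeps only the first occurrence of each character and returns "SYNOPI", which is not what the example asks for.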

11 votes, 2 answers

How To Use Backreference in Grep

I have a regular expression with a backreference. How can I use it in a bash script? For example, I want to print what matches (.*) in grep -E "CONSTRAINT \`(.*)\` FOREIGN KEY" temp.txt. If I apply it to CONSTRAINT `fk_dm` FOREIGN KEY, I want to…

regex unix grep text-processing

asked Jan 11 '12 at 11:50 by metdos

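Strictly speaking, (.*) in that pattern is a capture group; a backreference would be \1 used inside the pattern itself. grep -E only decides which lines match and prints whole lines (or, with -o, the whole match), never a capture group on its own. For comparison, here is a sketch of pulling out group 1 with Python's re module, assuming input lines shaped like the CONSTRAINT `fk_dm` FOREIGN KEY example; the helper name is hypothetical.

```python
import re

# Group 1 captures everything between the backticks (no backticks allowed inside).
CONSTRAINT_RE = re.compile(r"CONSTRAINT `([^`]*)` FOREIGN KEY")


def constraint_names(lines):
    """Yield the captured constraint name from every line that matches."""
    for line in lines:
        match = CONSTRAINT_RE.search(line)
        if match:
            yield match.group(1)


sample = ["CONSTRAINT `fk_dm` FOREIGN KEY (`x`)", "PRIMARY KEY (`id`)"]
print(list(constraint_names(sample)))  # ['fk_dm']
```

Using [^`]* instead of the question's (.*) keeps the capture from running past the closing backtick if a line ever contains more than one backticked name.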

11 votes, 4 answers

Java text classification problem

I have a set of Book objects; class Book is defined as follows: class Book { String title; ArrayList taglist; } where title is the title of the book, for example "Javascript for dummies", and taglist is a list of tags; for our example:…

java machine-learning nlp text-processing classification

asked May 12 '10 at 18:16 by Youssef


11 votes, 2 answers

NLTK Stanford POS tagger error: Java command failed

I'm trying to use the nltk.tag.stanford module to tag a sentence (first, like the wiki's example), but I keep getting the following error: Traceback (most recent call last): File "test.py", line 28, in print st.tag(word_tokenize('What is…

python nlp nltk stanford-nlp text-processing

asked Nov 27 '14 at 12:54 by Mazdak


11 votes, 5 answers

Extract words surrounding a search word

I have this script that does a word search in text. The search works well and the results are as expected. What I'm trying to achieve is to extract n words close to the match. For example: The world is a small place, we should try to take care of…

python regex find text-processing

asked Jul 15 '13 at 01:56 by PepperoniPizza

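One way to get "n words close to the match" is to tokenize the text into words first and then slice a window around every hit. A minimal sketch, assuming words are runs of \w characters (so punctuation is dropped) and the comparison is case-insensitive; the function name is hypothetical:

```python
import re


def context_windows(text, word, n=3):
    """Return (before, match, after) with up to n words on each side of every match."""
    tokens = re.findall(r"\w+", text)
    target = word.lower()
    windows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            before = tokens[max(0, i - n):i]   # clamp at the start of the text
            after = tokens[i + 1:i + 1 + n]    # slicing past the end is safe
            windows.append((" ".join(before), tok, " ".join(after)))
    return windows


text = "The world is a small place, we should try to take care of"
print(context_windows(text, "place", n=3))
# [('is a small', 'place', 'we should try')]
```

Working on a token list rather than a single regex makes the window size a plain integer and handles matches near the start or end of the text without special cases.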

11 votes, 1 answer

Python: PyEnchant and 64-bit Python

I am doing text processing. I need the PyEnchant library to verify whether a particular word in the text is a valid English word. However, it's only available for the 32-bit installation of Python. I need the 64-bit Python for handling memory issues…

python text-processing pyenchant

asked Dec 21 '12 at 20:55 by user1839897


11 votes, 1 answer

Effects of Stemming on the term frequency?

How are term frequency (TF) and inverse document frequency (IDF) affected by stop-word removal and stemming? Thanks!

data-mining text-processing tf-idf stop-words stemming

asked May 05 '12 at 17:29 by Ataman

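The effect is easiest to see on a tiny example. The sketch below uses relative TF (count divided by the number of tokens kept) and IDF = log(N / df); the stop-word list and the suffix stripper are hypothetical toys, not the Porter stemmer or any library's list.

```python
import math
from collections import Counter

# Hypothetical tiny stop-word list and a toy suffix stripper (NOT the Porter stemmer).
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and"}
SUFFIXES = ("ion", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(doc, stem=False, drop_stop=False):
    tokens = doc.lower().split()
    if drop_stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def tf(doc, **opts):
    """Relative term frequency: count(term) / number of tokens kept."""
    tokens = preprocess(doc, **opts)
    return {t: c / len(tokens) for t, c in Counter(tokens).items()}


def idf(term, docs, **opts):
    """log(N / df); assumes the term occurs in at least one document."""
    df = sum(term in preprocess(d, **opts) for d in docs)
    return math.log(len(docs) / df)


doc = "the connection is connected to the connecting node"
print(tf(doc))                             # 'the' 0.25, every other term 0.125
print(tf(doc, drop_stop=True))             # four terms, 0.25 each
print(tf(doc, drop_stop=True, stem=True))  # {'connect': 0.75, 'node': 0.25}

docs = ["connected nodes", "a connection", "other text"]
print(idf("connected", docs))              # log(3/1): one document has this surface form
print(idf("connect", docs, stem=True))     # log(3/2): stemming pools two documents
```

In this toy example, stop-word removal shortens the document, so every remaining term's relative TF rises; stemming pools the counts of variants into one stem (higher TF) and makes that stem appear in more documents (higher df, therefore lower IDF).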

10 votes, 2 answers

Using Keras Tokenizer to generate n-grams

Is it possible to use n-grams in Keras? E.g., the sentences are contained in an X_train dataframe with a "sentences" column. I use the tokenizer from Keras in the following manner: tokenizer = Tokenizer(lower=True, split='…

nlp keras tokenize text-processing n-gram

asked Sep 12 '17 at 10:02 by Simplex

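One hypothetical workaround, not a Keras feature: before handing the text to a word-level tokenizer, append each n-gram as a single joined "word" so the tokenizer indexes it like any other token. The sketch below is plain Python and does not call Keras at all; the function names and the "_" joiner are assumptions.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def add_ngram_tokens(sentence, n_max=2, joiner="_"):
    """Append joined n-grams (sizes 2..n_max) after the original unigrams."""
    words = sentence.lower().split()
    extra = [joiner.join(gram)
             for n in range(2, n_max + 1)
             for gram in ngrams(words, n)]
    return " ".join(words + extra)


print(add_ngram_tokens("The cat sat"))           # the cat sat the_cat cat_sat
print(add_ngram_tokens("The cat sat", n_max=3))  # the cat sat the_cat cat_sat the_cat_sat
```

If you try this, check the tokenizer's filters argument: any character in that filter set is stripped during tokenization, so a joiner that appears there would split the n-grams back apart.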

10 votes, 3 answers

What is the difference between fit_transform and transform in sklearn countvectorizer?

I was recently practicing the bag-of-words introduction on Kaggle, and I want to clear up a few things: using vectorizer.fit_transform() on the list of cleaned reviews. Now, when we were preparing the bag-of-words array on the train reviews, we used…

python scikit-learn tokenize text-processing

asked Aug 01 '16 at 06:46 by Anurag Pandey

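In scikit-learn's CountVectorizer, fit_transform learns the vocabulary from the documents it is given and returns their counts, while transform only counts words against the vocabulary that was already learned; words it has never seen are ignored. The toy class below is a hypothetical pure-Python analogue written to show that split (it returns lists, not sparse matrices, and is not scikit-learn's implementation).

```python
class ToyCountVectorizer:
    """Toy analogue of the fit/transform split; NOT scikit-learn's implementation."""

    def fit(self, docs):
        # Learn the vocabulary from the training documents only.
        words = sorted({w for d in docs for w in d.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        # Count words using the fixed vocabulary; unseen words are skipped.
        rows = []
        for d in docs:
            row = [0] * len(self.vocabulary_)
            for w in d.lower().split():
                idx = self.vocabulary_.get(w)
                if idx is not None:
                    row[idx] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        # Equivalent to calling fit, then transform, on the same documents.
        return self.fit(docs).transform(docs)


vec = ToyCountVectorizer()
train = vec.fit_transform(["good movie", "bad movie"])  # vocabulary: bad, good, movie
test = vec.transform(["good good plot"])                # "plot" is not in the vocabulary
print(train)  # [[0, 1, 1], [1, 0, 1]]
print(test)   # [[0, 2, 0]]
```

This is why the test reviews go through transform rather than fit_transform: refitting on them would build a different vocabulary, and the columns would no longer line up with the ones the model was trained on.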

10 votes, 1 answer

Using Stanford NER to extract an address from a text document?

I was looking at Stanford NER and thinking of using its Java APIs to extract a postal address from a text document. The document may be any document that has a postal address section, e.g. utility bills or electricity bills. So what I am thinking as…

java stanford-nlp text-processing

asked Dec 22 '15 at 04:16 by yadab


10 votes, 1 answer

Extract emoticons from a text

I need to extract text emoticons from a text using Python and I've been looking for some solutions to do this but most of them like this or this only cover simple emoticons. I need to parse all of them. Currently I'm using a list of emoticons that I…

python regex text-processing emoticons

asked May 21 '15 at 10:22 by David Moreno García

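Since the asker already keeps a list of emoticons, one approach is to compile that list into a single regular expression. A sketch, assuming a hypothetical sample list: each entry is passed through re.escape, and longer entries are placed first, because Python's regex alternation takes the first alternative that matches rather than the longest one. Without that ordering, ":(" would match the start of ":((" and leave a stray "(" behind.

```python
import re


def build_emoticon_regex(emoticons):
    """Compile one alternation from a list, longest entries first."""
    ordered = sorted(set(emoticons), key=len, reverse=True)
    return re.compile("|".join(re.escape(e) for e in ordered))


# Hypothetical sample list; in practice, load the full list you maintain.
EMOTICONS = [":)", ":-)", ":(", ":((", ":D", ":-D", "<3", ";)"]
pattern = build_emoticon_regex(EMOTICONS)

print(pattern.findall("Great :-) see you <3 :D and :(("))
# [':-)', '<3', ':D', ':((']
```

This only finds emoticons that are in the list, and it does not check what surrounds a match, so text such as a URL containing ":/" could also match if ":/" were listed.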

10 votes, 1 answer

Given a document, select a relevant snippet

When I ask a question here, the tool tips for the questions returned by the auto search give the first little bit of the question, but a decent percentage of them don't give any text that is any more useful for understanding the question than the…

statistics nlp text-processing heuristics

asked May 13 '10 at 18:30 by BCS

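A sketch of one possible heuristic; it is my assumption, not something the question prescribes. Split the document into sentences, count how often each non-stop word occurs across the whole document, and return the sentence whose words are, on average, the most frequent. The stop-word list and the sentence splitter are hypothetical simplifications.

```python
import re
from collections import Counter

# Hypothetical tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "i", "it"}


def content_words(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


def best_snippet(document):
    """Return the sentence whose content words are most frequent document-wide."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return max(sentences, key=score)


doc = "Hi all. Parsing XML in Python is slow. Which XML parser in Python is fastest?"
print(best_snippet(doc))  # Parsing XML in Python is slow.
```

Averaging instead of summing keeps long sentences from winning just by being long; the greeting sentence scores lowest because its words appear nowhere else.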

10 votes, 1 answer

Which function should I use to read unstructured text file into R?

This is my first ever question here, and I'm new to R, trying to figure out my first step in how to do data processing, so please keep it easy :) I'm wondering what would be the best function and a useful data structure in R to load unstructured text…

r text-processing file-read readlines

asked Oct 31 '13 at 19:05 by user2942656


10 votes, 10 answers

Finding dictionary words

I have a lot of compound strings that are a combination of two or three English words, e.g. "Spicejet" is a combination of the words "spice" and "jet". I need to separate these individual English words from such compound strings. My dictionary…

algorithm data-structures text-processing

asked Aug 18 '09 at 04:04 by Manas

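Splitting a compound into dictionary words is a word segmentation problem, which can be solved with memoized recursion over split points. A minimal sketch, assuming the dictionary is a Python set of lower-case words and capping the result at three words to match the "two or three English words" in the question; the function name and sample dictionary are hypothetical.

```python
from functools import lru_cache


def segment(compound, dictionary, max_words=3):
    """Split compound into at most max_words dictionary words, or return None."""
    s = compound.lower()

    @lru_cache(maxsize=None)
    def split_from(i, remaining):
        if i == len(s):
            return ()            # consumed the whole string: success
        if remaining == 0:
            return None          # characters left but no words allowed
        # Try longer prefixes first, backtracking if the rest cannot be split.
        for j in range(len(s), i, -1):
            if s[i:j] in dictionary:
                rest = split_from(j, remaining - 1)
                if rest is not None:
                    return (s[i:j],) + rest
        return None

    result = split_from(0, max_words)
    return list(result) if result is not None else None


WORDS = {"spice", "jet", "sun", "flower"}
print(segment("Spicejet", WORDS))       # ['spice', 'jet']
print(segment("sunflowerjet", WORDS))   # ['sun', 'flower', 'jet']
```

The cache on (position, remaining) keeps the search from re-exploring the same suffix, so it stays fast even with a large dictionary; when several splits are valid, preferring longer prefixes is only a tie-breaking heuristic.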
