Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

NLP tasks

Text pre-processing
Coreference resolution
Dependency parsing [parse-tree]
Document summarization [summarization]
Named entity recognition (NER) [named-entity-recognition]
Information extraction (IE) [information-retrieval] [information-extraction]
Language modeling
Part-of-speech (POS) tagging [part-of-speech]
Morphological analysis and wordform generation
Phrase-structure (constituency) parsing [parse-tree]
Machine translation (MT) [machine-translation]
Question answering (QA) [nlp-question-answering]
Sentiment analysis [sentiment-analysis]
Semantic parsing [semantic-analysis]
Text categorization [text-classification] [document-classification]
Textual entailment detection
Topic modeling [topic-modeling]
Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- Sharp NLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]
Phrase structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]
Dependency parsers
- Stanford Dependencies (packaged with Stanford parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe
Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

2 answers

Extract grocery list out of free text

I am looking for a Python library / algorithm / paper to extract a list of groceries out of free text. For example, "One salad and two beers" should be converted to: {'salad': 1, 'beer': 2}

python nlp nltk

asked Jul 17 '16 at 08:39

Uri Goren

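A minimal rule-based sketch of the conversion described in this question, assuming each quantity is written as digits or a small number word and directly precedes the item. The names NUMBER_WORDS, singularize, and extract_groceries are hypothetical, the plural handling is deliberately naive, and none of this is taken from the answers to the question.

```python
import re

# Hypothetical lookup table; extend it for larger quantities.
NUMBER_WORDS = {
    "a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def singularize(noun):
    """Naive singularization: drop a trailing 's' unless the word ends in 'ss'."""
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def extract_groceries(text):
    """Return {item: quantity} for every '<quantity> <item>' pair in the text."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    counts = {}
    for quantity, item in zip(tokens, tokens[1:]):
        if quantity.isdigit():
            amount = int(quantity)
        elif quantity in NUMBER_WORDS:
            amount = NUMBER_WORDS[quantity]
        else:
            continue  # the first token of the pair is not a quantity
        if item.isdigit() or item in NUMBER_WORDS:
            continue  # the second token is itself a quantity, not an item
        name = singularize(item)
        counts[name] = counts.get(name, 0) + amount
    return counts


print(extract_groceries("One salad and two beers"))
```

This only covers single-word items next to their quantity; multi-word items ("olive oil") or units ("two bottles of beer") would need a tagger or parser rather than adjacent-token matching.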

2 answers

Defining vocabulary size in text classification

I have a question about how to define the vocabulary needed for feature extraction in text classification. In an experiment, there are two approaches I can think of: 1. Define vocabulary size using both training data and test data, so that no…

machine-learning nlp text-classification

asked Jul 02 '16 at 02:44

antande
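The excerpt above is truncated, so the following is only a sketch of one common convention, not the asker's setup: fit the vocabulary on the training data alone and map any unseen test token to a placeholder. The names UNK, build_vocab, and encode are hypothetical.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary tokens


def build_vocab(train_docs, max_size=None):
    """Map each training token to an integer id, most frequent first."""
    counts = Counter(token for doc in train_docs for token in doc)
    vocab = {UNK: 0}
    for token, _ in counts.most_common(max_size):
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(doc, vocab):
    """Replace tokens by ids; tokens unseen in training fall back to UNK."""
    return [vocab.get(token, vocab[UNK]) for token in doc]


train = [["good", "movie"], ["good", "plot"]]
vocab = build_vocab(train)
print(encode(["good", "acting"], vocab))
```

Keeping the test data out of vocabulary construction means the evaluation reflects how the classifier handles words it has never seen.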

3 answers

Named Entity Recognition with Syntaxnet

I am trying to understand and learn SyntaxNet. I am trying to figure out whether there is any way to use SyntaxNet for Named Entity Recognition on a corpus. Any sample code or helpful links would be appreciated.

nlp tensorflow syntaxnet

asked Jun 29 '16 at 20:55

Anantha

1 answer

NLTK - Download all nltk data except corpora from command line without Downloader UI

We can download all nltk data using: > import nltk > nltk.download('all') Or specific data using: > nltk.download('punkt') > nltk.download('maxent_treebank_pos_tagger') But I want to download all data except 'corpora' files, for example - all…

python nlp nltk corpus nltk-trainer

asked Jun 25 '16 at 16:46

RAVI

1 answer

How to get constituency-based parse tree from Parsey McParseface

Parsey McParseface returns a dependency-based parse tree by default, but is there a way to get a constituency-based parse tree from it? EDIT: To clarify, by "to get from it" I mean from Parsey itself. Though building a tree from CoNLL output would…

nlp syntaxnet parsey-mcparseface

asked May 22 '16 at 14:33

maga

3 answers

How can I split at word boundaries with regexes?

I'm trying to do this: import re sentence = "How are you?" print(re.split(r'\b', sentence)) The result is [u'How are you?']. I want something like [u'How', u'are', u'you', u'?']. How can this be achieved?

python regex nlp

asked May 15 '16 at 11:17

oarfish

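One way to get the token list asked for in this question, sketched with the standard re module: instead of splitting on the zero-width \b boundary, match the tokens directly. The pattern treats each run of word characters as a token and each remaining non-space character as its own token; the function name tokenize is hypothetical.

```python
import re


def tokenize(sentence):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


print(tokenize("How are you?"))  # -> ['How', 'are', 'you', '?']
```

Note that this splits contractions such as "don't" into three tokens; a dedicated tokenizer is usually a better fit for real text.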

3 answers

Regular expression for counting sentences in a block of text

Possible Duplicate: PHP - How to split a paragraph into sentences. I have a block of text that I would like to separate into sentences, what would be the best way of doing this? I thought of looking for '.','!','?' characters, but I realized…

php regex nlp

asked Sep 09 '10 at 15:11

GSto

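The question is about PHP, but the idea of looking for '.', '!' and '?' can be sketched in Python (used here only for illustration). This assumes a sentence ends at one of those characters followed by whitespace; the function name split_sentences is hypothetical.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("It works. Does it? Yes!")
print(len(sentences), sentences)
```

As the asker seems to have realized, punctuation alone is ambiguous: this sketch would split "Dr. Smith arrived." into two pieces, so abbreviations and decimals need extra rules or a trained sentence splitter.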

1 answer

Named Entity Resolution Algorithm

I am trying to build an entity resolution system, where my entities are: (i) general named entities, that is, organization, person, location, date, time, money, and percent; (ii) some other entities like product, title of person like president, CEO,…

python algorithm machine-learning nlp

asked Apr 10 '16 at 20:30

Coeus2016

1 answer

Multi-Threaded NLP with spaCy pipe

I'm trying to apply the spaCy NLP (Natural Language Processing) pipeline to a big text file like a Wikipedia dump. Here is my code based on spaCy's documentation example: from spacy.en import English input = open("big_file.txt") big_text=…

python multithreading nlp pipeline spacy

asked Apr 08 '16 at 21:44

Sajjad Bay

1 answer

Is it possible to return the analyzed fields in an ElasticSearch >2.0 search?

This question feels very similar to an old question posted here: Retrieve analyzed tokens from ElasticSearch documents, but to see if there are any changes I thought it would make sense to post it again for the latest version of ElasticSearch. We…

elasticsearch lucene nlp

asked Mar 16 '16 at 11:37

luckylwk

1 answer

Using different word2vec training data in spaCy

I'd like to use some of this training data in spaCy when I use the similarity() method. I'd also like to use the pre-trained vectors on this page. But the spaCy docs seem lacking here; does anyone know how to do this?

python nlp word2vec spacy

asked Feb 26 '16 at 13:58

Tom Carrick

1 answer

Intuition behind tf-idf for term extraction

I'm trying to build a dictionary of words using tf-idf. However, intuitively it doesn't make sense. If the inverse document frequency (idf) part of tf-idf calculates the relevance of a term with respect to the entire corpus, then that implies some of…

machine-learning nlp tf-idf

asked Feb 17 '16 at 18:57

jCoder
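To make the two factors in this question concrete, here is a small sketch of one common tf-idf variant: relative term frequency within a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. Other weightings and smoothing schemes exist; the function name tf_idf and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


docs = [["the", "cat"], ["the", "dog"]]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

In this variant a term that occurs in every document gets idf = log(1) = 0, so it scores zero everywhere, while a term confined to a few documents is weighted up in those documents.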

2 answers

How to correct spelling in a Pandas DataFrame

Using the TextBlob library it is possible to improve the spelling of strings by defining them as TextBlob objects first and then using the correct method. Example: from textblob import TextBlob data = TextBlob('Two raods diverrged in a yullow waod…

python pandas nlp textblob

asked Jan 28 '16 at 19:35

RDJ

1 answer

Why is the Stanford parser with NLTK not correctly parsing a sentence?

I am using Stanford parser with nltk in python and got help from Stanford Parser and NLTK to set up Stanford nlp libraries. from nltk.parse.stanford import StanfordParser from nltk.parse.stanford import StanfordDependencyParser parser =…

python parsing nlp nltk stanford-nlp

asked Jan 23 '16 at 20:52

Nomiluks

1 answer

Result Difference in Stanford NER tagger NLTK (python) vs JAVA

I am using both Python and Java to run the Stanford NER tagger, but I am seeing a difference in the results. For example, when I input the sentence "Involved in all aspects of data modeling using ERwin as the primary software for this.", JAVA…

python nlp nltk stanford-nlp named-entity-recognition

asked Jan 06 '16 at 05:56

aerin
