Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If your question does not directly concern implementation, consider posting it on Data Science or Artificial Intelligence instead; it is probably off-topic here. Please choose one site only and do not cross-post to more than one. See "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

Text pre-processing
Coreference resolution
Dependency parsing [parse-tree]
Document summarization [summarization]
Named entity recognition (NER) [named-entity-recognition]
Information extraction (IE) [information-retrieval] [information-extraction]
Language modeling
Part-of-speech (POS) tagging [part-of-speech]
Morphological analysis and wordform generation
Phrase-structure (constituency) parsing [parse-tree]
Machine translation (MT) [machine-translation]
Question answering (QA) [nlp-question-answering]
Sentiment analysis [sentiment-analysis]
Semantic parsing [semantic-analysis]
Text categorization [text-classification] [document-classification]
Textual entailment detection
Topic modeling [topic-modeling]
Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General-purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- SharpNLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase-structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

9 answers

Machine Learning and Natural Language Processing

Assume you know a student who wants to study Machine Learning and Natural Language Processing. What specific computer science subjects should they focus on and which programming languages are specifically designed to solve these types of problems? I…

math machine-learning nlp

asked Feb 09 '10 at 23:54

Stephano

1 answer

How to extract phrases from corpus using gensim

To preprocess the corpus, I was planning to extract common phrases from it. For this I tried the Phrases model in gensim, but the code below is not giving me the desired output. My code: from gensim.models import Phrases documents =…

python nlp gensim

asked Mar 01 '16 at 06:30

Prashant Puri

4 answers

Use of PunktSentenceTokenizer in NLTK

I am learning Natural Language Processing using NLTK. I came across some code that uses PunktSentenceTokenizer, and I cannot understand what it is actually used for there. The code is: import nltk from nltk.corpus import state_union from nltk.tokenize…

python nlp nltk

asked Feb 08 '16 at 16:55

arqam

9 answers

Restore original text from Keras’s imdb dataset

I want to restore imdb’s original text from Keras’s imdb dataset. First, when I load Keras’s imdb dataset, it returns sequences of word indices. >>> (X_train, y_train), (X_test, y_test) =…

python machine-learning neural-network nlp keras

asked Mar 15 '17 at 21:49

Hironsan

10 answers

What are good starting points for someone interested in natural language processing?

So I've recently come up with some new possible projects that would have to deal with deriving 'meaning' from text submitted and generated by users. Natural language processing is the field that deals with these kinds of issues, and after…

nlp dcg

asked Oct 17 '08 at 13:52

kitsune

5 answers

NLTK and language detection

How do I detect what language a text is written in using NLTK? The examples I've seen use nltk.detect, but after installing NLTK on my Mac, I cannot find this package.

python nlp nltk detection

asked Jul 05 '10 at 21:30

niklassaers

2 answers

How to connect Cortana commands to custom scripts?

This may be a little early to ask this, but I'm running Windows 10 Technical Preview Build 10122. I'd like to set up Cortana to have custom commands. Here's how she works: Hey Cortana, Microsoft will process…

scripting nlp windows-10 cortana

asked May 25 '15 at 05:21

Charles Clayton

4 answers

NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?

I'm using the NLTK WordNet Lemmatizer for a Part-of-Speech tagging project by first modifying each word in the training corpus to its stem (in place modification), and then training only on the new corpus. However, I found that the lemmatizer is not…

python nlp nltk

asked Aug 27 '14 at 18:10

sanjeev mk

4 answers

How to tweak the NLTK sentence tokenizer

I'm using NLTK to analyze a few classic texts and I'm running into trouble tokenizing the text by sentence. For example, here's what I get for a snippet from Moby Dick: import nltk sent_tokenize =…

python nlp nltk

asked Dec 30 '12 at 23:59

Chris Wilson

7 answers

How do I do dependency parsing in NLTK?

Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence. The relevant section of the book: sub-chapter on dependency grammar gives an example figure but it doesn't show how to parse a sentence to come up…

python nlp grammar nltk

asked Sep 16 '11 at 10:26

MrD

5 answers

Computational Complexity of Self-Attention in the Transformer Model

I recently went through the Transformer paper from Google Research describing how self-attention layers could completely replace traditional RNN-based sequence encoding layers for machine translation. In Table 1 of the paper, the authors compare the…

machine-learning deep-learning neural-network nlp artificial-intelligence

asked Jan 13 '21 at 13:47

Newton

22 answers

Code Golf: Number to Words

The code golf series seems to be fairly popular. I ran across some code that converts a number to its word representation. Some examples would be (powers of 2 for programming fun): 2 -> Two 1024 -> One Thousand Twenty Four 1048576 -> One Million…

language-agnostic nlp code-golf rosetta-stone

asked Nov 21 '08 at 19:25

Jason Z
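
For readers unfamiliar with the task in this question, here is a minimal, non-golfed sketch of number-to-words conversion. It is not taken from the question or its answers: the names `number_to_words` and `_below_thousand` are hypothetical, and the supported range (0 up to but not including 10**12) and the title-case, no-"and" output style are assumptions chosen to match the examples quoted in the excerpt.

```python
# Word tables; index 0 of ONES and indexes 0-1 of TENS are unused placeholders.
ONES = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
        "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
        "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
TENS = ["", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy",
        "Eighty", "Ninety"]
SCALES = ["", "Thousand", "Million", "Billion"]  # assumed upper limit


def _below_thousand(n):
    """Return the words for an integer in 1..999 as a list."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "Hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if n:
        words.append(ONES[n])
    return words


def number_to_words(n):
    """Convert an integer in 0..10**12 - 1 to title-case English words."""
    if n == 0:
        return "Zero"
    if not 0 < n < 10 ** 12:
        raise ValueError("supported range is 0 <= n < 10**12")
    groups = []
    scale = 0
    while n:
        n, chunk = divmod(n, 1000)  # peel off three digits at a time
        if chunk:
            words = _below_thousand(chunk)
            if SCALES[scale]:
                words.append(SCALES[scale])
            groups.append(words)
        scale += 1
    # Groups were collected from least to most significant, so reverse them.
    return " ".join(word for group in reversed(groups) for word in group)
```

With these assumptions, `number_to_words(2)` gives `Two` and `number_to_words(1024)` gives `One Thousand Twenty Four`, matching the two complete examples in the excerpt.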

7 answers

Is there a natural language parser for date/times in javascript?

javascript datetime nlp

asked Jun 16 '09 at 18:54

antony.trupe

8 answers

Efficiently count word frequencies in python

I'd like to count frequencies of all words in a text file. >>> countInFile('test.txt') should return {'aaa':1, 'bbb': 2, 'ccc':1} if the target text file is like: # test.txt aaa bbb ccc bbb I've implemented it in pure Python following some…

python nlp scikit-learn word-count frequency-distribution

asked Mar 08 '16 at 01:52

Light Yagmi
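
As an illustration of the behaviour this question describes, here is a minimal sketch using `collections.Counter` from the standard library. It is not the asker's implementation or an accepted answer: the function name `count_in_file` is a hypothetical PEP 8 spelling of the question's `countInFile`, and splitting on whitespace and reading the file as UTF-8 are assumptions.

```python
from collections import Counter


def count_in_file(path):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = Counter()
    # Assumption: the file is UTF-8 text and words are separated by whitespace.
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # read line by line so large files are not loaded at once
            counts.update(line.split())
    return dict(counts)
```

For a file containing the four words from the excerpt (aaa, bbb, ccc, bbb), this returns a dict equal to `{'aaa': 1, 'bbb': 2, 'ccc': 1}`. Punctuation and case are left untouched, so `Word` and `word,` would be counted as distinct tokens.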

6 answers

How best to parse a simple grammar?

Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course…

python parsing nlp pyparsing ply

asked May 31 '10 at 18:36

Nick Heiner
