Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

- Text pre-processing
- Coreference resolution
- Dependency parsing [parse-tree]
- Document summarization [summarization]
- Named entity recognition (NER) [named-entity-recognition]
- Information extraction (IE) [information-retrieval] [information-extraction]
- Language modeling
- Part-of-speech (POS) tagging [part-of-speech]
- Morphological analysis and wordform generation
- Phrase-structure (constituency) parsing [parse-tree]
- Machine translation (MT) [machine-translation]
- Question answering (QA) [nlp-question-answering]
- Sentiment analysis [sentiment-analysis]
- Semantic parsing [semantic-analysis]
- Text categorization [text-classification] [document-classification]
- Textual entailment detection
- Topic modeling [topic-modeling]
- Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General-purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- SharpNLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions


1 answer

NLP - Embeddings selection of `start` and `end` of sentence tokens

Suppose we're training a neural network model to learn the mapping from the following input to output, where the output is the Named Entity (NE) label. Input: EU rejects German call to boycott British lamb . Output: ORG O MISC O O O MISC O O A sliding window…

machine-learning nlp deep-learning word2vec word-embedding

asked Nov 07 '17 at 00:51

GabrielChu


3 answers

Spacy Japanese Tokenizer

I am trying to use spaCy's Japanese tokenizer. `import spacy` `Question= 'すぺいんへいきました。'` `nlp(Question.decode('utf8'))` I am getting the error below: `TypeError: Expected unicode, got spacy.tokens.token.Token` Any ideas on how to fix this? Thanks!

python nlp spacy cjk

asked Nov 01 '17 at 11:22

AKSHAYAA VAIDYANATHAN


6 answers

Natural language parser for dates (.NET)?

I want to let users enter dates (including recurring dates) using natural language (e.g. "next Friday", "every weekday"), much like the examples at http://todoist.com/Help/timeInsert. I found this post, but it's a bit old and offered only…

.net datetime ironpython nlp

asked Jan 21 '09 at 20:48

Crescent Fresh


1 answer

How do I solve the following error? Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

I am working on an R project. The data set I used is available at the following link: https://www.kaggle.com/ranjitha1/hotel-reviews-city-chennai/data The code I have used is: df1 = read.csv("chennai.csv", header = TRUE) library(tidytext) tidy_books…

r nlp sentiment-analysis

asked Sep 21 '17 at 10:42

Varun Raghav B


1 answer

NLTK was unable to find the java file! for Stanford POS Tagger

I have been stuck trying to get the Stanford POS Tagger to work for a while. From an old SO post I found the following (slightly modified) code: stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/' from nltk.tag import…

python nlp nltk stanford-nlp

asked Sep 13 '17 at 16:00

jss367


2 answers

How to filter tokens from spaCy document

I would like to parse a document using spaCy and apply a token filter so that the final spaCy document does not include the filtered tokens. I know that I can take the sequence of filtered tokens, but I am interested in having the actual Doc…

python nlp spacy

asked Jul 28 '17 at 14:02

Kon Pal


1 answer

State-of-the-art method for large-scale near-duplicate detection of documents?

To my understanding, the scientific consensus in NLP is that the most effective method for near-duplicate detection in large-scale scientific document collections (more than 1 billion documents) is the one found here:…

machine-learning nlp

asked Jun 04 '17 at 14:13

Alex


1 answer

Which database can be used to store processed data from NLP engine

I am looking at taking unstructured data in the form of files, processing it and storing it in a database for retrieval. The data will be in natural language and the queries to get information will also be in natural language. Ex: the data could be…

mysql database nlp information-retrieval information-extraction

asked May 24 '17 at 07:55

Swati Pardeshi


1 answer

Automatic labeling of LDA generated topics

I'm trying to categorize customer feedback and I ran an LDA in python and got the following output for 10 topics: (0, u'0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option" + 0.012*"partner" + 0.011*"traffic" + 0.011*"hub" +…

python nlp lda topic-modeling labeling

asked May 15 '17 at 17:41

Arman


1 answer

Load Custom NER Model Stanford CoreNLP

I have created my own NER model with Stanford's "Stanford-NER" software and by following these directions. I am aware that CoreNLP loads three NER models out of the box in the following…

java python python-3.x nlp stanford-nlp

asked May 12 '17 at 16:29

Fraizier Reiland


1 answer

Natural language processing library for auto-tagging (.NET)

Does anyone know of any good libraries out there for .NET that could help pull keywords out of blocks of natural language? I'm basically trying to strip out stop words and ignore tenses, plurals and generally find words that are essentially the…

c# .net parsing nlp

asked Dec 07 '10 at 16:40

Ben


1 answer

Add word embedding to word2vec gensim model

I'm looking for a way to dynamically add pre-trained word vectors to a word2vec gensim model. I have a pre-trained word2vec model in a txt file (words and their embeddings) and I need to get Word Mover's Distance (for example via…

python nlp word2vec

asked Apr 24 '17 at 21:43

eardil


1 answer

python charmap codec can't decode byte X in position Y character maps to <undefined>

I'm experimenting with Python libraries for data analysis. The problem I'm facing is this exception: UnicodeDecodeError was unhandled by user code Message: 'charmap' codec can't decode byte 0x81 in position 165: character maps to <undefined> I…

python python-3.x unicode nlp python-unicode

asked Mar 21 '17 at 05:31

DayTimeCoder


1 answer

Using predict on new text with kmeans (sklearn)?

I have a very small list of short strings which I want to (1) cluster and (2) use that model to predict which cluster a new string belongs to. Running the first part works fine, getting a prediction for the new string does not. First Part from…

python-3.x scikit-learn nlp k-means

asked Mar 16 '17 at 05:00

Itay Livni


1 answer

Tensorflow : ValueError: Shape must be rank 2 but is rank 3

I'm new to tensorflow and I'm trying to update some code for a bidirectional LSTM from an old version of tensorflow to the newest (1.0), but I get this error: Shape must be rank 2 but is rank 3 for 'MatMul_3' (op: 'MatMul') with input shapes:…

python tensorflow nlp lstm bidirectional

asked Mar 06 '17 at 09:15

D. Clem
