Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If your question is not directly about implementation, consider posting on Data Science or Artificial Intelligence instead; it is probably off-topic here. Please choose one site only and do not cross-post. See "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

Text pre-processing
Coreference resolution
Dependency parsing [parse-tree]
Document summarization [summarization]
Named entity recognition (NER) [named-entity-recognition]
Information extraction (IE) [information-retrieval] [information-extraction]
Language modeling
Part-of-speech (POS) tagging [part-of-speech]
Morphological analysis and wordform generation
Phrase-structure (constituency) parsing [parse-tree]
Machine translation (MT) [machine-translation]
Question answering (QA) [nlp-question-answering]
Sentiment analysis [sentiment-analysis]
Semantic parsing [semantic-analysis]
Text categorization [text-classification] [document-classification]
Textual entailment detection
Topic modeling [topic-modeling]
Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General-purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- SharpNLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase-structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

1 answer

Heuristic Approaches to Finding Main Content

Wondering if anybody could point me in the direction of academic papers or related implementations of heuristic approaches to finding the real meat content of a particular webpage. Obviously this is not a trivial task, since the problem description…

parsing nlp web-crawler

asked Feb 17 '11 at 05:31

Kevin Dolan

2 answers

Dutch Grammar in python's NLTK

I am working on a Dutch corpus and I want to know if NLTK has a Dutch grammar embedded in it so I can parse my sentences. In general, does NLTK only work on English? I know that it has the Alpino Dutch corpora, but there is no indication that the…

python parsing nlp nltk context-free-grammar

asked Feb 14 '11 at 10:12

Hossein

2 answers

How to configure input shape for bidirectional LSTM in Keras

I'm facing the following issue. I have a large number of documents that I want to encode using a bidirectional LSTM. Each document has a different number of words, and each word can be thought of as a timestep. When configuring the bidirectional LSTM we…

machine-learning nlp keras lstm recurrent-neural-network

asked Apr 14 '18 at 17:14

Funzo

1 answer

Extract only body text from arXiv articles formatted as .tex

My dataset is composed of arXiv astrophysics articles as .tex files, and I need to extract only text from the article body, not from any other part of the article (e.g. tables, figures, abstract, title, footnotes, acknowledgements, citations, etc.).…

python nlp latex extract tex

asked Apr 11 '18 at 16:08

brienna

1 answer

Keyword/keyphrase extraction from text

I am working on a project where I need to extract "technology related keywords/keyphrases" from text. For example, my text is: "ABC Inc has been working on a project related to machine learning which makes use of the existing libraries for finding…

machine-learning nlp text-mining jnlp text-extraction

asked Mar 13 '18 at 18:28

Surbhi Singh

2 answers

Get trouble to load glove 840B 300d vector

It seems that every line has the format 'word number number .....', so it should be easy to split. But when I split them with the script below import numpy as np def loadGloveModel(gloveFile): print "Loading Glove Model" f =…

python nlp stanford-nlp word2vec

asked Mar 03 '18 at 11:54

Linjie Xu
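
The excerpt above is cut off before the error, so the actual cause is not visible here. As a minimal sketch, assuming the trouble comes from a few tokens in the file that themselves contain spaces (so that a plain `line.split()` yields more fields than expected), one option is to take the vector from the right-hand end of each line. The function names and the `dim` parameter are hypothetical, not taken from the question.

```python
def parse_glove_line(line, dim=300):
    """Split one text line into (token, vector), taking the vector from the right."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) <= dim:
        raise ValueError(
            f"expected a token plus {dim} numbers, got {len(parts)} fields"
        )
    # Everything before the last `dim` fields is the token, spaces included.
    token = " ".join(parts[:-dim])
    vector = [float(x) for x in parts[-dim:]]
    return token, vector


def load_glove(lines, dim=300):
    """Build a {token: vector} dict from an iterable of lines (e.g. an open file)."""
    return dict(parse_glove_line(line, dim) for line in lines if line.strip())
```

Passing an open file object as `lines` reads the vectors one line at a time; converting each list to a NumPy array afterwards is a separate, optional step.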

1 answer

Doc2vec: Only 10 docvecs in gensim doc2vec model?

I used gensim to fit a doc2vec model, with tagged documents (length>10) as training data. The goal is to get doc vectors for all training docs, but only 10 vectors can be found in model.docvecs. The example of training data (length>10) docs = ['This is…

machine-learning nlp word2vec gensim doc2vec

asked Feb 28 '18 at 03:14

GemOfRoe

1 answer

Load vectors into gensim Word2Vec model - not KeyedVectors

I'm attempting to load some pre-trained vectors into a gensim Word2Vec model, so they can be retrained with new data. My understanding is I can do the retraining with gensim.Word2Vec.train(). However, the only way I can find to load the vectors is…

machine-learning nlp word2vec gensim word-embedding

asked Feb 08 '18 at 16:35

Mike S

2 answers

gensim - Word2vec continue training on existing model - AttributeError: 'Word2Vec' object has no attribute 'compute_loss'

I am trying to continue training on an existing model, model = gensim.models.Word2Vec.load('model/corpus.zhwiki.word.model') more_sentences = [['Advanced', 'users', 'can', 'load', 'a', 'model', 'and', 'continue', 'training', 'it', 'with', 'more',…

python nlp word2vec gensim

asked Jan 25 '18 at 13:28

dididaisy

1 answer

Problems with Prolog's DCG

The project is about translating semi-natural language to SQL tables. The code: label(S) --> label_h(C), {atom_codes(A, C), string_to_atom(S, A)}, !. label_h([C|D]) --> letter(C), letters_or_digits(D), !. letters_or_digits([C|D]) -->…

prolog nlp grammar dcg

asked Jan 28 '11 at 10:49

Igor

1 answer

ValueError: operands could not be broadcast together with shapes in Naive bayes classifier

Getting straight to the point: 1) My goal was to apply NLP and a machine-learning algorithm to classify a dataset of sentences into 5 different (numeric) categories. E.g. "I want to know details of my order -> 1". Code: import…

python machine-learning nlp classification naivebayes

asked Jan 08 '18 at 16:00

Shikhar Thapliyal

2 answers

Detect abbreviations in the text in python

I want to find abbreviations in the text and remove them. What I am currently doing is identifying consecutive capital letters and removing them. But I see that it does not remove abbreviations such as MOOCs, M.O.O.C, M.O.O.Cs. Is there an easy way of…

python nlp

asked Dec 10 '17 at 01:08

user8871463
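
A rough sketch of one regex-based take on the abbreviation question above, using only the standard `re` module. It assumes an abbreviation is either a run of two or more capitals with an optional plural "s" (MOOC, MOOCs) or single capitals separated by periods (M.O.O.C, M.O.O.Cs). That definition, the pattern, and the function names are assumptions for illustration; ordinary all-caps words such as "OK" would be matched too.

```python
import re

# Assumed definition of "abbreviation" (see the lead-in):
#   1. dotted capitals, optionally ending in a capital and plural "s"
#   2. two or more capitals, optionally followed by plural "s"
ABBREVIATION = re.compile(
    r"\b(?:[A-Z]\.){2,}(?:[A-Z]s?\b)?"  # M.O.O.C, M.O.O.Cs, U.S.A.
    r"|\b[A-Z]{2,}s?\b"                 # MOOC, MOOCs, NLP
)


def find_abbreviations(text):
    """Return every substring that matches the assumed abbreviation pattern."""
    return ABBREVIATION.findall(text)


def strip_abbreviations(text):
    """Delete matches, then collapse the whitespace left behind."""
    without = ABBREVIATION.sub("", text)
    return re.sub(r"\s{2,}", " ", without).strip()
```

The dotted alternative is listed first so that a form like M.O.O.Cs is consumed as a whole rather than piece by piece. Punctuation next to a removed abbreviation is left in place and may need separate cleanup.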

1 answer

NLTK CoreNLPDependencyParser: Failed to establish connection

I'm trying to use the Stanford Parser through NLTK, following the example here. I follow the first two lines of the example (with the necessary import) from nltk.parse.corenlp import CoreNLPDependencyParser dep_parser =…

python nlp nltk stanford-nlp

asked Dec 01 '17 at 00:15

mxdg

1 answer

NameError: name 'stopwords' is not defined

I'm getting the error NameError: name 'stopwords' is not defined for some reason, even though I have the package installed. I'm trying to do natural language processing on some feedback reviews. The dataset object is a table with two columns,…

python nlp stop-words

asked Nov 25 '17 at 11:56

james

1 answer

How to add attention layer to seq2seq model on Keras

Based on this article, I wrote this…

nlp deep-learning keras lstm attention-model

asked Nov 08 '17 at 09:25

Osm
