Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If your question does not directly concern implementation, consider posting it on Data Science or Artificial Intelligence instead, since it is probably off-topic here. Choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
6 votes, 1 answer

Heuristic Approaches to Finding Main Content

Wondering if anybody could point me in the direction of academic papers or related implementations of heuristic approaches to finding the real meat content of a particular webpage. Obviously this is not a trivial task, since the problem description…
Kevin Dolan
6 votes, 2 answers

Dutch Grammar in python's NLTK

I am working on a Dutch corpus and I want to know whether NLTK has a Dutch grammar embedded in it so I can parse my sentences. In general, does NLTK only work on English? I know that it has the Alpino Dutch corpora, but there is no indication that the…
Hossein
6 votes, 2 answers

How to configure input shape for bidirectional LSTM in Keras

I'm facing the following issue. I have a large number of documents that I want to encode using a bidirectional LSTM. Each document has a different number of words, and each word can be thought of as a timestep. When configuring the bidirectional LSTM we…
Funzo
6 votes, 1 answer

Extract only body text from arXiv articles formatted as .tex

My dataset is composed of arXiv astrophysics articles as .tex files, and I need to extract only text from the article body, not from any other part of the article (e.g. tables, figures, abstract, title, footnotes, acknowledgements, citations, etc.).…
brienna
6 votes, 1 answer

Keyword/keyphrase extraction from text

I am working on a project where I need to extract "technology related keywords/keyphrases" from text. For example, my text is: "ABC Inc has been working on a project related to machine learning which makes use of the existing libraries for finding…
6 votes, 2 answers

Get trouble to load glove 840B 300d vector

It seems the format is, for every line, a string like 'word number number .....', so it should be easy to split. But when I split them with the script below import numpy as np def loadGloveModel(gloveFile): print "Loading Glove Model" f =…
Linjie Xu
6 votes, 1 answer

Doc2vec: Only 10 docvecs in gensim doc2vec model?

I used gensim to fit a doc2vec model, with tagged documents (length>10) as training data. The goal is to get doc vectors for all training docs, but only 10 vectors can be found in model.docvecs. An example of the training data (length>10): docs = ['This is…
GemOfRoe
6 votes, 1 answer

Load vectors into gensim Word2Vec model - not KeyedVectors

I'm attempting to load some pre-trained vectors into a gensim Word2Vec model, so they can be retrained with new data. My understanding is I can do the retraining with gensim.Word2Vec.train(). However, the only way I can find to load the vectors is…
Mike S
6 votes, 2 answers

gensim - Word2vec continue training on existing model - AttributeError: 'Word2Vec' object has no attribute 'compute_loss'

I am trying to continue training on an existing model, model = gensim.models.Word2Vec.load('model/corpus.zhwiki.word.model') more_sentences = [['Advanced', 'users', 'can', 'load', 'a', 'model', 'and', 'continue', 'training', 'it', 'with', 'more',…
dididaisy
6 votes, 1 answer

Problems with Prolog's DCG

The project is about translating semi-natural language to SQL tables. The code: label(S) --> label_h(C), {atom_codes(A, C), string_to_atom(S, A)}, !. label_h([C|D]) --> letter(C), letters_or_digits(D), !. letters_or_digits([C|D]) -->…
Igor
6 votes, 1 answer

ValueError: operands could not be broadcast together with shapes in Naive bayes classifier

Getting straight to the point: 1) My goal was to apply NLP and a machine learning algorithm to classify a dataset of sentences into 5 different (numeric) categories. For example, "I want to know details of my order -> 1". Code: import…
6 votes, 2 answers

Detect abbreviations in the text in python

I want to find abbreviations in the text and remove them. What I am currently doing is identifying consecutive capital letters and removing them. But I see that this does not remove abbreviations such as MOOCs, M.O.O.C, M.O.O.Cs. Is there an easy way of…
user8871463
6 votes, 1 answer

NLTK CoreNLPDependencyParser: Failed to establish connection

I'm trying to use the Stanford Parser through NLTK, following the example here. I follow the first two lines of the example (with the necessary import) from nltk.parse.corenlp import CoreNLPDependencyParser dep_parser =…
mxdg
6 votes, 1 answer

NameError: name 'stopwords' is not defined

I'm getting the error NameError: name 'stopwords' is not defined for some reason, even though I have the package installed. I'm trying to do natural language processing on some feedback reviews. The dataset object is a table with two columns,…
james
6 votes, 1 answer

How to add attention layer to seq2seq model on Keras

Based on this article, I wrote this…
Osm