Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If your question is not directly about implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise it is probably off-topic here. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
51
votes
5 answers

Tag generation from text content

I am curious whether an algorithm/method exists to generate keywords/tags from a given text, using weight calculations, occurrence ratios, or other tools. Additionally, I would be grateful if you could point to any Python-based solution/library…
Hellnar
  • 62,315
  • 79
  • 204
  • 279
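A minimal sketch of the occurrence-ratio idea asked about above, using only the standard library; the sample text and the tiny stop-word list are illustrative assumptions, not part of the question:

```python
import re
from collections import Counter

# Hypothetical sample text and a tiny illustrative stop-word list.
text = "Natural language processing extracts useful information from natural language data."
stopwords = {"from", "the", "and", "a", "an", "of", "is", "are"}

# Tokenize, drop stop words, and rank the remaining terms by frequency.
tokens = re.findall(r"[a-z]+", text.lower())
counts = Counter(t for t in tokens if t not in stopwords)
print(counts.most_common(5))  # crude keyword/tag candidates
```

A tf-idf weighting over a whole corpus (see the TfidfVectorizer sketch further down) usually gives better tag candidates than raw counts.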
50
votes
14 answers

Load pretrained GloVe vectors in Python

I have downloaded a pretrained GloVe vector file from the internet. It is a .txt file. I am unable to load and access it. It is easy to load and access a word-vector binary file using gensim, but I don't know how to do it when the file is in text format.
Same
  • 759
  • 2
  • 9
  • 15
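A rough sketch of the usual plain-Python approach, assuming the file has one word per line followed by its float components; the file name glove.6B.100d.txt is just an example:

```python
import numpy as np

def load_glove(path):
    """Read a GloVe .txt file into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

glove = load_glove("glove.6B.100d.txt")  # hypothetical file name
print(glove["king"][:5])
```

Recent gensim releases can also read such a file directly with KeyedVectors.load_word2vec_format(path, binary=False, no_header=True), since GloVe text files lack the word2vec header line.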
50
votes
3 answers

Scikit-learn TfidfVectorizer: How to get the top n terms with the highest tf-idf score

I am working on a keyword extraction problem. Consider the very general case: from sklearn.feature_extraction.text import TfidfVectorizer tfidf = TfidfVectorizer(tokenizer=tokenize, stop_words='english') t = """Two Travellers, walking in the noonday…
AbtPst
  • 7,778
  • 17
  • 91
  • 172
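A hedged sketch of ranking terms by tf-idf score for one document, assuming a recent scikit-learn (where get_feature_names_out is available); the two-document corpus is made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus; the question's own tokenizer and full text are omitted.
docs = ["two travellers walking in the noonday sun",
        "a traveller rests in the shade of a plane tree"]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# Top-n terms for the first document, ranked by tf-idf score.
n = 3
row = X[0].toarray().ravel()
terms = tfidf.get_feature_names_out()
top = row.argsort()[::-1][:n]
print([(terms[i], round(row[i], 3)) for i in top])
```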
50
votes
6 answers

NLTK Named Entity Recognition with Custom Data

I'm trying to extract named entities from my text using NLTK. I find that NLTK NER is not very accurate for my purpose and I want to add some more tags of my own as well. I've been trying to find a way to train my own NER, but I don't seem to be…
user1502248
  • 501
  • 1
  • 4
  • 3
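For reference, this is the stock NLTK pipeline the question starts from; adding custom entity types means training a separate tagger/chunker (or switching to a library that supports custom NER training), which this sketch does not cover:

```python
import nltk
# One-time downloads: punkt, averaged_perceptron_tagger, maxent_ne_chunker, words.

sentence = "Mark works at Google in London."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
tree = nltk.ne_chunk(tagged)

for subtree in tree:
    if isinstance(subtree, nltk.Tree):  # named-entity chunks
        entity = " ".join(word for word, tag in subtree.leaves())
        print(subtree.label(), entity)  # e.g. PERSON Mark, ORGANIZATION Google, GPE London
```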
48
votes
7 answers

Unsupervised Sentiment Analysis

I've been reading a lot of articles that explain the need for an initial set of texts that are classified as either 'positive' or 'negative' before a sentiment analysis system will really work. My question is: Has anyone attempted just doing a…
Trindaz
  • 17,029
  • 21
  • 82
  • 111
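One way to score sentiment without a labelled training set is a lexicon-based scorer; a minimal sketch using NLTK's VADER (the example sentence is made up):

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# One-time download: nltk.download("vader_lexicon")

# Lexicon-based scoring needs no 'positive'/'negative' training texts.
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The battery life is great, but the screen is awful."))
# -> {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
```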
46
votes
2 answers

Definition of downstream tasks in NLP

What does the term "downstream tasks" mean in NLP? I have seen it used in several articles, but I can't understand the idea behind it.
KF2
  • 9,887
  • 8
  • 44
  • 77
46
votes
4 answers

Using NLTK and WordNet, how do I convert a simple-tense verb into its present, past or past-participle form?

Using NLTK and WordNet, how do I convert a simple-tense verb into its present, past or past-participle form? For example, I want to write a function that gives me the verb in the expected form, as follows: v = 'go' present = present_tense(v) print…
Software Enthusiastic
  • 25,147
  • 16
  • 58
  • 68
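WordNet itself only maps inflected forms back to their lemma; generating a specific form needs a separate tool. A rough sketch, where the pattern package (and its pattern.en module) is an assumption, not part of NLTK/WordNet:

```python
from nltk.corpus import wordnet as wn
# One-time download: nltk.download("wordnet")

# WordNet can normalise an inflected verb back to its lemma...
print(wn.morphy("went", wn.VERB))    # 'go'

# ...but generating forms needs another library, e.g. pattern (an assumption here).
from pattern.en import conjugate, lexeme, PAST
print(lexeme("go"))                  # ['go', 'goes', 'going', 'went', 'gone']
print(conjugate("go", tense=PAST))   # 'went'
```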
46
votes
7 answers

How to detect the language of user-entered text?

I am dealing with an application that accepts user input in different languages (currently fixed at 3 languages). The requirement is that users can enter text without having to select the language via a provided checkbox in the UI. Is there an…
ManBugra
  • 1,289
  • 2
  • 14
  • 20
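A minimal sketch using the langdetect package (an assumption; the question does not name a library); restricting the result to the application's three known languages would be a small extra check:

```python
from langdetect import detect, DetectorFactory

DetectorFactory.seed = 0  # langdetect is probabilistic; fix the seed for stable output

print(detect("Bonjour tout le monde"))  # 'fr'
print(detect("Hello everyone"))         # 'en'
```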
46
votes
4 answers

How to use Gensim doc2vec with pre-trained word vectors?

I recently came across the doc2vec addition to Gensim. How can I use pre-trained word vectors (e.g. those found on the original word2vec website) with doc2vec? Or is doc2vec getting the word vectors from the same sentences it uses for paragraph-vector…
Stergios
  • 3,126
  • 6
  • 33
  • 55
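By default Doc2Vec learns its word vectors from the same training corpus, and whether pretrained vectors can be injected depends on the gensim version; the sketch below therefore only shows the plain training path, with a hypothetical toy corpus:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical toy corpus; each document gets a string tag.
corpus = [TaggedDocument(words=["natural", "language", "processing"], tags=["doc0"]),
          TaggedDocument(words=["word", "vectors", "and", "documents"], tags=["doc1"])]

model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)
print(model.infer_vector(["language", "vectors"])[:5])
```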
45
votes
4 answers

TFIDF for Large Dataset

I have a corpus of around 8 million news articles and I need to get their TFIDF representation as a sparse matrix. I have been able to do that using scikit-learn for a relatively small number of samples, but I believe it can't be used for…
apurva.nandan
  • 1,061
  • 1
  • 11
  • 19
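One common way to keep memory bounded is a stateless HashingVectorizer followed by a TfidfTransformer; a sketch under the assumption that the articles can be streamed (the three short strings stand in for the real corpus):

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

# Hypothetical document stream; in practice this would be a generator
# yielding the 8M articles one at a time instead of holding them in RAM.
docs = ["first news article ...", "second news article ...", "third article ..."]

# HashingVectorizer is stateless, so it never builds an in-memory vocabulary.
hasher = HashingVectorizer(n_features=2**20, alternate_sign=False, stop_words="english")
counts = hasher.transform(docs)                  # sparse term counts
tfidf = TfidfTransformer().fit_transform(counts)
print(tfidf.shape)
```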
45
votes
5 answers

Algorithms to detect phrases and keywords from text

I have around 100 megabytes of text, without any markup, divided into approximately 10,000 entries. I would like to automatically generate a 'tag' list. The problem is that there are word groups (i.e. phrases) that only make sense when they are…
Kimvais
  • 38,306
  • 16
  • 108
  • 142
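Collocation detection is one way to find such multi-word phrases; a minimal sketch with gensim's Phrases model, using made-up tokenised entries in place of the real 10,000:

```python
from gensim.models.phrases import Phrases, Phraser

# Hypothetical tokenised entries standing in for the real corpus.
sentences = [["new", "york", "is", "big"],
             ["i", "love", "new", "york"],
             ["machine", "learning", "in", "new", "york"]]

# Learn word pairs that co-occur more often than chance.
phrases = Phrases(sentences, min_count=1, threshold=1)
bigram = Phraser(phrases)
print(bigram[["visit", "new", "york"]])  # e.g. ['visit', 'new_york']
```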
44
votes
5 answers

How to fix the error "SystemError: initialization of _internal failed without raising an exception"

I am trying to import the Top2Vec package for NLP topic modelling, but even after upgrading pip and numpy this error still occurs. I tried pip install --upgrade pip and pip install --upgrade numpy. I was expecting to run from top2vec import Top2Vec model =…
Sayonita Ghosh Roy
  • 441
  • 1
  • 3
  • 3
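This particular SystemError is commonly reported when the installed numpy is newer than what numba (a Top2Vec dependency) supports; a quick check, assuming both packages are installed:

```python
# Check whether the installed numpy/numba pair is the likely culprit.
import numpy
import numba

print("numpy:", numpy.__version__, "numba:", numba.__version__)
# If `import numba` itself fails with this SystemError, pinning numpy to an
# older release supported by that numba build (and reinstalling) is the usual fix.
```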
44
votes
4 answers

Entity Extraction/Recognition with free tools while feeding Lucene Index

I'm currently investigating options to extract person names, locations, tech words and categories from text (a lot of articles from the web), which will then be fed into a Lucene/ElasticSearch index. The additional information is then added as…
Karussell
  • 17,085
  • 16
  • 97
  • 197
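One free option (an assumption on my part; the question itself leaves the tooling open) is spaCy's pretrained NER, whose output can be written as extra fields into the Lucene/ElasticSearch documents:

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Tim Berners-Lee founded the World Wide Web Consortium in Geneva.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. PERSON, ORG, GPE -> extra index fields
```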
44
votes
9 answers

CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

I got the following error when I ran my PyTorch deep learning model in Google Colab /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias) 1370 ret = torch.addmm(bias, input, weight.t()) 1371 …
Mr. NLP
  • 891
  • 1
  • 8
  • 20
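A frequent cause behind this CUDA error is an out-of-range index into an embedding or output layer, which only surfaces as a clear IndexError on the CPU; a minimal sketch reproducing that pattern (the sizes and ids are made up):

```python
import torch
import torch.nn as nn

# Minimal sketch of the usual culprit: a token/label id >= num_embeddings.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
bad_ids = torch.tensor([3, 12])  # 12 is out of range

try:
    emb(bad_ids)                 # on CPU this raises a clear IndexError
except IndexError as e:
    print("out-of-range id:", e)

# On the GPU the same mistake often shows up later as
# CUDA error: CUBLAS_STATUS_ALLOC_FAILED (or a device-side assert), so
# re-running on CPU or with CUDA_LAUNCH_BLOCKING=1 is a common way to debug it.
```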
44
votes
1 answer

Doc2Vec Get most similar documents

I am trying to build a document retrieval model that returns documents ordered by their relevance with respect to a query or search string. For this I trained a doc2vec model using the Doc2Vec model in gensim. My dataset is in the form of a…
Clock Slave
  • 7,627
  • 15
  • 68
  • 109
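A hedged sketch of the usual retrieval pattern with gensim 4.x (model.dv was model.docvecs in older releases); the three tagged documents are a made-up stand-in for the real dataset:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical toy corpus standing in for the real dataset.
corpus = [TaggedDocument(["document", "retrieval", "with", "doc2vec"], ["doc0"]),
          TaggedDocument(["search", "query", "relevance", "ranking"], ["doc1"]),
          TaggedDocument(["cooking", "pasta", "recipes"], ["doc2"])]

model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

# Infer a vector for the query and rank documents by cosine similarity.
query_vec = model.infer_vector(["relevant", "document", "search"])
print(model.dv.most_similar([query_vec], topn=2))
```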