Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If your question does not directly concern implementation, consider posting on Data Science or Artificial Intelligence instead; here it is probably off-topic. Please choose one site only and do not cross-post. See "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

Text pre-processing
Coreference resolution
Dependency parsing [parse-tree]
Document summarization [summarization]
Named entity recognition (NER) [named-entity-recognition]
Information extraction (IE) [information-retrieval] [information-extraction]
Language modeling
Part-of-speech (POS) tagging [part-of-speech]
Morphological analysis and wordform generation
Phrase-structure (constituency) parsing [parse-tree]
Machine translation (MT) [machine-translation]
Question answering (QA) [nlp-question-answering]
Sentiment analysis [sentiment-analysis]
Semantic parsing [semantic-analysis]
Text categorization [text-classification] [document-classification]
Textual entailment detection
Topic modeling [topic-modeling]
Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General-purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- SharpNLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase-structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with the Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

1 answer

using Dependency Parser in Stanford coreNLP

I am using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) to parse sentences and extract dependencies between the words. I have managed to create the dependency graph as in the example at the supplied link, but…

nlp stanford-nlp

asked Nov 17 '11 at 15:33

Eddie Dovzhik

3 answers

language detection

I am using Tesseract for OCR, mainly on invoices. However, Tesseract requires the language to be specified before it starts processing a file. My plan is to perform OCR with a predefined default language first. Then I'd like to use the resulting…

c++ nlp ocr language-detection

asked Nov 16 '11 at 19:15

Pedro

2 answers

Preserving only domain-specific keywords?

I am trying to determine the most popular keywords for a certain class of documents in my collection. Assuming that the domain is "computer science" (which, of course, includes networking, computer architecture, etc.), what is the best way to preserve…

python nlp machine-learning nltk

asked Nov 02 '11 at 20:27

Legend

2 answers

weka - how to print incorrectly classified instances

My Weka output shows: Correctly Classified Instances 32083 (94.0244 %); Incorrectly Classified Instances 2039 (5.9756 %). I want to be able to print out what the incorrect instances were so I can make adjustments…

java nlp classification weka

asked Oct 18 '11 at 21:02

britt

1 answer

Brute-Force language detection

I need an algorithm (any programming language) to test the vitality with a hill climbing algorithm for breaking a cipher for a crypto challenge. The algorithm should test how likely it is that a random decryption (which has no spaces) is an English text…

java algorithm cryptography nlp

asked Oct 17 '11 at 23:37

Daniel Marschall

1 answer

How to create a bag of words using Weka?

I have a corpus of documents and I want to represent each document as a vector. Basically, the vector would have 1 for words that are present inside a document and for other words (which are present in other documents in the corpus and not in this…

nlp weka

asked Oct 10 '11 at 07:26

London guy
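
The representation described in this question (1 for words present in a document, 0 for vocabulary words that appear only elsewhere in the corpus) can be sketched in a few lines. This is a plain-Python illustration, not Weka code: the function name `binary_bag_of_words` and the lowercase, whitespace-based tokenization are assumptions made for the example.

```python
def binary_bag_of_words(documents):
    """Return (vocabulary, vectors) for a list of document strings.

    Assumption: tokens are lowercase, whitespace-separated words.
    Real corpora usually need a proper tokenizer.
    """
    # Tokenize each document and keep only the distinct words.
    token_sets = [set(doc.lower().split()) for doc in documents]

    # The vocabulary is every word seen anywhere in the corpus, in sorted order
    # so that each vector position always refers to the same word.
    vocabulary = sorted(set().union(*token_sets)) if token_sets else []

    # One vector per document: 1 if the word occurs in it, otherwise 0.
    vectors = [
        [1 if term in tokens else 0 for term in vocabulary]
        for tokens in token_sets
    ]
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = binary_bag_of_words(["the cat sat", "the dog"])
    print(vocab)
    for vec in vecs:
        print(vec)
```

Sorting the vocabulary keeps the column order stable across runs, which matters if the vectors are later fed to a classifier.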

3 answers

How to improve the results of this neural network of finetuned BERT model?

I'm working on an NLP classification problem where I'm trying to classify training courses into 99 categories. I managed to make a few models including the Bayesian classifier but it had an accuracy of 55% (very bad). Given those results, I tried to…

python tensorflow keras deep-learning nlp

asked Jul 06 '23 at 09:09

Wajih101

1 answer

"The model 'MPTForCausalLM' is not supported for text-generation"- The following warning is coming when trying to use MPT-7B instruct

I am using a GCP VM (e2-highmem-4: Efficient Instance, 4 vCPUs, 32 GB RAM) to load the model and use it. Here is the code I have written: import torch from transformers import pipeline from transformers import AutoTokenizer,…

python-3.x deep-learning nlp

asked May 11 '23 at 11:39

DD111

2 answers

Flan T5 - How to give the correct prompt/question?

Giving the right kind of prompt to Flan T5 Language model in order to get the correct/accurate responses for a chatbot/option matching use case. I am trying to use a Flan T5 model for the following task. Given a chatbot that presents the user with a…

nlp huggingface-transformers

asked Jan 22 '23 at 18:55

Rahul Seeetharaman

2 answers

Finding words from Wordnet separated by a fixed Edit Distance from a given word

I am writing a spell checker using NLTK and WordNet. I have a few misspelled words, say "belive". What I want to do is find all words from WordNet that are within a Levenshtein edit distance of 1 or 2 from this given word. Does nltk…

python nlp nltk wordnet

asked Sep 20 '11 at 18:36

Nihar Sarangi
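
The filtering step this question asks about can be sketched without any library: compute the Levenshtein distance with the standard dynamic-programming recurrence, then keep the vocabulary words at distance 1 or 2. The helper names `levenshtein` and `words_within` and the small in-memory vocabulary are assumptions for illustration; in the asker's setting the vocabulary would come from WordNet instead.

```python
def levenshtein(a, b):
    """Levenshtein edit distance (insertions, deletions, substitutions)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def words_within(word, vocabulary, max_distance=2):
    """Return vocabulary words whose distance to `word` is 1..max_distance."""
    return sorted(
        candidate for candidate in vocabulary
        if 0 < levenshtein(word, candidate) <= max_distance
    )


if __name__ == "__main__":
    # Hypothetical vocabulary standing in for the WordNet lemma list.
    vocabulary = ["believe", "relive", "banana", "belive"]
    print(words_within("belive", vocabulary))
```

Comparing against every word is linear in the vocabulary size, which is usually acceptable for a one-off lookup; for repeated queries over a large lexicon, an index such as a BK-tree is a common alternative.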

1 answer

Cast topic modeling outcome to dataframe

I have used BERTopic with KeyBERT to extract some topics from some docs: from bertopic import BERTopic topic_model = BERTopic(nr_topics="auto", verbose=True, n_gram_range=(1, 4), calculate_probabilities=True,…

python-3.x pandas nlp bert-language-model topic-modeling

asked Dec 13 '22 at 12:58

xavi

1 answer

Which HuggingFace summarization models support more than 1024 tokens? Which model is more suitable for programming related articles?

If this is not the best place to ask this question, please lead me to the most accurate one. I am planning to use one of the Huggingface summarization models (https://huggingface.co/models?pipeline_tag=summarization) to summarize my lecture video…

nlp huggingface-transformers summarization huggingface mlmodel

asked Oct 27 '22 at 21:45

Furkan Gözükara

1 answer

A checklist for Spacy optimization?

I have been trying to understand how to systematically make Spacy run as fast as possible for a long time and I would like this post to become a wiki-style public post if possible. Here is what I currently know, with subsidiary questions on each…

optimization nlp spacy micro-optimization

asked Oct 24 '22 at 13:23

hmltn

2 answers

Which similarity function of nltk.corpus.wordnet is appropriate for finding the similarity of two words?

Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? path_similarity()? lch_similarity()? wup_similarity()? res_similarity()? jcn_similarity()? lin_similarity()? I want to use a…

python nlp nltk wordnet corpus

asked Sep 13 '11 at 10:42

Masoud Abasian

2 answers

How to Join Arabic letters to form words

I have to read Arabic letters from an XML file and display them as a word. Input: س ع ا د ة Output: سعادة. I don't know how to do that in any language or what algorithm to read; I need a starting point to accomplish this task. I am also not…

algorithm nlp arabic

asked Sep 11 '11 at 08:40

Gainster
