Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming natural language data or extracting useful information from it. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of computational linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise your question is probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
5 votes, 1 answer

Using the dependency parser in Stanford CoreNLP

I am using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) to parse sentences and extract dependencies between the words. I have managed to create the dependency graph as in the example at the supplied link, but…
Eddie Dovzhik
5 votes, 3 answers

Language detection

I am using Tesseract for OCR, mainly on invoices. However, Tesseract requires the language to be specified before it starts processing a file. I thought I would perform OCR based on a predefined default language. Then I'd like to use the resulting…
Pedro
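A minimal sketch of one lightweight approach, assuming the first OCR pass has already produced a text string: count how many tokens fall into a small stopword list for each candidate language and pick the best match. The keys follow Tesseract's three-letter language codes; the stopword lists are hypothetical and far shorter than real ones, and a dedicated language-identification library would normally be more robust.

```python
import re

# Hypothetical, deliberately tiny stopword samples keyed by Tesseract-style codes.
STOPWORDS = {
    "eng": {"the", "and", "of", "to", "in", "is", "for", "with", "this", "that"},
    "deu": {"der", "die", "das", "und", "ist", "mit", "für", "nicht", "von", "zu"},
    "fra": {"le", "la", "les", "et", "est", "pour", "avec", "des", "une", "du"},
}


def detect_language(text, stopwords=STOPWORDS):
    """Return the code whose stopwords occur most often in text, or None if none occur."""
    words = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "für" survives
    scores = {lang: sum(w in sw for w in words) for lang, sw in stopwords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    ocr_text = "Die Rechnung ist mit der Post gekommen und das Datum fehlt"
    print(detect_language(ocr_text))  # deu
```

If the detected code differs from the default, the invoice could be OCRed a second time with that language (the Tesseract CLI takes it via `-l`, e.g. `-l deu`).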
5 votes, 2 answers

Preserving only domain-specific keywords?

I am trying to determine the most popular keywords for a certain class of documents in my collection. Assuming that the domain is "computer science" (which, of course, includes networking, computer architecture, etc.), what is the best way to preserve…
Legend
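One possible approach (not the only one) is to contrast term frequencies in the domain documents with those in a general "background" collection and keep terms that are markedly more frequent in the domain. The sketch below uses a log-ratio of add-one-smoothed relative frequencies; the tiny corpora and the `top_k`/`min_count` thresholds are hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def domain_keywords(domain_docs, background_docs, top_k=10, min_count=2):
    """Rank terms by how much more frequent they are in the domain than in the background.

    Score = log(P(term | domain) / P(term | background)), both add-one smoothed.
    Only terms seen at least min_count times in the domain and scoring above 0 are kept.
    """
    dom = Counter(t for doc in domain_docs for t in tokenize(doc))
    bg = Counter(t for doc in background_docs for t in tokenize(doc))
    vocab_size = len(dom.keys() | bg.keys())
    dom_total, bg_total = sum(dom.values()), sum(bg.values())

    scores = {}
    for term, count in dom.items():
        if count < min_count:
            continue
        p_dom = (count + 1) / (dom_total + vocab_size)
        p_bg = (bg[term] + 1) / (bg_total + vocab_size)
        scores[term] = math.log(p_dom / p_bg)

    ranked = sorted(scores, key=scores.get, reverse=True)
    return [t for t in ranked if scores[t] > 0][:top_k]


if __name__ == "__main__":
    cs_docs = ["the router forwards packets",
               "the cache stores packets near the cpu",
               "packets reach the router"]
    general_docs = ["the cat sat on the mat", "the weather is nice", "the market opened"]
    print(domain_keywords(cs_docs, general_docs))  # ['packets', 'router']
```

Common words such as "the" get a negative score because they are about as frequent in both collections, so they drop out without a hand-made stopword list.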
5 votes, 2 answers

Weka: how to print incorrectly classified instances

My Weka output shows:

Correctly Classified Instances      32083     94.0244 %
Incorrectly Classified Instances     2039      5.9756 %

I want to be able to print out what the incorrect instances were so I can make adjustments…
britt
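Independently of Weka's own options, the underlying idea is to pair each instance with its actual and predicted class and keep the rows where the two differ. The sketch below assumes the predictions have already been exported into Python lists; it does not use the Weka API, and the sample data is hypothetical.

```python
def misclassified(instances, actual, predicted):
    """Return (index, instance, actual, predicted) for every wrong prediction."""
    if not (len(instances) == len(actual) == len(predicted)):
        raise ValueError("all three sequences must have the same length")
    return [
        (i, inst, a, p)
        for i, (inst, a, p) in enumerate(zip(instances, actual, predicted))
        if a != p
    ]


if __name__ == "__main__":
    docs = ["great movie", "terrible plot", "not bad at all"]
    gold = ["pos", "neg", "pos"]
    pred = ["pos", "neg", "neg"]
    for i, inst, a, p in misclassified(docs, gold, pred):
        print(f"#{i}: {inst!r} actual={a} predicted={p}")
```

Keeping the index makes it easy to trace each error back to the original row in the dataset.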
5 votes, 1 answer

Brute-force language detection

I need an algorithm (in any programming language) to test the vitality with a hill-climbing algorithm for breaking a cipher in a crypto challenge. The algorithm should test how likely it is that a random decryption (which has no spaces) is English text…
Daniel Marschall
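A simple fitness function for this is the chi-squared statistic between a candidate's letter counts and the counts expected from English letter frequencies: lower means more English-like, and spaces are not needed. The frequency table below holds approximate, rounded values of the kind found in common published tables; n-gram (e.g. quadgram) log-probability scoring is a stronger alternative but needs a reference corpus.

```python
from collections import Counter

# Approximate English letter frequencies in percent (rounded; normalised below).
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def chi_squared_english(text):
    """Chi-squared distance from English letter statistics; lower = more English-like.

    Characters outside a-z are ignored, so space-less candidate plaintexts work directly.
    """
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    n = len(letters)
    if n == 0:
        return float("inf")
    counts = Counter(letters)
    total = sum(ENGLISH_FREQ.values())
    score = 0.0
    for letter, freq in ENGLISH_FREQ.items():
        expected = n * freq / total
        score += (counts[letter] - expected) ** 2 / expected
    return score


if __name__ == "__main__":
    print(chi_squared_english("itwasthebestoftimesitwastheworstoftimes"))
    print(chi_squared_english("zqxjkvzqxjkvzqxjkv"))
```

In a hill-climbing loop, a modified key would be kept whenever the score of its decryption goes down. Letter frequencies alone ignore letter order, so they work best on longer candidates.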
5 votes, 1 answer

How to create a bag of words using Weka?

I have a corpus of documents and I want to represent each document as a vector. Basically, the vector would have 1 for words that are present inside a document and for other words (which are present in other documents in the corpus and not in this…
London guy
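In Weka this is usually the job of the StringToWordVector filter; the sketch below only shows the binary bag-of-words representation itself in plain Python, independent of Weka. It assumes a naive lowercase alphanumeric tokenizer: the vocabulary is the union of all document tokens, and each document vector has 1 where the word occurs in that document and 0 elsewhere.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def binary_bag_of_words(docs):
    """Return (vocabulary, vectors); vectors[i][j] is 1 if vocabulary[j] occurs in docs[i], else 0."""
    token_sets = [set(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*token_sets))
    vectors = [[1 if term in tokens else 0 for term in vocabulary] for tokens in token_sets]
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = binary_bag_of_words(["the cat sat", "the dog barked", "a cat and a dog"])
    print(vocab)  # ['a', 'and', 'barked', 'cat', 'dog', 'sat', 'the']
    for v in vecs:
        print(v)
```

Sorting the vocabulary fixes the column order, so vectors built in separate runs over the same corpus line up.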
5 votes, 3 answers

How to improve the results of this fine-tuned BERT neural network model?

I'm working on an NLP classification problem where I'm trying to classify training courses into 99 categories. I managed to build a few models, including a Bayesian classifier, but it had an accuracy of only 55% (very bad). Given those results, I tried to…
Wajih101
5 votes, 1 answer

"The model 'MPTForCausalLM' is not supported for text-generation": warning when trying to use MPT-7B-Instruct

I am using a GCP VM (e2-highmem-4: Efficient Instance, 4 vCPUs, 32 GB RAM) to load and use the model. Here is the code I have written:

import torch
from transformers import pipeline
from transformers import AutoTokenizer,…
DD111
5 votes, 2 answers

Flan T5: how to give the correct prompt/question?

Giving the right kind of prompt to the Flan T5 language model in order to get correct/accurate responses for a chatbot/option-matching use case. I am trying to use a Flan T5 model for the following task. Given a chatbot that presents the user with a…
5 votes, 2 answers

Finding words in WordNet within a fixed edit distance of a given word

I am writing a spell checker using NLTK and WordNet. I have a few misspelled words, say "belive". What I want to do is find all words in WordNet that are within a Levenshtein edit distance of 1 or 2 of this given word. Does NLTK…
Nihar Sarangi
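A minimal sketch of the filtering step: compute the classic dynamic-programming Levenshtein distance from the misspelled word to every vocabulary entry and keep those within the limit. NLTK does ship an `edit_distance` function, and WordNet's lemma names (e.g. `wordnet.all_lemma_names()`) could serve as the vocabulary; to stay self-contained, the sketch below uses its own distance function and a small hypothetical word list instead.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row, the distance is symmetric
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if letters match)
            ))
        previous = current
    return previous[-1]


def candidates(word, vocabulary, max_distance=2):
    """Return (distance, candidate) pairs within max_distance, closest first."""
    scored = ((levenshtein(word, w), w) for w in vocabulary)
    return sorted((d, w) for d, w in scored if d <= max_distance)


if __name__ == "__main__":
    vocab = ["believe", "relive", "beloved", "deliver", "banana"]  # hypothetical word list
    print(candidates("belive", vocab))
    # [(1, 'believe'), (1, 'relive'), (2, 'beloved'), (2, 'deliver')]
```

A linear scan over the full WordNet lemma list is simple but not fast; for many queries, generating all strings within distance 1 or 2 and intersecting them with the vocabulary is a common alternative.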
5 votes, 1 answer

Cast topic modeling outcome to dataframe

I have used BERTopic with KeyBERT to extract some topics from some docs:

from bertopic import BERTopic
topic_model = BERTopic(nr_topics="auto", verbose=True, n_gram_range=(1, 4), calculate_probabilities=True,…
xavi
5 votes, 1 answer

Which Hugging Face summarization models support more than 1024 tokens? Which model is more suitable for programming-related articles?

If this is not the best place to ask this question, please point me to a more appropriate one. I am planning to use one of the Hugging Face summarization models (https://huggingface.co/models?pipeline_tag=summarization) to summarize my lecture video…
5 votes, 1 answer

A checklist for spaCy optimization?

For a long time I have been trying to understand how to systematically make spaCy run as fast as possible, and I would like this post to become a public, wiki-style post if possible. Here is what I currently know, with subsidiary questions on each…
hmltn
5 votes, 2 answers

Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?

Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? path_similarity()? lch_similarity()? wup_similarity()? res_similarity()? jcn_similarity()? lin_similarity()? I want to use a…
Masoud Abasian
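As background for the choice: in NLTK these methods compare synsets (senses) rather than words, and res_similarity, jcn_similarity and lin_similarity additionally require an information-content dictionary. The sketch below computes textbook versions of two purely taxonomy-based measures on a tiny hypothetical single-parent taxonomy: path similarity, 1/(d+1) with d the shortest path length, and Wu-Palmer similarity, 2·depth(lcs)/(depth(a)+depth(b)). NLTK's own implementations work on WordNet's multi-parent hierarchy and differ in details such as depth conventions, so the numbers will not match NLTK exactly.

```python
# Hypothetical toy taxonomy: each concept maps to its single parent (hypernym); the root maps to None.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "canine": "animal",
    "feline": "animal",
    "dog": "canine",
    "cat": "feline",
    "car": "artifact",
}


def path_to_root(concept):
    """Concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Number of nodes from the root down to concept (the root has depth 1)."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_b = set(path_to_root(b))
    return next(c for c in path_to_root(a) if c in ancestors_b)


def path_similarity(a, b):
    """1 / (shortest path length + 1); in a tree that path passes through the LCS."""
    lcs = lowest_common_subsumer(a, b)
    distance = path_to_root(a).index(lcs) + path_to_root(b).index(lcs)
    return 1 / (distance + 1)


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lowest_common_subsumer(a, b)) / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car")]:
        print(pair, path_similarity(*pair), wup_similarity(*pair))
```

Both measures rank dog/cat above dog/car here, but Wu-Palmer rewards pairs whose common ancestor sits deep in the hierarchy, while path similarity only counts edges. For word-level scores, one common choice is the maximum over all synset pairs of the two words.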
5 votes, 2 answers

How to join Arabic letters to form words

I have to read Arabic letters from an XML file and display them as a word, like this:

input: س ع ا د ة
output: سعادة

I don't know how to do that in any language, or what algorithm to read up on. I need a starting point to accomplish this task. I am also not…
Gainster
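In Unicode, Arabic text is stored as base letters in logical order, and the connected (initial, medial, final) shapes are chosen by the rendering engine, so joining the letters is usually just string concatenation. The sketch below assumes the XML yields the letters separated by whitespace; it also applies NFKC normalization, which maps Arabic presentation-form code points (such as U+FEB1, the isolated form of seen) back to base letters in case the file contains them.

```python
import unicodedata


def join_arabic_letters(spaced):
    """Join whitespace-separated Arabic letters into one word.

    NFKC folds presentation forms (isolated/initial/medial/final code points)
    back to base letters so that the renderer can apply contextual shaping itself.
    """
    letters = spaced.split()
    return unicodedata.normalize("NFKC", "".join(letters))


if __name__ == "__main__":
    print(join_arabic_letters("س ع ا د ة"))  # سعادة
```

Whether the joined string then displays with connected letters depends on the display layer's shaping and right-to-left support, not on the string itself.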