Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
57
votes
3 answers

Text Summarization Evaluation - BLEU vs ROUGE

With the results of two different summary systems (sys1 and sys2) and the same reference summaries, I evaluated them with both BLEU and ROUGE. The problem is: all ROUGE scores of sys1 were higher than those of sys2 (ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-4,…
Chelsea_cole
  • 1,055
  • 3
  • 15
  • 21
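
For readers comparing the two metrics, one reason they can disagree is that BLEU is built on n-gram precision (how much of the candidate appears in the reference), while ROUGE-N is built on n-gram recall (how much of the reference appears in the candidate). The following is a minimal sketch under that simplification only: it omits BLEU's brevity penalty and its averaging over n-gram orders, and the function names are hypothetical, not taken from any evaluation package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(candidate, reference, n):
    """Return (clipped matches, candidate n-gram count, reference n-gram count)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    return sum((cand & ref).values()), sum(cand.values()), sum(ref.values())


def ngram_precision(candidate, reference, n=1):
    """BLEU-style building block: matches divided by candidate n-grams."""
    match, cand_total, _ = overlap(candidate, reference, n)
    return match / cand_total if cand_total else 0.0


def ngram_recall(candidate, reference, n=1):
    """ROUGE-N-style building block: matches divided by reference n-grams."""
    match, _, ref_total = overlap(candidate, reference, n)
    return match / ref_total if ref_total else 0.0


reference = "the cat sat on the mat".split()
short_summary = "the cat".split()
long_summary = "the cat sat on the mat near the door today".split()
```

With these toy inputs, the short summary has perfect unigram precision but low recall, and the long summary has perfect recall but lower precision, so a precision-oriented and a recall-oriented metric can rank the same pair of systems differently.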
56
votes
10 answers

Is it possible to guess a user's mood based on the structure of text?

I assume a natural language processor would need to be used to parse the text itself, but what suggestions do you have for an algorithm to detect a user's mood based on text that they have written? I doubt it would be very accurate, but I'm still…
David Brown
  • 35,411
  • 11
  • 83
  • 132
56
votes
2 answers

How is WordPiece tokenization helpful to effectively deal with the rare words problem in NLP?

I have seen that NLP models such as BERT utilize WordPiece for tokenization. In WordPiece, we split tokens such as playing into play and ##ing. It is mentioned that it covers a wider spectrum of out-of-vocabulary (OOV) words. Can someone please help…
Harman
  • 1,168
  • 1
  • 9
  • 19
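
The playing to play + ##ing split mentioned in the excerpt can be illustrated with a greedy longest-match-first lookup, which is how WordPiece is commonly applied at tokenization time. This is a sketch only: the vocabulary below is a hypothetical toy set, the function name is an assumption, and the code does not cover how a WordPiece vocabulary is learned.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into sub-word pieces by greedy longest match."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"play", "##ing", "##ed", "un", "##play", "##able"}

print(wordpiece_tokenize("playing", toy_vocab))
print(wordpiece_tokenize("unplayable", toy_vocab))
```

A word that never appears as a whole in the vocabulary, such as "unplayable" here, can still be represented by known pieces instead of a single unknown token.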
55
votes
1 answer

gensim Doc2Vec vs tensorflow Doc2Vec

I'm trying to compare my implementation of Doc2Vec (via tf) and gensim's implementation. It seems, at least visually, that the gensim ones are performing better. I ran the following code to train the gensim model and the one below that for tensorflow…
sachinruk
  • 9,571
  • 12
  • 55
  • 86
55
votes
51 answers

Is there a human readable programming language?

I mean, is there a coded language with human style coding? For example: Create an object called MyVar and initialize it to 10; Take MyVar and call MyMethod() with parameters. . . I know it's not so useful, but it can be interesting to create such a…
Enrico Murru
  • 2,313
  • 4
  • 21
  • 24
55
votes
11 answers

How to check if a string looks randomized, or human-generated and pronounceable?

For the purpose of identifying [possible] bot-generated usernames. Suppose you have a username like "bilbomoothof" .. it may be nonsense, but it still contains pronounceable sounds and so appears human-generated. I accept that it could have been…
Tim Whitlock
  • 1,111
  • 9
  • 17
54
votes
8 answers

Looking for Java spell checker library

I am looking for an open source Java spell checking library which has dictionaries for at least the following languages: French, German, Spanish, and Czech. Any suggestion?
avernet
  • 30,895
  • 44
  • 126
  • 163
54
votes
9 answers

Sentiment Analysis Dictionaries

I was wondering if anybody knew where I could obtain dictionaries of positive and negative words. I'm looking into sentiment analysis and this is a crucial part of it.
user387049
  • 6,647
  • 8
  • 53
  • 55
54
votes
3 answers

CBOW vs. skip-gram: why invert context and target words?

In this page, it is said that: [...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...] However, looking at the training dataset it produces, the content of the X and Y pair seems to be…
Guillaume Chevalier
  • 9,613
  • 8
  • 51
  • 79
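
The quoted description (skip-gram predicts each context word from its target word) can be made concrete by generating the training pairs each scheme would produce from a token list. This is a minimal sketch; the function names and the window size are assumptions chosen for illustration, not taken from the page the question refers to.

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: one (input=target word, output=context word) pair per neighbour."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """CBOW: one (input=all context words, output=target word) pair per position."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs


tokens = ["the", "quick", "brown"]
print(skipgram_pairs(tokens))
print(cbow_pairs(tokens))
```

In this sketch the two schemes see the same co-occurrences; what differs is which side is the model input and whether the context words are grouped into one example or split into several.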
54
votes
5 answers

gensim word2vec: Find number of words in vocabulary

After training a word2vec model using gensim in Python, how do you find the number of words in the model's vocabulary?
hlin117
  • 20,764
  • 31
  • 72
  • 93
54
votes
7 answers

Improving the extraction of human names with nltk

I am trying to extract human names from text. Does anyone have a method that they would recommend? This is what I tried (code is below): I am using nltk to find everything marked as a person and then generating a list of all the NNP parts of that…
e h
  • 8,435
  • 7
  • 40
  • 58
53
votes
5 answers

How can a sentence or a document be converted to a vector?

We have models for converting words to vectors (for example the word2vec model). Do similar models exist which convert sentences/documents into vectors, using perhaps the vectors learnt for the individual words?
Sahil
  • 1,346
  • 1
  • 12
  • 17
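
One simple baseline for the idea raised in the question (reusing the vectors learnt for individual words) is to average the word vectors of a sentence. The sketch below assumes a plain dictionary of toy vectors; the names and values are hypothetical, and dedicated sentence or document embedding models exist that this does not represent.

```python
def average_vector(tokens, word_vectors):
    """Average the vectors of the known tokens; return None if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Hypothetical two-dimensional word vectors, for illustration only.
toy_vectors = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}

print(average_vector(["good", "movie", "zzz"], toy_vectors))
```

Averaging discards word order, which is one reason it is usually treated as a baseline rather than a full answer.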
53
votes
5 answers

Feature Selection and Reduction for Text Classification

I am currently working on a project, a simple sentiment analyzer such that there will be 2 and 3 classes in separate cases. I am using a corpus that is pretty rich in terms of unique words (around 200,000). I used the bag-of-words method for feature…
clancularius
  • 877
  • 1
  • 9
  • 12
52
votes
5 answers

What Is the Difference Between POS Tagging and Shallow Parsing?

I'm currently taking a Natural Language Processing course at my university and am still confused about some basic concepts. I get the definition of POS Tagging from the Foundations of Statistical Natural Language Processing book: Tagging is the task of…
bertzzie
  • 3,558
  • 5
  • 30
  • 41
51
votes
12 answers

How to read values from numbers written as words?

As we all know, numbers can be written either in numerals or called by their names. While there are a lot of examples to be found that convert 123 into one hundred twenty three, I could not find good examples of how to convert it the other way…
Evgeny
  • 6,533
  • 5
  • 58
  • 64
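
The reverse conversion the question asks about (one hundred twenty three back to 123) can be sketched with a running total. This is a minimal illustration for well-formed English cardinal numbers only; the function name and lookup tables are assumptions, and the sketch does not validate word order, ordinals, or fractions.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(text):
    """Convert an English cardinal number phrase to an int."""
    total = 0    # value of completed groups (thousands, millions, ...)
    current = 0  # value of the group being read (always below 1000)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = (current or 1) * 100  # bare "hundred" counts as 100
        elif word in SCALES:
            total += (current or 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognised number word: {word!r}")
    return total + current


print(words_to_number("one hundred twenty three"))
```

Hyphenated forms such as "twenty-one" are handled by treating the hyphen as a space, and an unknown word raises `ValueError` rather than being silently skipped.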