Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If your question does not directly concern implementation, consider posting it on Data Science or Artificial Intelligence instead; otherwise it is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

20,185 questions
66
votes
8 answers

Semantic search with NLP and elasticsearch

I am experimenting with elasticsearch as a search server and my task is to build a "semantic" search functionality. From a short text phrase like "I have a burst pipe" the system should infer that the user is searching for a plumber and return all…
user1089363
66
votes
6 answers

Keras Text Preprocessing - Saving Tokenizer object to file for scoring

I've trained a sentiment classifier model using the Keras library by following the steps below (broadly): (1) convert the text corpus into sequences using the Tokenizer object/class; (2) build a model using the model.fit() method; (3) evaluate this model. Now for scoring…
66
votes
8 answers

Add/remove custom stop words with spacy

What is the best way to add/remove stop words with spacy? I am using the token.is_stop function and would like to make some custom changes to the set. I was looking at the documentation but could not find anything regarding stop words. Thanks!
E.K.
66
votes
2 answers

How do I tokenize a string sentence in NLTK?

I am using nltk, so I want to create my own custom texts just like the default ones on nltk.books. However, I've just got up to the method like my_text = ['This', 'is', 'my', 'text'] I'd like to discover any way to input my "text" as: my_text =…
diegoaguilar
66
votes
6 answers

How to check whether a sentence is correct (simple grammar check in Python)?

How to check whether a sentence is valid in Python? Examples: "I love Stackoverflow" - Correct; "I Stackoverflow love" - Incorrect
ChamingaD
65
votes
11 answers

Is there an algorithm that tells the semantic similarity of two phrases

Input: phrase 1, phrase 2. Output: semantic similarity value (between 0 and 1), or the probability that these two phrases are talking about the same thing.
btw0
63
votes
14 answers

SpaCy OSError: Can't find model 'en'

Even though I downloaded the model, it cannot load it:
[jalal@goku entity-sentiment-analysis]$ which python
/scratch/sjn/anaconda/bin/python
[jalal@goku entity-sentiment-analysis]$ sudo python -m spacy download en
[sudo] password for jalal:…
Mona Jalal
62
votes
3 answers

What is a projection layer in the context of neural networks?

I am currently trying to understand the architecture behind the word2vec neural net learning algorithm, for representing words as vectors based on their context. After reading Tomas Mikolov's paper I came across what he defines as a projection layer.…
Roger
61
votes
33 answers

What programming language is most like natural language?

I got the idea for this question from numerous situations where I don't understand what the person is talking about and when others don't understand me. So, a "smart" solution would be to speak a computer language. :) I am interested how far a…
kliketa
61
votes
8 answers

Expanding English language contractions in Python

The English language has a number of contractions. For instance: you've -> you have; he's -> he is. These can sometimes cause headaches when you are doing natural language processing. Is there a Python library that can expand these contractions?
Maarten
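
For the contraction question above, one common approach is a lookup table applied with a regular expression. The following is only a minimal sketch: the `CONTRACTIONS` table and the `expand_contractions` helper are hypothetical names made up for illustration, the table is far from complete, and ambiguous forms such as "he's" (which can mean "he is" or "he has") are resolved to a single fixed expansion.

```python
import re

# Hypothetical, deliberately tiny lookup table; a real one needs many more entries.
CONTRACTIONS = {
    "you've": "you have",
    "he's": "he is",      # ambiguous: could also be "he has"
    "can't": "cannot",
    "won't": "will not",
}

# One alternation over all known contractions, matched on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Replace every known contraction in text with its expansion."""
    def replace(match):
        found = match.group(0)
        expanded = CONTRACTIONS[found.lower()]
        # Keep an initial capital, e.g. "You've" -> "You have".
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)
```

Calling `expand_contractions("You've seen that he's late")` applies the table to both contractions. Choosing between "he is" and "he has" would need context, which a plain lookup cannot provide.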
59
votes
7 answers

Best way to identify and extract dates from text Python?

As part of a larger personal project I'm working on, I'm attempting to separate out inline dates from a variety of text sources. For example, I have a large list of strings (that usually take the form of English sentences or statements) that take a…
redct
58
votes
16 answers

How can I split multiple joined words?

I have an array of 1000 or so entries, with examples below: wickedweather, liquidweather, driveourtrucks, gocompact, slimprojector. I would like to be able to split these into their respective words, as: wicked weather, liquid weather, drive our trucks, go…
Taptronic
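
The splitting problem above is usually treated as word segmentation against a dictionary. The sketch below uses dynamic programming over prefixes; the `segment` function and the tiny `VOCABULARY` set are hypothetical and exist only for illustration. It returns one valid split and does not rank alternatives, which a practical solution would typically do with word frequencies.

```python
def segment(text, vocabulary):
    """Split text into vocabulary words; return None if no full split exists."""
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best = [None] * (len(text) + 1)
    best[0] = []
    longest = max(map(len, vocabulary), default=0)
    for end in range(1, len(text) + 1):
        # Only look back as far as the longest vocabulary word.
        for start in range(max(0, end - longest), end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                best[end] = best[start] + [word]
                break  # the earliest start gives the longest final word
    return best[len(text)]


# Hypothetical vocabulary covering the examples from the question.
VOCABULARY = {
    "wicked", "weather", "liquid", "drive", "our",
    "trucks", "go", "compact", "slim", "projector",
}
```

For example, `segment("driveourtrucks", VOCABULARY)` walks the string once and assembles the split from previously solved prefixes; an input containing no known words yields `None`.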
58
votes
6 answers

What do the BILOU tags mean in Named Entity Recognition?

Title pretty much sums up the question. I've noticed that in some papers people have referred to a BILOU encoding scheme for NER as opposed to the typical BIO tagging scheme (Such as this paper by Ratinov and Roth in 2009…
GrantD71
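
For context on the question above: in BILOU, B marks the Beginning of a multi-token entity, I a token Inside it, L its Last token, O a token Outside any entity, and U a Unit-length (single-token) entity. The sketch below converts BIO tags to BILOU to show the difference. The `bio_to_bilou` helper is a hypothetical name, and it assumes well-formed BIO input in which every `I-` tag follows a `B-` or `I-` tag with the same label.

```python
def bio_to_bilou(tags):
    """Convert a list of well-formed BIO tags to the BILOU scheme."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Treat the end of the sequence like an "O" tag.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label
        if prefix == "B":
            # A B tag with no continuation is a single-token entity.
            bilou.append(("B-" if continues else "U-") + label)
        else:
            # An I tag with no continuation closes the entity.
            bilou.append(("I-" if continues else "L-") + label)
    return bilou
```

With the input `["B-PER", "I-PER", "I-PER", "O", "B-LOC", "O"]`, the three-token person entity ends in an `L-PER` tag and the single-token location becomes `U-LOC`.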
57
votes
6 answers

Training data for sentiment analysis

Where can I get a corpus of documents that have already been classified as positive/negative for sentiment in the corporate domain? I want a large corpus of documents that provide reviews for companies, like reviews of companies provided by analysts…
London guy
57
votes
2 answers

Hamming Distance vs. Levenshtein Distance

For the problem I'm working on, finding distances between two sequences to determine their similarity, sequence order is very important. However, the sequences that I have are not all the same length, so I pad any deficient strings with empty points…
don
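
To make the comparison in the question above concrete, here is a minimal pure-Python sketch of both measures. Hamming distance is defined only for sequences of equal length and counts differing positions, while Levenshtein distance also allows insertions and deletions, so it handles unequal lengths without padding. The function names are illustrative and not taken from any library.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

Because a single inserted element shifts every later position, padding a shorter sequence and then applying Hamming distance can report many mismatches where Levenshtein distance reports one edit.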