Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
6
votes
3 answers

1 million sentences to save in DB - removing non-relevant English words

I am trying to train a Naive Bayes classifier with positive/negative words extracted from a sentiment. Example: I love this movie :)) I hate when it rains :( The idea is I extract positive or negative sentences based on the emoticons used,…
daydreamer
  • 87,243
  • 191
  • 450
  • 722
6
votes
2 answers

Natural Language Processing - Truecaser classifier

Please suggest a good machine learning classifier for truecasing a dataset. Also, is it possible to specify our own rules/features for truecasing in such a classifier? Thanks for all your suggestions.
Anu
  • 525
  • 1
  • 6
  • 18
6
votes
2 answers

AttributeError: type object 'Word2Vec' has no attribute 'load_word2vec_format'

I am trying to implement a word2vec model and am getting the following error: AttributeError: type object 'Word2Vec' has no attribute 'load_word2vec_format' Below is the code: wv = Word2Vec.load_word2vec_format("GoogleNews-vectors-negative300.bin.gz",…
Rishabh Rusia
  • 173
  • 2
  • 4
  • 19
6
votes
1 answer

NLP bag-of-words/TF-IDF for clustering (and classifying) short sentences

I want to cluster JavaScript objects by one of their string key values (description). I already tried multiple solutions and would like some guidance on how to approach the problem. What I want: Let's say I have a database of objects. There can be a…
6
votes
3 answers

How to include words as numerical feature in classification

What's the best method to use the words themselves as features in a machine learning algorithm? My problem is that I have to extract word-related features from a particular paragraph. Should I use the index in the dictionary as the numerical feature? If…
AlgoMan
  • 2,785
  • 6
  • 34
  • 40
6
votes
1 answer

WordNet Python words similarity

I'm trying to find a reliable way to measure the semantic similarity of 2 terms. The first metric could be the path distance on a hyponym/hypernym graph (eventually, a linear combination of 2-3 metrics could be better). from nltk.corpus import…
alfredopacino
  • 2,979
  • 9
  • 42
  • 68
6
votes
3 answers

What string distance algorithm is best for measuring typing accuracy?

I'm trying to write a function that detects how accurately the user typed a particular phrase, sentence, or word. My objective is to build an app to train the user's typing accuracy on certain phrases. My initial instinct is to use the basic…
adrianmcli
  • 1,956
  • 3
  • 21
  • 49
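
The excerpt above is cut off before it names an algorithm, so the following is only a sketch under one assumption: that Levenshtein (edit) distance is an acceptable measure, with accuracy taken as one minus the distance divided by the longer string's length. The names `levenshtein` and `typing_accuracy` are hypothetical and chosen for illustration.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in b so each row is as short as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def typing_accuracy(target, typed):
    """Hypothetical score in [0, 1]; 1.0 means the typed text matches exactly."""
    if not target and not typed:
        return 1.0
    return 1.0 - levenshtein(target, typed) / max(len(target), len(typed))


if __name__ == "__main__":
    print(typing_accuracy("the quick brown fox", "the quikc brown fox"))
```

Plain Levenshtein distance counts a swapped pair of letters as two edits; a variant that treats transpositions as a single edit (Damerau-Levenshtein) may fit typing errors better, though that is a design choice the question does not settle.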
6
votes
3 answers

Is there a library or web service that provides pronunciations for text?

Is there a library or web service that can tell you the pronunciation of a string? I'm thinking of character-based languages, where the pronunciation of the word is not apparent from how it's written.
Mike Sickler
  • 33,662
  • 21
  • 64
  • 90
6
votes
2 answers

Search for job titles in an article using Spacy or NLTK

I'm new to NLP and have recently been playing with NLTK and Spacy. However, I could not find a way to search for job titles (e.g., product manager, chief marketing officer, etc.) in an article. For example, I have 1000 articles and I want to get all the…
user643132
  • 101
  • 1
  • 5
6
votes
1 answer

What is Two-Level Morphology?

In natural language processing, what are the two levels of the two-level morphology framework?
Iresha Rubasinghe
  • 913
  • 1
  • 10
  • 27
6
votes
2 answers

efficient way to calculate distance between combinations of pandas frame columns

Task: I have a pandas dataframe where the columns are document names, the rows are words in those documents, and the numbers inside the frame cells are a measure of word relevance (word count if you want to keep it simple). I need to calculate a new matrix…
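
The excerpt is truncated before it says which matrix or distance is wanted, so this is a rough sketch under stated assumptions: the distance is cosine distance, and each document column is represented as a plain list of word counts in a dict rather than an actual pandas dataframe. The function names and sample data are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: an all-zero column is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(columns):
    """Map each unordered pair of column names to their cosine distance."""
    return {
        (a, b): cosine_distance(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }


if __name__ == "__main__":
    # Hypothetical word counts: one list per document, one entry per word.
    columns = {
        "doc_a": [1, 0, 2],
        "doc_b": [2, 0, 4],
        "doc_c": [0, 3, 0],
    }
    for pair, distance in pairwise_distances(columns).items():
        print(pair, round(distance, 3))
```

Using `itertools.combinations` visits each pair of columns once, which avoids computing both (a, b) and (b, a); for large frames a vectorized approach would likely be faster than this pure-Python loop.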
6
votes
8 answers

Counting the number of occurrences of words in a textfile

How could I go about keeping track of the number of times a word appears in a text file? I would like to do this for every word. For example, if the input is something like: "the man said hi to the boy." Each of "man said hi to boy" would have an…
vinc456
  • 2,862
  • 5
  • 23
  • 30
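
One possible approach to the word-counting question above, sketched in Python with `collections.Counter`. The tokenization rule (lowercase, letters and apostrophes only) and the function name `count_words` are assumptions made for illustration; the question does not specify a language or how words should be split.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    counts = count_words("the man said hi to the boy.")
    for word, n in counts.most_common():
        print(word, n)
```

To count words in a file rather than a string, the file's contents could be read first and passed to `count_words`; for very large files, updating the counter line by line would keep memory use lower.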
6
votes
2 answers

Syntactic similarity/distance between 2 sentences/string/text using nltk

I have 2 texts as below. Text1: John likes apple. Text2: Mike hates orange. If you check the above 2 texts, both of them are similar syntactically but have different meanings semantically. I want to find 1) Syntactic distance between 2 texts 2) Semantic…
Ganesh Deshvini
  • 429
  • 3
  • 17
6
votes
3 answers

finding noun and verb in stanford parser

I need to find whether a word is a verb, a noun, or both. For example, the word "search" can be both a noun and a verb, but the Stanford parser gives it the NN tag. Is there any way that the Stanford parser will indicate that "search" is both noun and…
karthi
  • 2,762
  • 4
  • 30
  • 28
6
votes
1 answer

How to use the link grammar parser as a grammar checker

Abiword uses the link grammar parser as a simple grammar checker. I'd like to duplicate this feature with Python. Poorly documented Python bindings exist, but I don't know how to use them to mimic the grammar checker in Abiword. (I'm not…
Nemo XXX
  • 644
  • 2
  • 14
  • 35