Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Data Science or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one. See "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

20185 questions
6 votes, 1 answer

How to detect if two news articles have the same topic? (Python semantic similarity)

I'm trying to scrape headlines and body text from articles on a few specific sites, similar to what Google does with Google News. The problem is that across different sites, they may have articles on the same subject worded slightly differently. Can…
resopollution
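
One common baseline for this kind of question is to compare bag-of-words vectors with cosine similarity. The sketch below is only an illustration under assumptions, not the asker's or any answer's method: the function names, the tokenizer regex, and the 0.5 threshold are all made up for the example and would need tuning on real articles.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def same_topic(text_a, text_b, threshold=0.5):
    """Hypothetical decision rule: the threshold is an assumption."""
    score = cosine_similarity(bag_of_words(text_a), bag_of_words(text_b))
    return score >= threshold
```

Raw counts weight every word equally, so stop-word removal or TF-IDF weighting would likely be the next step for longer article bodies.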
6 votes, 1 answer

Diminutive words stemming / lemmatization

Currently I use 'lucene' and 'elasticsearch' and have the following problem. I need to get the stemmed form or lemma of a diminutive word. For instance: doggy -> dog, kitty -> cat, etc. But I get the following results: doggy -> doggi, kitty -> kitti. Is there any way…
Ivan Kurchenko
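
Algorithmic stemmers generally cannot know that "kitty" should map to "cat", so one option is an exception table that is consulted before the stemmer runs. Below is a minimal Python sketch of that idea. The table holds only the two pairs from the question, and `naive_stem` is a made-up placeholder standing in for whatever stemmer is actually in use; none of this is Lucene or Elasticsearch code.

```python
# Hypothetical override table; both entries come from the question itself.
DIMINUTIVE_OVERRIDES = {"doggy": "dog", "kitty": "cat"}


def naive_stem(token):
    """Placeholder stemmer (an assumption): strips one trailing 's'."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def normalize(token, overrides=DIMINUTIVE_OVERRIDES, stem=naive_stem):
    """Return the override if one exists, otherwise fall back to stemming."""
    key = token.lower()
    if key in overrides:
        return overrides[key]
    return stem(key)
```

On the Elasticsearch side, the `stemmer_override` token filter appears to be intended for this kind of explicit mapping placed ahead of the stemmer, though the exact configuration should be checked against the documentation for the version in use.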
6 votes, 7 answers

Constructing human readable sentences based on a survey

The following is a survey given to course attendees to assess an instructor at the end of the course. Communication Skills 1. The instructor communicated course material clearly and accurately. Yes No 2. The instructor explained course objectives…
Joe
6 votes, 1 answer

Python NLP: TypeError: not all arguments converted during string formatting

I tried the code on "Natural language processing with python", but a type error occurred. import nltk from nltk.corpus import brown suffix_fdist = nltk.FreqDist() for word in brown.words(): word = word.lower() suffix_fdist.inc(word[-1:]) …
allenwang
6 votes, 2 answers

Updating the feature names into scikit TFIdfVectorizer

I am trying out this code from sklearn.feature_extraction.text import TfidfVectorizer import numpy as np train_data = ["football is the sport","gravity is the movie", "education is imporatant"] vectorizer = TfidfVectorizer(sublinear_tf=True,…
Gunjan
6 votes, 4 answers

Defining the context of a word - Python

I think this is an interesting question, at least for me. I have a list of words, let's say: photo, free, search, image, css3, css, tutorials, webdesign, tutorial, google, china, censorship, politics, internet and I have a list of…
RadiantHex
6 votes, 1 answer

Can I control the way the CountVectorizer vectorizes the corpus in scikit learn?

I am working with a CountVectorizer from scikit learn, and I'm possibly attempting to do some things that the object was not made for...but I'm not sure. In terms of getting counts for occurrence: vocabulary = ['hi', 'bye', 'run away!'] corpus =…
tumultous_rooster
6 votes, 5 answers

How to install and invoke Stanford NERTagger?

I am trying to use the NLTK interface for Stanford NER in the Python environment, nltk.tag.stanford.NERTagger. from nltk.tag.stanford import NERTagger st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz', …
Hans
6 votes, 3 answers

How to create the negative of a sentence in nltk

I am new to NLTK. I would like to create the negative of a sentence (which will usually be in the present tense). For example, is there a function to allow me to convert: 'I run' to 'I do not run' or 'She runs' to 'She does not run'. I suppose I…
Sebastian Zeki
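
For the two examples in the question, a purely rule-based sketch is enough. The code below does not use NLTK and is not a general solution: the function name is made up, it assumes the sentence starts with a pronoun subject followed directly by a regular present-tense verb, and it only strips a single trailing "s" (so forms such as "goes", "watches" or "is" are not handled).

```python
THIRD_PERSON_SINGULAR = {"he", "she", "it"}


def negate_simple_present(sentence):
    """Insert do/does + not after a pronoun subject (naive sketch)."""
    # Assumes at least two words: subject first, then the main verb.
    subject, verb, *rest = sentence.split()
    if subject.lower() in THIRD_PERSON_SINGULAR:
        auxiliary = "does"
        # Assumption: regular verb, so dropping the final 's' gives the base form.
        if verb.endswith("s"):
            verb = verb[:-1]
    else:
        auxiliary = "do"
    return " ".join([subject, auxiliary, "not", verb, *rest])
```

Anything beyond this pattern (noun-phrase subjects, irregular verbs, other tenses) would likely need a part-of-speech tagger or parser to locate the main verb first.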
6 votes, 1 answer

Understanding LDA Transformed Corpus in Gensim

I tried to examine the contents of the BOW corpus vs. the LDA[BOW Corpus] (transformed by LDA model trained on that corpus with, say, 35 topics) I found the following output: DOC 1 : [(1522, 1), (2028, 1), (2082, 1), (6202, 1)] LDA 1 : [(29,…
Ravi Karan
6 votes, 1 answer

Text mining, fact extraction, semantic analysis using .Net

I'm looking for any free tools/components/libraries that allow me to take advantage of text mining, fact extraction and semantic analysis in my .NET application. The GATE project is what I need but it is written in Java. Is there something like…
6 votes, 1 answer

Stanford CoreNLP sentiment

I'm trying to implement the coreNLP sentiment analyzer in eclipse. Getting the error: Unable to resolve "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" As either class path, filename or URL. I installed all of the NLP files using maven so I…
English Grad
6 votes, 2 answers

Exactly replicating R text preprocessing in python

I would like to preprocess a corpus of documents using Python in the same way that I can in R. For example, given an initial corpus, corpus, I would like to end up with a preprocessed corpus that corresponds to the one produced using the following R…
orome
6 votes, 2 answers

Extracting information from unstructured text

I have a collection of "articles", each 1 to 10 sentences long, written in noisy, informal English (i.e. social media style). I need to extract some information from each article, where available, like date and time. I also need to understand what…
Trasplazio Garzuglio
6 votes, 1 answer

How to extract character ngram from sentences? - python

The following word2ngrams function extracts character 3grams from a word: >>> x = 'foobar' >>> n = 3 >>> [x[i:i+n] for i in range(len(x)-n+1)] ['foo', 'oob', 'oba', 'bar'] This post shows the character ngrams extraction for a single word, Quick…
alvas
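
The single-word comprehension in the excerpt extends to whole sentences in at least two ways: collect n-grams word by word, or slide the window over the raw string so that n-grams can span spaces. The sketch below shows both. `word2ngrams` follows the name used in the question, while `sentence2ngrams` and its `per_word` flag are names assumed for the example.

```python
def word2ngrams(word, n=3):
    """Character n-grams of one string (same comprehension as in the question)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def sentence2ngrams(sentence, n=3, per_word=True):
    """Character n-grams of a sentence.

    per_word=True  -> split on whitespace and never cross word boundaries;
                      words shorter than n contribute nothing.
    per_word=False -> slide over the raw string, spaces included.
    """
    if per_word:
        return [gram for word in sentence.split() for gram in word2ngrams(word, n)]
    return word2ngrams(sentence, n)
```

Which variant fits depends on whether cross-word n-grams carry signal for the task; for language identification, for example, the spaces are often kept.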