Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

20185 questions
81
votes
4 answers

Stemmers vs Lemmatizers

Natural Language Processing (NLP), especially for English, has evolved into the stage where stemming would become an archaic technology if "perfect" lemmatizers exist. It's because stemmers change the surface form of a word/token into some…
alvas
  • 115,346
  • 109
  • 446
  • 738
80
votes
9 answers

How to use Bert for long text classification?

We know that BERT has a maximum input length of 512 tokens. If an article is much longer than that, say 10,000 tokens, how can BERT be used?
user1337896
  • 1,081
  • 1
  • 10
  • 15
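
One common way to approach the question above is to split the token sequence into overlapping windows that each fit the 512-token limit, classify every window, and then combine the per-window predictions. The following is a minimal sketch of the splitting step only, in plain Python. The function name `chunk_token_ids`, the stride of 256, and the special-token ids 101 and 102 (the `[CLS]` and `[SEP]` ids in the `bert-base-uncased` vocabulary) are assumptions made for illustration; check them against the tokenizer you actually use.

```python
def chunk_token_ids(token_ids, max_len=512, stride=256, cls_id=101, sep_id=102):
    """Split token ids into overlapping windows of at most max_len tokens.

    Each window is wrapped in [CLS] ... [SEP], so the body of a window holds
    at most max_len - 2 tokens. cls_id and sep_id are assumed values.
    """
    body = max_len - 2
    if body <= 0:
        raise ValueError("max_len must leave room for the two special tokens")
    if not 0 < stride <= body:
        raise ValueError("stride must be in 1..max_len - 2 to avoid gaps")

    chunks = []
    start = 0
    while True:
        window = token_ids[start:start + body]
        chunks.append([cls_id] + window + [sep_id])
        # Stop once this window has reached the end of the sequence.
        if start + body >= len(token_ids):
            break
        start += stride
    return chunks


# Hypothetical usage: 10,000 placeholder token ids standing in for an article.
article_ids = list(range(1000, 11000))
windows = chunk_token_ids(article_ids)
print(len(windows), max(len(w) for w in windows))
```

How the per-window outputs are combined (for example by averaging logits or taking a majority vote) is a separate design choice that this sketch does not cover.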
79
votes
6 answers

Stopword removal with NLTK

I am trying to process user-entered text by removing stopwords using the NLTK toolkit, but stopword removal also removes words like 'and', 'or', and 'not'. I want these words to remain after stopword removal, as they are operators…
Grahesh Parkar
  • 1,017
  • 1
  • 13
  • 16
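
The idea behind the question above can be sketched as a set difference: take the stopword set and subtract the words that must survive. The example below is a minimal sketch in plain Python, not the asker's code. The small `STOPWORDS` set is a hand-written stand-in (an assumption) so the example runs without downloading any corpus; with NLTK installed, the set would typically be built from `nltk.corpus.stopwords.words('english')` instead. The names `KEEP` and `remove_stopwords` are hypothetical.

```python
# Hand-written stand-in for a real stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "the", "and", "or", "not", "is", "are", "to", "of", "in"}

# Words that act as operators and should survive stopword removal.
KEEP = {"and", "or", "not"}


def remove_stopwords(text, stopwords=STOPWORDS, keep=KEEP):
    """Return the whitespace-separated tokens of text, minus stopwords not in keep."""
    effective = set(stopwords) - set(keep)
    return [tok for tok in text.split() if tok.lower() not in effective]


print(remove_stopwords("the cat and the dog are not in the house"))
# prints ['cat', 'and', 'dog', 'not', 'house']
```

Splitting on whitespace is a simplification; a real pipeline would likely use a proper tokenizer so that punctuation attached to words does not hide stopwords.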
79
votes
1 answer

What are the major differences and benefits of Porter and Lancaster Stemming algorithms?

I'm working on document classification tasks in Java. Both algorithms came highly recommended. What are the benefits and disadvantages of each, and which is more commonly used in the literature for natural language processing tasks?
Adam Hess
  • 1,396
  • 1
  • 13
  • 28
78
votes
3 answers

Practical examples of NLTK use

I'm playing around with the Natural Language Toolkit (NLTK). Its documentation (Book and HOWTO) is quite bulky and the examples are sometimes slightly advanced. Are there any good but basic examples of uses/applications of NLTK? I'm thinking of…
Mat
  • 82,161
  • 34
  • 89
  • 109
76
votes
1 answer

Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP)

I am playing around with NLTK to do an assignment on sentiment analysis. I am using Python 2.7, NLTK 3.0 and NumPy 1.9.1. This is the code: __author__ = 'karan' import nltk import re import sys def main(): print("Start"); #…
rkbom9
  • 913
  • 3
  • 9
  • 17
75
votes
8 answers

English grammar for parsing in NLTK

Is there a ready-to-use English grammar that I can just load and use in NLTK? I've searched for examples of parsing with NLTK, but it seems that I have to manually specify a grammar before parsing a sentence. Thanks a lot!
roboren
  • 891
  • 1
  • 7
  • 5
75
votes
12 answers

How to return history of validation loss in Keras

Using Anaconda Python 2.7 on Windows 10. I am training a language model using the Keras example: print('Build model...') model = Sequential() model.add(GRU(512, return_sequences=True, input_shape=(maxlen,…
ishido
  • 4,065
  • 9
  • 32
  • 42
73
votes
2 answers

Any tutorials for developing chatbots?

As an engineering student, I would like to make a chatbot using Python. I searched a lot but couldn't really find anything that would teach me or give me concrete information on building an intelligent chatbot. I would like to make a chatbot that…
Surya
  • 4,824
  • 6
  • 38
  • 63
73
votes
9 answers

What do spaCy's part-of-speech and dependency tags mean?

spaCy tags up each of the Tokens in a Document with a part of speech (in two different formats, one stored in the pos and pos_ properties of the Token and the other stored in the tag and tag_ properties) and a syntactic dependency to its .head token…
Mark Amery
  • 143,130
  • 81
  • 406
  • 459
72
votes
4 answers

How to extract common / significant phrases from a series of text entries

I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching). My example is any review on Yelp.com,…
arronsky
  • 721
  • 1
  • 6
  • 3
69
votes
11 answers

Where can I learn more about the Google search "did you mean" algorithm?

Possible Duplicate: How do you implement a “Did you mean”? I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine: Is there source code available for such a thing or…
vidhi
  • 753
  • 1
  • 6
  • 7
67
votes
14 answers

Algorithm to determine how positive or negative a statement/text is

I need an algorithm to determine if a sentence, paragraph or article is negative or positive in tone... or better yet, how negative or positive. For instance: Jason is the worst SO user I have ever witnessed (-10) Jason is an SO user (0) Jason is…
Jason
  • 16,739
  • 23
  • 87
  • 137
67
votes
2 answers

What is CoNLL data format?

I am using an open-source jar (Mate Parser) which outputs the CoNLL 2009 format after dependency parsing. I want to use the dependency parsing results for Information Extraction; however, I only understand part of the output in the CoNLL data…
66
votes
11 answers

Natural Language Processing in Ruby

I'm looking to do some sentence analysis (mostly for Twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby? Similar to Is there a good natural language…
Joey Robert
  • 7,336
  • 7
  • 34
  • 31