Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

20185 questions
44
votes
9 answers

Machine Learning and Natural Language Processing

Assume you know a student who wants to study Machine Learning and Natural Language Processing. What specific computer science subjects should they focus on and which programming languages are specifically designed to solve these types of problems? I…
Stephano
  • 5,716
  • 7
  • 41
  • 57
43
votes
1 answer

How to extract phrases from corpus using gensim

For preprocessing the corpus, I was planning to extract common phrases from it. For this I tried using the Phrases model in gensim. I tried the code below, but it's not giving me the desired output. My code: from gensim.models import Phrases documents =…
Prashant Puri
  • 2,324
  • 1
  • 15
  • 21
43
votes
4 answers

Use of PunktSentenceTokenizer in NLTK

I am learning Natural Language Processing using NLTK. I came across some code that uses PunktSentenceTokenizer, and I cannot understand what it actually does there. The code is: import nltk from nltk.corpus import state_union from nltk.tokenize…
arqam
  • 3,582
  • 5
  • 34
  • 69
42
votes
9 answers

Restore original text from Keras’s imdb dataset

I want to restore imdb’s original text from Keras’s imdb dataset. First, when I load Keras’s imdb dataset, it returns sequences of word indices. >>> (X_train, y_train), (X_test, y_test) =…
Hironsan
  • 685
  • 1
  • 5
  • 9
42
votes
10 answers

What are good starting points for someone interested in natural language processing?

So I've recently come up with some new possible projects that would have to deal with deriving 'meaning' from text submitted and generated by users. Natural language processing is the field that deals with these kinds of issues, and after…
kitsune
  • 11,516
  • 13
  • 57
  • 78
41
votes
5 answers

NLTK and language detection

How do I detect what language a text is written in using NLTK? The examples I've seen use nltk.detect, but after installing NLTK on my Mac, I cannot find this package.
niklassaers
  • 8,480
  • 20
  • 99
  • 146
41
votes
2 answers

How to connect Cortana commands to custom scripts?

This may be a little early to ask this, but I'm running Windows 10 Technical Preview Build 10122. I'd like to set up Cortana to have custom commands. Here's how she works: Hey Cortana, Microsoft will process…
Charles Clayton
  • 17,005
  • 11
  • 87
  • 120
41
votes
4 answers

NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?

I'm using the NLTK WordNet Lemmatizer for a Part-of-Speech tagging project by first modifying each word in the training corpus to its stem (in place modification), and then training only on the new corpus. However, I found that the lemmatizer is not…
sanjeev mk
  • 4,276
  • 6
  • 44
  • 69
40
votes
4 answers

How to tweak the NLTK sentence tokenizer

I'm using NLTK to analyze a few classic texts and I'm running into trouble tokenizing the text by sentence. For example, here's what I get for a snippet from Moby Dick: import nltk sent_tokenize =…
Chris Wilson
  • 6,599
  • 8
  • 35
  • 71
39
votes
7 answers

How do I do dependency parsing in NLTK?

Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence. The relevant section of the book, the sub-chapter on dependency grammar, gives an example figure, but it doesn't show how to parse a sentence to come up…
MrD
  • 2,405
  • 3
  • 22
  • 23
39
votes
5 answers

Computational Complexity of Self-Attention in the Transformer Model

I recently went through the Transformer paper from Google Research describing how self-attention layers could completely replace traditional RNN-based sequence encoding layers for machine translation. In Table 1 of the paper, the authors compare the…
39
votes
22 answers

Code Golf: Number to Words

The code golf series seems to be fairly popular. I ran across some code that converts a number to its word representation. Some examples would be (powers of 2 for programming fun): 2 -> Two 1024 -> One Thousand Twenty Four 1048576 -> One Million…
Jason Z
  • 13,122
  • 15
  • 50
  • 62
38
votes
7 answers

Is there a natural language parser for date/times in javascript?

Is there a natural language parser for date/times in javascript?
antony.trupe
  • 10,640
  • 10
  • 57
  • 84
37
votes
8 answers

Efficiently count word frequencies in python

I'd like to count the frequencies of all words in a text file. >>> countInFile('test.txt') should return {'aaa':1, 'bbb': 2, 'ccc':1} if the target text file is like: # test.txt aaa bbb ccc bbb I've implemented it with pure Python following some…
Light Yagmi
  • 5,085
  • 12
  • 43
  • 64
37
votes
6 answers

How best to parse a simple grammar?

Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course…
Nick Heiner
  • 119,074
  • 188
  • 476
  • 699