Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one. See "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

- Text pre-processing
- Coreference resolution
- Dependency parsing [parse-tree]
- Document summarization [summarization]
- Named entity recognition (NER) [named-entity-recognition]
- Information extraction (IE) [information-retrieval] [information-extraction]
- Language modeling
- Part-of-speech (POS) tagging [part-of-speech]
- Morphological analysis and wordform generation
- Phrase-structure (constituency) parsing [parse-tree]
- Machine translation (MT) [machine-translation]
- Question answering (QA) [nlp-question-answering]
- Sentiment analysis [sentiment-analysis]
- Semantic parsing [semantic-analysis]
- Text categorization [text-classification] [document-classification]
- Textual entailment detection
- Topic modeling [topic-modeling]
- Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- Sharp NLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

3 answers

What is a good Java library for Parts-Of-Speech tagging?

I'm looking for a good open source POS tagger in Java. Here's what I have come up with so far: LingPipe, Stanford, LBJ, FastTag. Anybody got any recommendations?

java nlp

asked Feb 19 '10 at 02:08

Glenn

5 answers

How does Amazon's Statistically Improbable Phrases work?

How does something like Statistically Improbable Phrases work? According to Amazon: Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify…

algorithm nlp platform-agnostic

asked Jan 05 '10 at 22:13

ʞɔıu

5 answers

How to determine the number of topics for LDA?

I am new to LDA and I want to use it in my work. However, I have run into some problems. To get the best performance, I want to estimate the best number of topics. After reading "Finding Scientific topics", I know that I can calculate logP(w|z)…

nlp data-mining lda

asked Jul 02 '13 at 09:22

Chelsea Wang

1 answer

Pointwise mutual information on text

I was wondering how one would calculate the pointwise mutual information for text classification. To be more exact, I want to classify tweets in categories. I have a dataset of tweets (which are annotated), and I have a dictionary per category of…

statistics machine-learning nlp

asked Nov 21 '12 at 08:06

Olivier_s_j
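
For readers landing on the pointwise mutual information question above, here is a minimal sketch of one common way to score word/category association, PMI(w, c) = log2(P(w, c) / (P(w) P(c))). It is not taken from the question or its answer. The function name `pmi_scores`, the toy tweets, and the choice to estimate probabilities from document-level counts (each word counted once per tweet) are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(labeled_docs):
    """Return {(word, category): PMI} for every pair seen together.

    labeled_docs is a list of (tokens, category) pairs. Probabilities are
    estimated from document counts (an assumption of this sketch).
    """
    n_docs = len(labeled_docs)
    word_counts = Counter()
    cat_counts = Counter()
    joint_counts = Counter()

    for tokens, category in labeled_docs:
        cat_counts[category] += 1
        for word in set(tokens):  # count each word once per document
            word_counts[word] += 1
            joint_counts[(word, category)] += 1

    scores = {}
    for (word, category), joint in joint_counts.items():
        p_joint = joint / n_docs
        p_word = word_counts[word] / n_docs
        p_cat = cat_counts[category] / n_docs
        scores[(word, category)] = math.log2(p_joint / (p_word * p_cat))
    return scores


# Hypothetical annotated tweets, already tokenized.
toy_tweets = [
    ("great match today".split(), "sports"),
    ("great goal".split(), "sports"),
    ("new phone released".split(), "tech"),
    ("great new phone".split(), "tech"),
]
scores = pmi_scores(toy_tweets)
```

Pairs that never co-occur are left out rather than assigned negative infinity. With very small counts PMI is noisy, so in practice a minimum-frequency cutoff or smoothing is usually worth considering.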

12 answers

How can I split a text into sentences using the Stanford parser?

How can I split a text or paragraph into sentences using Stanford parser? Is there any method that can extract sentences, such as getSentencesFromString() as it's provided for Ruby?

java parsing artificial-intelligence nlp stanford-nlp

asked Feb 29 '12 at 02:19

S Gaber

3 answers

Ease of use: Stanford CoreNLP vs. OpenNLP

I'm looking to use a suite of NLP tools for a personal project, and I was wondering whether Stanford's CoreNLP is easier to use than OpenNLP. Or is there another free package you would recommend? I haven't really done any NLP before, so I am looking for…

nlp stanford-nlp

asked Jul 06 '11 at 21:30

Pratik Thaker

4 answers

How to Train GloVe algorithm on my own corpus

I tried to follow this, but somehow I wasted a lot of time and ended up with nothing useful. I just want to train a GloVe model on my own corpus (~900Mb corpus.txt file). I downloaded the files provided in the link above and compiled it using cygwin…

nlp stanford-nlp gensim word2vec glove

asked Feb 24 '18 at 11:10

Codir

1 answer

Data sets for emotion detection in text

I'm implementing a system that could detect the human emotion in text. Are there any manually annotated data sets available for supervised learning and testing? Here are some interesting datasets: https://dataturks.com/projects/trending

database dataset nlp text-mining emotion

asked Jun 08 '15 at 07:34

ekka

3 answers

Is it possible to train Stanford NER system to recognize more named entity types?

I'm using some NLP libraries now (Stanford and NLTK). For Stanford, I saw the demo, but I just want to ask if it is possible to use it to identify more entity types. Currently, the Stanford NER system (as the demo shows) can recognize entities as…

nlp stanford-nlp named-entity-recognition

asked Mar 03 '14 at 22:07

JudyJiang

6 answers

POS tagging in German

I am using NLTK to extract nouns from a text-string starting with the following command: tagged_text = nltk.pos_tag(nltk.Text(nltk.word_tokenize(some_string))) It works fine in English. Is there an easy way to make it work for German as well? (I…

python nlp nltk

asked Oct 28 '09 at 20:17

Johannes Meier

1 answer

Understanding NLTK collocation scoring for bigrams and trigrams

Background: I am trying to compare pairs of words to see which pair is "more likely to occur" in US English than another pair. My plan is/was to use the collocation facilities in NLTK to score word pairs, with the higher scoring pair being the most…

python nlp nltk

asked Dec 30 '11 at 20:09

ccgillett

3 answers

How to build semantic search for a given domain

We are trying to do a semantic search on our set of data, i.e. we have domain-specific data (for example, sentences talking about automobiles). Our data is just a bunch of sentences and what we want is to give a…

python elasticsearch nlp sentence-similarity huggingface-transformers

asked Feb 12 '20 at 11:06

Jickson

1 answer

Parsing city of origin / destination city from a string

I have a pandas dataframe where one column is a bunch of strings with certain travel details. My goal is to parse each string to extract the city of origin and destination city (I would like to ultimately have two new columns titled 'origin' and…

python regex pandas nlp nltk

asked Jan 28 '20 at 20:39

Merv Merzoug

3 answers

Combining a Tokenizer into a Grammar and Parser with NLTK

I am making my way through the NLTK book and I can't seem to do something that would appear to be a natural first step for building a decent grammar. My goal is to build a grammar for a particular text corpus. (Initial question: Should I even try…

python nlp grammar nltk

asked Feb 01 '11 at 03:06

speedplane

4 answers

How to speed up Gensim Word2vec model load time?

I'm building a chatbot so I need to vectorize the user's input using Word2Vec. I'm using a pre-trained model with 3 million words by Google (GoogleNews-vectors-negative300). So I load the model using Gensim: import gensim model =…

python nlp gensim word2vec

asked Mar 23 '17 at 20:30

Marcus Holm
