Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches.

NOTE: If you want to use this tag for a question that does not directly concern implementation, consider posting on Data Science or Artificial Intelligence instead; otherwise the question is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" (tl;dr: no).

NLP tasks

- Text pre-processing
- Coreference resolution
- Dependency parsing [parse-tree]
- Document summarization [summarization]
- Named entity recognition (NER) [named-entity-recognition]
- Information extraction (IE) [information-retrieval] [information-extraction]
- Language modeling
- Part-of-speech (POS) tagging [part-of-speech]
- Morphological analysis and wordform generation
- Phrase-structure (constituency) parsing [parse-tree]
- Machine translation (MT) [machine-translation]
- Question answering (QA) [nlp-question-answering]
- Sentiment analysis [sentiment-analysis]
- Semantic parsing [semantic-analysis]
- Text categorization [text-classification] [document-classification]
- Textual entailment detection
- Topic modeling [topic-modeling]
- Word sense disambiguation (WSD) [word-sense-disambiguation]

Beginner books on Natural Language Processing

Popular software packages

General purpose toolkits
- Natural Language Toolkit (NLTK) (Python) [nltk]
- OpenNLP (Java) [opennlp]
- Sharp NLP (.NET) [sharpnlp]
- ClearNLP (Java) [clearnlp]
- Mate (Java)
- Stanford CoreNLP (Java) [stanford-nlp]
- Treat (Ruby)
- Mallet (Java) [mallet]
- spaCy (Python) [spacy]
- Pattern (Python) [python-pattern]

Phrase structure parsers
- Stanford Parser (Java) [stanford-nlp]
- Berkeley Parser (Java)
- BLLIP (Charniak-Johnson) Parser (C++, Python) [charniak-parser]

Dependency parsers
- Stanford Dependencies (packaged with Stanford Parser) (Java) [stanford-nlp]
- MaltParser (Java)
- MSTParser (Java)
- UDPipe

Proofreading software
- LanguageTool (Java) [languagetool]

20185 questions

2 answers

Building a lemmatizer: speed optimization

I am building a lemmatizer in Python. As I need it to run in real time and process a fairly large amount of data, processing speed is of the essence. Data: I have all possible suffixes, each linked to all the word types it can be combined with.…

python optimization nlp lemmatization

asked Mar 23 '12 at 17:04

root

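
One possible shape for the suffix-based lookup described in the lemmatizer question above, as a minimal sketch only. The rule table `SUFFIX_RULES`, the lexicon `LEXICON`, and the function `lemmatize` are hypothetical names invented for illustration; the asker's actual data layout is not shown in the excerpt. The idea is to keep the suffixes in a dict so that each candidate suffix costs one hash lookup, trying longer suffixes first.

```python
# Hypothetical suffix table: suffix -> list of (replacement, word type).
# A real table would be loaded from the asker's own data.
SUFFIX_RULES = {
    "ies": [("y", "noun")],
    "ing": [("", "verb"), ("e", "verb")],
    "ed": [("", "verb"), ("e", "verb")],
    "s": [("", "noun")],
}

# Hypothetical lexicon of known lemmas, used to reject impossible stems.
LEXICON = {"city", "walk", "bake", "cat"}

# Longest suffix in the table; bounds the number of lookups per word.
MAX_SUFFIX_LEN = max(len(s) for s in SUFFIX_RULES)


def lemmatize(word):
    """Return the set of (lemma, word_type) candidates found in LEXICON."""
    word = word.lower()
    candidates = set()
    # Try suffix lengths from longest to shortest, always leaving at
    # least one character of stem. Each length is a single dict lookup.
    for n in range(min(MAX_SUFFIX_LEN, len(word) - 1), 0, -1):
        suffix = word[-n:]
        for replacement, word_type in SUFFIX_RULES.get(suffix, ()):
            lemma = word[:-n] + replacement
            if lemma in LEXICON:
                candidates.add((lemma, word_type))
    return candidates


print(lemmatize("cities"))
print(lemmatize("baked"))
```

The work per word is bounded by the longest suffix length rather than by the size of the suffix table, which is usually what matters for throughput; whether that is fast enough for the asker's data is not something this sketch can establish.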

3 answers

Discover the user behind multiple user accounts according to the words he uses

I would like to create an algorithm to distinguish people writing on a forum under different nicknames. The goal is to discover people registering a new account to flame the forum anonymously instead of using their main account. Basically, I was thinking about…

algorithm language-agnostic nlp

asked Mar 18 '12 at 11:26

Martin Nuc

1 answer

CJK Languages Pronunciation APIs

Are there any good (preferably open) APIs or databases of pronunciation audio files for the Chinese, Japanese, and Korean languages? I’ve been looking around, but somehow couldn’t find anything other than Forvo or Google Translate. Both are overkill for…

nlp

asked Mar 14 '12 at 22:02

Arnold

0 answers

How to implement LSA (latent semantic analysis) in Python?

How can I implement latent semantic analysis in Python and compare a corpus of texts against a query using cosine similarity?

python math nlp

asked Feb 13 '12 at 07:44

ChamingaD

1 answer

Python vs Java for natural language processing

I have been working in Java to find the similarity between two documents. I would prefer to find semantic similarity, but haven't made any effort toward that yet. I am using the following approach: extract terms/tokens (I am using JAWS with WordNet to…

java python text nlp similarity

asked Feb 13 '12 at 04:53

CTsiddharth

3 answers

NLP text tagging

I am new to NLP and doing this for the first time. My problem is that I have some documents which are manually tagged like: doc1 - categoryA, categoryB; doc2 - categoryA, categoryC; doc3 - categoryE, categoryF,…

machine-learning nlp

asked Jan 25 '12 at 09:33

user1168811

1 answer

Using context to improve part-of-speech tagging

Are there any common or recommended techniques for using the context of a word to improve the accuracy of part-of-speech tagging? For example, take the sentence: I played golf on a links. The word "links" could be either singular (a golf course)…

nlp

asked Jan 20 '12 at 20:50

Chris Sears

2 answers

Techniques for calculating adjective frequency

I need to calculate word frequencies for a given set of adjectives in a large set of customer support reviews. However, I don't want to include those that are negated. For example, suppose my list of adjectives was: [helpful, knowledgeable, friendly].…

full-text-search nlp data-mining

asked Jan 16 '12 at 01:31

awinbra
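
One rough way to approach the negation problem in the question above, as a sketch under stated assumptions: an adjective is treated as negated if a negation word appears within a few tokens before it in the same sentence. The word list `NEGATIONS`, the default `window` of 3, and the function name `count_unnegated` are illustrative choices, not something taken from the question or its answers.

```python
import re
from collections import Counter

# Illustrative negation cues; a real list would need to be broader.
NEGATIONS = {"not", "no", "never", "isn't", "wasn't", "weren't", "aren't"}


def count_unnegated(text, adjectives, window=3):
    """Count adjectives not preceded by a negation within `window` tokens."""
    adjectives = set(adjectives)
    counts = Counter()
    # Split into rough sentences so a negation cannot leak across them.
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        for i, tok in enumerate(tokens):
            if tok in adjectives:
                preceding = tokens[max(0, i - window):i]
                if not any(p in NEGATIONS for p in preceding):
                    counts[tok] += 1
    return counts


reviews = (
    "The agent was helpful and friendly. "
    "He was not very knowledgeable. "
    "Support wasn't helpful."
)
print(count_unnegated(reviews, ["helpful", "knowledgeable", "friendly"]))
```

A fixed window is a heuristic: it misses long-distance negation and can over-trigger on sentences such as "not only helpful but friendly". A dependency parser would be the more robust route if accuracy matters.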

4 answers

Find subject in incomplete sentence with NLTK

I have a list of products that I am trying to classify into categories. They will be described with incomplete sentences like: "Solid State Drive Housing", "Hard Drive Cable", "1TB Hard Drive", "500GB Hard Drive, Refurbished from Manufacturer". How can…

python nlp nltk

asked Jan 12 '12 at 20:08

Jmjmh

4 answers

Convert one-document-per-line to Blei's lda-c/dtm format for topic modeling?

I am running Latent Dirichlet Allocation analyses for some research and keep running into a problem. Most LDA software requires documents to be in doclines format, meaning a CSV or other delimited file in which each line represents the entirety of a document.…

nlp dataform lda

asked Jan 05 '12 at 22:53

user836015
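
A minimal sketch of the conversion asked about above. It assumes the sparse layout that lda-c reads, where each output line is `M term_1:count_1 ... term_M:count_M`, with `M` the number of unique terms in the document and each term given as an integer vocabulary index. The function name `to_ldac` and the whitespace tokenization are assumptions made for illustration; real input would likely need proper tokenization and stop-word handling first.

```python
from collections import Counter


def to_ldac(doclines):
    """Convert one-document-per-line text to lda-c style sparse lines.

    Returns (lines, vocab_list), where vocab_list[i] is the term whose
    integer index is i.
    """
    vocab = {}   # term -> integer index, assigned in order of first use
    lines = []
    for doc in doclines:
        counts = Counter(doc.lower().split())
        for term in counts:
            vocab.setdefault(term, len(vocab))
        # Sort by term index so each line is in a stable order.
        pairs = sorted((vocab[t], c) for t, c in counts.items())
        fields = [str(len(pairs))] + [f"{i}:{c}" for i, c in pairs]
        lines.append(" ".join(fields))
    vocab_list = sorted(vocab, key=vocab.get)
    return lines, vocab_list


lines, vocab = to_ldac(["apple banana apple", "banana cherry"])
print("\n".join(lines))
print(vocab)
```

Writing `lines` to one file and `vocab` (one term per line) to another gives the data file and vocabulary file pair; any additional files a particular tool expects are outside this sketch.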

1 answer

Multitask learning

Can anybody please explain multitask learning in a simple and intuitive way? Maybe a real-world problem would be a useful illustration. These days I see many people using it for natural language processing tasks.

nlp machine-learning stanford-nlp

asked Dec 31 '11 at 13:10

thetna

2 answers

Automatic semantic role labeling in FrameNet

I would like to do automatic semantic role labeling with the FrameNet lexicon using machine learning methods. Could you please suggest some Java packages suitable for this project?

java nlp machine-learning

asked Dec 18 '11 at 10:09

thetna

2 answers

Python module to remove internet jargon/slang/acronyms

Is there any Python module (maybe in NLTK) to remove internet slang or chat slang like "lol", "brb", etc.? If not, can someone provide a CSV file containing a large list of such slang? The website http://www.netlingo.com/acronyms.php gives…

python nlp acronym

asked Dec 14 '11 at 09:46

Rkz

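
If a slang list is available, the removal step asked about above could look roughly like this sketch. The tiny `SLANG` set and the function name `remove_slang` are placeholders invented for illustration; they are not from any existing module, and a usable list would have to come from an external source.

```python
import re

# Placeholder list for illustration only; a real list would be much larger.
SLANG = {"lol", "brb", "omg", "btw"}


def remove_slang(text, slang=SLANG):
    """Remove whole-word, case-insensitive matches of the slang terms."""
    alternatives = "|".join(re.escape(term) for term in sorted(slang))
    # \b keeps "lol" from matching inside longer words such as "lollipop".
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Collapse the whitespace left behind by removed tokens.
    return re.sub(r"\s+", " ", cleaned).strip()


print(remove_slang("omg this is great lol"))
print(remove_slang("I bought a lollipop"))
```

Matching on word boundaries avoids mangling ordinary words, but it will also strip acronyms that happen to be legitimate in context, so the list needs curating for the target corpus.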

3 answers

Running CRFSuite examples

I'm trying to use CRFSuite, but I can't figure out how to use example/ner.py and pos.py. Precisely, how do I make an input of the form: # Ner.py fields = 'y w pos chk' or # Pos.py fields = 'w num cap sym p1 p2 p3 p4 s1 s2 s3 s4 y'? The "y w pos"…

python machine-learning nlp crfsuite

asked Dec 03 '11 at 19:33

user1079319

1 answer

Could you recommend an NLP toolkit in Prolog?

I need to parse or tokenize English sentences. Is there any NLP toolkit in Prolog? Thanks.

prolog nlp

asked Dec 02 '11 at 04:50

question
