Questions tagged [nlp]

Natural language processing (NLP) is a subfield of artificial intelligence that involves transforming or extracting useful information from natural language data. Methods include machine-learning and rule-based approaches. It is often regarded as the engineering arm of Computational Linguistics.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

NLP tasks

Beginner books on Natural Language Processing

Popular software packages

20185 questions
5
votes
2 answers

Building a lemmatizer: speed optimization

I am building a lemmatizer in Python. Because I need it to run in real time and process fairly large amounts of data, processing speed is of the essence. Data: I have all possible suffixes, each linked to all the word types it can be combined with.…
root
  • 76,608
  • 25
  • 108
  • 120
5
votes
3 answers

Discover user behind multiple different user accounts according to words he uses

I would like to create an algorithm to identify people who write on a forum under different nicknames. The goal is to discover people who register a new account to flame the forum anonymously rather than under their main account. Basically, I was thinking about…
Martin Nuc
  • 5,604
  • 2
  • 42
  • 48
5
votes
1 answer

CJK Languages Pronunciation APIs

Are there any good (preferably open) APIs or databases of pronunciation audio files for Chinese/Japanese/Korean languages? I’ve been looking around, but somehow couldn’t find anything other than Forvo or Google Translate. Both are overkill for…
Arnold
  • 2,390
  • 1
  • 26
  • 45
5
votes
0 answers

How to implement LSA (Latent semantic analysis) in Python?

How can I implement latent semantic analysis in Python and compare a corpus of text against a query using cosine similarity?
ChamingaD
  • 2,908
  • 8
  • 35
  • 58
5
votes
1 answer

Python vs Java for natural language processing

I have been working in Java to find the similarity between two documents. I would prefer to find semantic similarity, but haven't made any effort to do so yet. I am using the following approach: extract terms/tokens (I am using JAWS with WordNet to…
CTsiddharth
  • 907
  • 12
  • 21
5
votes
3 answers

NLP text tagging

I am a newbie in NLP, just doing it for the first time. I am trying to solve a problem. My problem is I have some documents which are manually tagged like: doc1 - categoryA, categoryB doc2 - categoryA, categoryC doc3 - categoryE, categoryF,…
user1168811
  • 51
  • 1
  • 2
5
votes
1 answer

Using context to improve part-of-speech tagging

Are there some common or recommended techniques for using the context of a word to improve the accuracy of part-of-speech tagging? For example, if I had the sentence: I played golf on a links. The word "links" could be either singular (a golf course)…
Chris Sears
  • 6,502
  • 5
  • 32
  • 35
5
votes
2 answers

Techniques for calculating adjective frequency

I need to calculate word frequencies of a given set of adjectives in a large set of customer support reviews. However, I don't want to include those that are negated. For example, suppose my list of adjectives was: [helpful, knowledgeable, friendly].…
awinbra
  • 694
  • 1
  • 7
  • 19
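One possible starting point for the negation problem described above is a fixed look-back window: count an adjective only if none of the few tokens before it is a negator. This is a minimal sketch, not the asker's method. The function name `count_unnegated`, the negator list, and the window size of 2 are all assumptions made for illustration, and the window does not respect sentence boundaries.

```python
import re
from collections import Counter

# Hypothetical negator list; a real one would need tuning for the review data.
NEGATORS = {"not", "no", "never"}


def count_unnegated(text, adjectives, window=2):
    """Count target adjectives that have no negator in the preceding `window` tokens."""
    # Naive tokenization (assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    targets = set(adjectives)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        preceding = tokens[max(0, i - window):i]
        # Treat contractions such as "wasn't" as negators too.
        negated = any(p in NEGATORS or p.endswith("n't") for p in preceding)
        if not negated:
            counts[tok] += 1
    return counts


review = (
    "The agent was helpful and friendly. The second agent was not helpful, "
    "but she was very knowledgeable. He wasn't very friendly."
)
print(count_unnegated(review, ["helpful", "knowledgeable", "friendly"]))
```

A dependency parse would handle long-range or scoped negation more reliably; the window approach is only a cheap baseline.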
5
votes
4 answers

Find subject in incomplete sentence with NLTK

I have a list of products that I am trying to classify into categories. They will be described with incomplete sentences like: "Solid State Drive Housing" "Hard Drive Cable" "1TB Hard Drive" "500GB Hard Drive, Refurbished from Manufacturer" How can…
Jmjmh
  • 2,016
  • 1
  • 13
  • 11
5
votes
4 answers

Convert one-document-per-line to Blei's lda-c/dtm format for topic modeling?

I am doing Latent Dirichlet Allocation (LDA) for some research and keep running into a problem. Most LDA software requires documents to be in doclines format, meaning a CSV or other delimited file in which each line represents the entirety of a document.…
user836015
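For the conversion asked about above, the lda-c input format puts one document per line as `M term_id:count term_id:count ...`, where `M` is the number of distinct terms in that document and term ids are zero-based integers. The following is a minimal sketch under stated assumptions: the function name `docs_to_ldac` is hypothetical, tokenization is plain whitespace splitting with lowercasing, and no stop-word removal or vocabulary pruning is done.

```python
from collections import Counter


def docs_to_ldac(lines):
    """Convert one-document-per-line text into lda-c lines plus a vocabulary list."""
    vocab = {}       # term -> integer id, assigned in order of first appearance
    ldac_lines = []
    for line in lines:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        counts = Counter(line.lower().split())
        for term in counts:
            vocab.setdefault(term, len(vocab))
        pairs = sorted((vocab[term], n) for term, n in counts.items())
        fields = [str(len(pairs))] + [f"{term_id}:{n}" for term_id, n in pairs]
        ldac_lines.append(" ".join(fields))
    # Position in this list equals the term id used in the lda-c lines.
    vocab_list = sorted(vocab, key=vocab.get)
    return ldac_lines, vocab_list


docs = ["the cat sat on the mat", "the dog sat"]
ldac, vocabulary = docs_to_ldac(docs)
print("\n".join(ldac))
print(vocabulary)
```

Writing `ldac` to a data file and `vocabulary` to a separate file, one term per line, keeps the term ids recoverable after modeling.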
5
votes
1 answer

Multitask learning

Can anybody please explain multitask learning in a simple and intuitive way? A real-world example would be useful. These days I am seeing many people use it for natural language processing tasks.
thetna
  • 6,903
  • 26
  • 79
  • 113
5
votes
2 answers

Automatic semantic role labeling in FrameNet

I would like to do automatic semantic role labeling with the FrameNet lexicon using machine learning methods. Could you please suggest some Java packages suitable for this project?
thetna
  • 6,903
  • 26
  • 79
  • 113
5
votes
2 answers

python module to remove internet jargon/slang/acronym

Is there a Python module (maybe in NLTK) to remove internet slang/chat slang like "lol", "brb", etc.? If not, can someone provide me with a CSV file containing an extensive list of such slang? The website http://www.netlingo.com/acronyms.php gives…
Rkz
  • 1,237
  • 5
  • 16
  • 30
5
votes
3 answers

Running CRFSuite examples

I'm trying to use CRFSuite, but I can't figure out how to use example/ner.py and pos.py. Specifically, how do I make an input of the form: # Ner.py fields = 'y w pos chk' or # Pos.py fields = 'w num cap sym p1 p2 p3 p4 s1 s2 s3 s4 y' The "y w pos"…
user1079319
  • 51
  • 1
  • 3
5
votes
1 answer

Could you recommend an NLP toolkit in Prolog?

I need to parse or tokenize English sentences. Is there any NLP toolkit in Prolog? Thanks.
question
  • 487
  • 3
  • 8
  • 13