Questions tagged [pos-tagger]

A part-of-speech tagger, or POS tagger, is a concrete implementation of algorithms that label discrete terms, as well as hidden parts of speech, with tags from a descriptive set, for example identifying words as nouns, verbs, adjectives, adverbs, and so on. Such taggers are often based on machine learning (ML) techniques.

In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children as the identification of words as nouns, verbs, adjectives, adverbs, etc.

Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS taggers, employs rule-based algorithms.
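To make the rule-based versus stochastic distinction concrete, here is a minimal sketch in plain Python. It is not Brill's tagger or any library's API: the tiny corpus, the tag names, and the helper functions (`train_unigram`, `rule_based_guess`, `tag`) are all hypothetical and chosen only for illustration. The stochastic part picks each known word's most frequent tag in the training data; the rule-based part falls back on crude suffix rules for unseen words.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (hypothetical, for illustration only).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"), ("quietly", "ADV")],
]


def train_unigram(sentences):
    """Stochastic part: map each word to its most frequent tag in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag_name in sentence:
            counts[word.lower()][tag_name] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def rule_based_guess(word):
    """Rule-based part: crude suffix rules for words unseen in training."""
    if word.endswith("ly"):
        return "ADV"
    if word.endswith("ing") or word.endswith("ed"):
        return "VERB"
    return "NOUN"  # default guess when no rule fires


def tag(tokens, model):
    """Tag each token with the learned tag, or a rule-based guess if unknown."""
    return [(t, model.get(t.lower()) or rule_based_guess(t)) for t in tokens]


model = train_unigram(TRAIN)
result = tag(["The", "cat", "barks", "loudly"], model)
print(result)
```

Here "loudly" never appears in the toy corpus, so its tag comes from the suffix rule rather than from the learned counts. Real taggers use far larger corpora and also take the surrounding context into account, which this sketch deliberately ignores.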

586 questions
11
votes
3 answers

POS tagging in Scala

I tried to POS tag a sentence in Scala using Stanford parser like below val lp:LexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"); lp.setOptionFlags("-maxLength", "50",…
yAsH
  • 3,367
  • 8
  • 36
  • 67
11
votes
0 answers

Stanford Core NLP how to get the probability & margin of error

When using the parser, or for that matter any of the annotations in Core NLP, is there a way to access the probability or the margin of error? To put my question into context, I am trying to understand if there is a way to programmatically detect a…
Chris
  • 111
  • 3
10
votes
1 answer

what is the MeCab output and the tagset?

Can someone enlighten me on the MeCab default output? What annotation does MeCab output, and where can I find the tagset for the morphological analyzer http://mecab.sourceforge.net/? Can anyone decipher this output from MeCab? ブギス・ジャンクション ブギス・ジャンクション…
alvas
  • 115,346
  • 109
  • 446
  • 738
10
votes
2 answers

How to POS_TAG a french sentence?

I'm looking for a way to pos_tag a French sentence like the following code is used for English sentences: def pos_tagging(sentence): var = sentence exampleArray = [var] for item in exampleArray: tokenized =…
sahraoui asmoun
  • 285
  • 1
  • 2
  • 10
8
votes
3 answers

How to use pos_tag in NLTK?

So I was trying to tag a bunch of words in a list (POS tagging to be exact) like so: pos = [nltk.pos_tag(i,tagset='universal') for i in lw] where lw is a list of words (it's really long or I would have posted it but it's like [['hello'],['world']]…
SSBakh
  • 1,487
  • 1
  • 14
  • 27
8
votes
2 answers

Extracting noun+noun or (adj|noun)+noun from Text

Is it possible to extract noun+noun or (adj|noun)+noun using the R package openNLP? That is, I would like to use linguistic filtering to extract candidate noun phrases. Could you tell me how to do this? Many thanks. Thanks for the responses. Here is…
ssuhan
  • 367
  • 1
  • 6
  • 12
8
votes
2 answers

nltk StanfordNERTagger : How to get proper nouns without capitalization

I am trying to use the StanfordNERTagger and nltk to extract keywords from a piece of text. docText="John Donk works for POI. Brian Jones wants to meet with Xyz Corp. for measuring POI's Short Term performance Metrics." words =…
AbtPst
  • 7,778
  • 17
  • 91
  • 172
8
votes
1 answer

c/c++ NLP library

I am looking for an open-source natural language processing library for C/C++. I am especially interested in part-of-speech tagging.
Ayoub M.
  • 4,690
  • 10
  • 42
  • 52
8
votes
3 answers

Obtain multiple taggings with Stanford POS Tagger

I'm performing POS tagging with the Stanford POS Tagger. The tagger only returns one possible tagging for the input sentence. For instance, when provided with the input sentence "The clown weeps.", the POS tagger produces the (erroneous) "The_DT…
a3nm
  • 8,717
  • 6
  • 31
  • 39
8
votes
1 answer

Increase performance of Stanford-tagger based program

I just implemented a program that uses the Stanford POS tagger in Java. I used an input file of a few KB in size, consisting of a few hundred words. I even set the heap size to 600 MB. But it is still slow and sometimes runs out of heap memory. How…
Ameer
  • 600
  • 1
  • 12
  • 27
8
votes
3 answers

nltk pos_tag usage

I am trying to use speech tagging in NLTK and have used this command: >>> text = nltk.word_tokenize("And now for something completely different") >>> nltk.pos_tag(text) Traceback (most recent call last): File "", line 1, in…
Ashish Singh
  • 739
  • 3
  • 8
  • 21
7
votes
1 answer

How to obtain better results using NLTK pos tag

I am just learning NLTK using Python. I tried running pos_tag on various sentences, but the results obtained are not accurate. How can I improve the results? broke = NN flimsy = NN crap = NN Also, I am getting a lot of extra words being categorized…
SyncMaster
  • 9,754
  • 34
  • 94
  • 137
7
votes
2 answers

Korean, Thai and Indonesian POS tagger

Can someone recommend an open-source POS tagger for Korean, Indonesian, Thai and Vietnamese that I can use to tag the corpus data that I currently have (e.g. something like the stanford-postagger)? If you are a dev and care to share and let me test out the POS…
alvas
  • 115,346
  • 109
  • 446
  • 738
7
votes
2 answers

Trying to use HPSG PET Parser

Hi, I'm trying to use the PET parser, but the documentation given for usage is insufficient. Can anyone point me to a good article or tutorial on using PET? Does it support UTF-8?
Sharmila
  • 1,637
  • 2
  • 23
  • 30
7
votes
1 answer

Definition of POS tag and Dependency label sets are used within Parsey McParseface?

The POS tags and dependency labels output by Parsey McParseface are given in the tag-set and label-set files here, respectively. The SyntaxNet readme outlines that the model was trained on the Penn Treebank, OntoNotes and the English Web Treebanks.…