Questions tagged [pos-tagger]

A part-of-speech tagger, or POS tagger, is a concrete implementation of algorithms that label discrete terms, as well as hidden parts of speech, with tags from a descriptive tag set, for example identifying words as nouns, verbs, adjectives, adverbs, and so on. Taggers are often based on machine learning (ML) techniques.

In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, who learn to identify words as nouns, verbs, adjectives, adverbs, etc.

Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS taggers, employs rule-based algorithms.
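The two groups mentioned in the description can be illustrated with a minimal sketch. It assumes a tiny hand-made training set (`TRAIN`, hypothetical) and combines a stochastic unigram baseline (most frequent tag per word) with a single transformation rule in the style of a rule-based tagger. The function names and the rule are illustrative assumptions, not the actual implementation of Brill's tagger.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made training data for illustration only.
TRAIN = [
    [("the", "DT"), ("run", "NN")],
    [("a", "DT"), ("run", "NN")],
    [("to", "TO"), ("run", "VB")],
    [("i", "PRP"), ("want", "VBP")],
]

def train_unigram(sentences):
    """Stochastic part: remember the most frequent tag seen for each word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def unigram_tag(tokens, model, default="NN"):
    """Tag each token with its most frequent tag; unknown words get `default`."""
    return [(token, model.get(token, default)) for token in tokens]

def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Rule-based part: rewrite from_tag to to_tag when the previous tag is prev_tag."""
    out = []
    for i, (word, tag) in enumerate(tagged):
        if tag == from_tag and i > 0 and out[i - 1][1] == prev_tag:
            tag = to_tag
        out.append((word, tag))
    return out

model = train_unigram(TRAIN)
baseline = unigram_tag(["i", "want", "to", "run"], model)
corrected = apply_rule(baseline, from_tag="NN", to_tag="VB", prev_tag="TO")
print(baseline)
print(corrected)
```

The baseline tags "run" as a noun because that is its most frequent tag in the toy data; the contextual rule then corrects it to a verb after "to".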

586 questions
7
votes
2 answers

Can't get the Stanford POS tagger working in NLTK

I'm trying to work with Stanford POS tagger within NLTK. I'm using the example shown here: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford I'm able to load everything smoothly: >>> import os >>> from nltk.tag import…
Miguel
  • 2,738
  • 3
  • 35
  • 51
7
votes
3 answers

POS-Tagger is incredibly slow

I am using nltk to generate n-grams from sentences by first removing given stop words. However, nltk.pos_tag() is extremely slow, taking up to 0.6 sec on my CPU (Intel i7). The output: ['The first time I went, and was completely taken by the live…
Stefan Falk
  • 23,898
  • 50
  • 191
  • 378
7
votes
3 answers

How to use OpenNLP to get POS tags in R?

Here is the R Code: library(NLP) library(openNLP) tagPOS <- function(x, ...) { s <- as.String(x) word_token_annotator <- Maxent_Word_Token_Annotator() a2 <- Annotation(1L, "sentence", 1L, nchar(s)) a2 <- annotate(s, word_token_annotator, a2) a3 <-…
user4599
  • 95
  • 1
  • 1
  • 9
7
votes
4 answers

Tagging a single word with the nltk pos tagger tags each letter instead of the word

I'm trying to tag a single word with the nltk pos tagger: word = "going" pos = nltk.pos_tag(word) print pos But the output is this: [('g', 'NN'), ('o', 'VBD'), ('i', 'PRP'), ('n', 'VBP'), ('g', 'JJ')] It's tagging each letter rather than just the one…
jksnw
  • 648
  • 1
  • 7
  • 19
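A likely cause of the behaviour in this question is that `nltk.pos_tag` expects a list of tokens, and a Python string iterates character by character. The sketch below reproduces the effect with a hypothetical stand-in, `toy_pos_tag`, so that it runs without NLTK or its model data; the dummy tag "X" is an assumption for illustration only.

```python
def toy_pos_tag(tokens):
    """Hypothetical stand-in for a tagger: labels every item it iterates over."""
    return [(token, "X") for token in tokens]

word = "going"

# Passing the bare string: iteration yields single characters.
print(toy_pos_tag(word))
# [('g', 'X'), ('o', 'X'), ('i', 'X'), ('n', 'X'), ('g', 'X')]

# Wrapping the word in a list: one token in, one tagged pair out.
print(toy_pos_tag([word]))
# [('going', 'X')]
```

With NLTK itself, the equivalent fix would presumably be `nltk.pos_tag([word])`, or tokenizing first with `nltk.word_tokenize(word)`.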
7
votes
1 answer

Why does the Penn Treebank POS tagset have a separate tag for the word 'to'?

The Penn Treebank tagset has a separate tag TO for the word 'to', irrespective of whether it's used in the preposition sense (such as I went to school) or the infinitive sense (such as I want to eat). What purpose does this serve from an overall NLP…
Sagar Ahire
  • 187
  • 6
7
votes
2 answers

How do I use non-integer string labels with SVM from scikit-learn? Python

Scikit-learn has fairly user-friendly python modules for machine learning. I am trying to train an SVM tagger for Natural Language Processing (NLP) where my labels and input data are words and annotation. E.g. Part-Of-Speech tagging, rather than…
alvas
  • 115,346
  • 109
  • 446
  • 738
6
votes
1 answer

How can I convert CLAWS7 tags to Penn tags?

Does anyone know a way to convert a tag from the CLAWS7 tagset to its equivalent in the Penn tagset? CLAWS7 tagset: http://ucrel.lancs.ac.uk/claws7tags.html Penn tagset: http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html
Amin Y
  • 701
  • 1
  • 9
  • 15
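One simple approach to this kind of conversion is a lookup table. The sketch below is a deliberately partial mapping covering only a few tags whose correspondence is unambiguous; the dictionary `CLAWS7_TO_PENN` and the helper `claws7_to_penn` are hypothetical names. CLAWS7 is finer-grained than the Penn tagset, and some tags (for example VV0, the base form of a lexical verb) correspond to more than one Penn tag depending on context, so a full conversion needs more than a table.

```python
# Partial, illustrative mapping only; check every entry against both tagset
# descriptions before relying on it.
CLAWS7_TO_PENN = {
    "NN1": "NN",    # singular common noun
    "NN2": "NNS",   # plural common noun
    "NP1": "NNP",   # singular proper noun
    "NP2": "NNPS",  # plural proper noun
    "JJ": "JJ",     # general adjective
    "JJR": "JJR",   # comparative adjective
    "JJT": "JJS",   # superlative adjective
    "VVD": "VBD",   # past tense of lexical verb
    "VVG": "VBG",   # -ing participle of lexical verb
    "VVN": "VBN",   # past participle of lexical verb
    "VVZ": "VBZ",   # -s form of lexical verb
    "VVI": "VB",    # infinitive of lexical verb
    "CC": "CC",     # coordinating conjunction
    "MC": "CD",     # cardinal number
}

def claws7_to_penn(tag, default=None):
    """Return the Penn tag for a CLAWS7 tag, or `default` if it is not in the table."""
    return CLAWS7_TO_PENN.get(tag.upper(), default)

print(claws7_to_penn("NN2"))
print(claws7_to_penn("VV0"))  # ambiguous in context, so intentionally unmapped
```

Returning a default for unmapped tags makes the gaps visible rather than silently guessing a Penn tag.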
6
votes
1 answer

Correct POS tags for numbers substituted with ## in spacy

The gigaword dataset is a huge corpus used to train abstractive summarization models. It contains summaries like these: spain 's colonial posts #.## billion euro loss taiwan shares close down #.## percent I want to process these summaries with…
Pyfisch
  • 1,752
  • 1
  • 17
  • 29
6
votes
1 answer

How to use Keras to build a Part-of-Speech tagger?

I'm trying to implement a Part-of-Speech tagger using a neural network with the help of Keras. I'm using a Sequential model, and training data from NLTK's Penn Treebank Corpus (i.e. from nltk.corpus import treebank). According to my understanding, to…
6
votes
1 answer

How to do POS tagging using SVM in Python?

I want to do POS tagging using SVM with a non-English corpus in Python. It looks like Python does not support tagging using SVM yet (http://www.nltk.org/_modules). scikit-learn has an SVM module. So I installed scikit-learn and used it in Python but I…
Sam Black
  • 371
  • 5
  • 19
6
votes
1 answer

Getting additional information (Active/Passive, Tenses ...) from a Tagger

I'm using the Stanford Tagger for determining the Parts of Speech. However, I want to get more information out of the text. Is there a possibility to get further information like the tense of the sentence or if it is in active/passive? So far, I'm…
David Müller
  • 5,291
  • 2
  • 29
  • 33
6
votes
2 answers

NLTK POS tagger not working

If I try this: import nltk text = nltk.word_tokenize("And now for something completely different") nltk.pos_tag(text) Output: Traceback (most recent call last): File "C:/Python27/pos.py", line 3, in <module> nltk.pos_tag(text) File…
Vinit Gaikwad
  • 329
  • 9
  • 21
6
votes
1 answer

Error when using stanford tagger in python

This is my code and the error message: >>> from nltk.tag.stanford import StanfordTagger >>> st = StanfordTagger('bidirection-distsim-wsj-0-18.tagger') Traceback (most recent call last): File "", line 1, in File…
user1839641
  • 61
  • 1
  • 4
5
votes
2 answers

Finding the position of Noun and Verb in a sentence Python

Is there a way to find the position of the words with POS tags 'NN' and 'VB' in a sentence in Python? Example sentences in a CSV file: "Man walks into a bar." "Cop shoots his gun." "Kid drives into a ditch"
Beginner
  • 89
  • 7
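Once a sentence has been tagged, finding these positions is a matter of enumerating the (word, tag) pairs. The sketch below assumes the sentence is already tagged; the `tagged` list is hand-written in the shape that taggers such as `nltk.pos_tag` return, and its tags are illustrative rather than real tagger output. The helper name `tag_positions` is hypothetical.

```python
# Hand-written (word, tag) pairs; the tags are illustrative, not tagger output.
tagged = [
    ("Man", "NN"), ("walks", "VBZ"), ("into", "IN"),
    ("a", "DT"), ("bar", "NN"), (".", "."),
]

def tag_positions(tagged_sentence, prefixes=("NN", "VB")):
    """Return (index, word, tag) for tokens whose tag starts with one of the prefixes."""
    return [
        (index, word, tag)
        for index, (word, tag) in enumerate(tagged_sentence)
        if tag.startswith(prefixes)
    ]

print(tag_positions(tagged))
# [(0, 'Man', 'NN'), (1, 'walks', 'VBZ'), (4, 'bar', 'NN')]
```

Matching on the tag prefix rather than the exact tag also catches variants such as NNS or VBZ; pass `prefixes=("NN",)` to restrict the search to nouns.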
5
votes
2 answers

Trying to use MEGAM as an NLTK ClassifierBasedPOSTagger?

I am currently trying to build a general purpose (or as general as is practical) POS tagger with NLTK. I have dabbled with the brown and treebank corpora for training, but will probably be settling on the treebank corpus. Learning as I go, I am…
winwaed
  • 7,645
  • 6
  • 36
  • 81