Questions tagged [pos-tagger]

A part-of-speech tagger, or POS tagger, is a program that assigns each word in a text a tag from a set of descriptive tags, identifying it as a noun, verb, adjective, adverb, and so on. It is often based on machine learning (ML) techniques.

In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS taggers, employs rule-based algorithms.
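
As a rough illustration of the two groups, the sketch below pairs a frequency-based (stochastic) unigram baseline with a single Brill-style contextual rule that corrects the baseline's output. It is a minimal pure-Python sketch, not the implementation of any particular library: the training data, the function names (`train_unigram`, `unigram_tag`, `apply_rule`), and the rule itself are all hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

# Tiny hand-made training set (hypothetical data, for illustration only).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("run", "NN"), ("helps", "VBZ")],
    [("they", "PRP"), ("run", "VBP")],
    [("we", "PRP"), ("run", "VBP")],
]


def train_unigram(tagged_sents):
    """Stochastic baseline: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def unigram_tag(model, words, default="NN"):
    """Tag each word independently; unknown words get the default tag."""
    return [(w, model.get(w, default)) for w in words]


def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Brill-style contextual rule: rewrite from_tag as to_tag
    whenever the preceding word is tagged prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out


model = train_unigram(TRAIN)
baseline = unigram_tag(model, ["a", "run"])
print(baseline)   # [('a', 'DT'), ('run', 'VBP')]
corrected = apply_rule(baseline, from_tag="VBP", to_tag="NN", prev_tag="DT")
print(corrected)  # [('a', 'DT'), ('run', 'NN')]
```

The baseline ignores context and tags "run" as a verb because that is its most frequent tag in the toy data. The contextual rule then fixes the case where a determiner precedes it. A real rule-based tagger would learn many such rules from a corpus rather than take one by hand.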

586 questions
3
votes
0 answers

Restricting Stanford CoreNLP's Set of Phrase-Level Tags

Piggybacking on the question I posted here, I would like to ask if it is possible to rule out certain phrase-level tags when parsing. Specifically, I am using the Stanford CoreNLP version 3.9.2 Shift-Reduce parser (for its constituency-style…
3
votes
4 answers

python text processing: identify nouns from individual words

I have a list of words and would like to keep only nouns. This is not a duplicate of Extracting all Nouns from a text file using nltk. In the linked question, a piece of text is processed. The accepted answer proposes a tagger. I'm aware of the…
lhk
  • 27,458
  • 30
  • 122
  • 201
3
votes
2 answers

string index out of range in POS tagging

I am doing POS tagging using the nltk package in Python. It now raises a "string index out of range" error even though my string is not very long. import nltk sample_list = ['', 'emma', 'jane', 'austen', '1816', '', 'volume', 'chapter', 'emma', 'woodhouse',…
Ravi kant Gautam
  • 333
  • 2
  • 23
3
votes
1 answer

Baum-Welch algorithm for pos tagger

Hi everyone. I'm using the Baum-Welch algorithm to train a POS tagger in a fully unsupervised way. Here is the problem: when I get the label result, I only get a sequence of numbers. I can't figure out which label stands for VV, NN, DT. How…
David
  • 91
  • 1
  • 7
3
votes
0 answers

Correct way to use pos_tagger option in gensim + keywords extraction

While using "keywords()" in the summarization/keywords.py file, I am getting the same set of tags, no matter what value I choose for pos_tagger=['NN'], ['JJ'] or ['NN','JJ']. from gensim.summarization import keywords import…
Nandani
  • 111
  • 1
  • 6
3
votes
1 answer

Is it possible to modify and run only part of a Python program without having to run all of it again and again?

I have written Python code to train the Brill tagger from the NLTK library on some 8000 English sentences and tag some 2000 sentences. The Brill tagger takes many, many hours to train, and when it finally finished training, the last statement of the…
singhuist
  • 302
  • 1
  • 6
  • 17
3
votes
0 answers

How to define and understand rule and template in brill part of speech tagger?

I am trying to get my hands dirty with NLTK part-of-speech tagging. I am using the Brill tagger, which creates a series of rules. My templates are as follows: templates = [ Template(Pos(1,1)), Template(Pos(2,2)), Template(Pos(1,2)), …
Mangu Singh Rajpurohit
  • 10,806
  • 4
  • 68
  • 97
3
votes
2 answers

Evaluating POS tagger in NLTK

I want to evaluate different POS taggers in NLTK using a text file as input. As an example, I will take the unigram tagger. I have found how to evaluate the unigram tagger using the Brown corpus. from nltk.corpus import brown import nltk brown_tagged_sents =…
Yash
  • 245
  • 1
  • 7
  • 19
3
votes
0 answers

Hunspell Part-Of-Speech tagger?

Is there a way to use Hunspell as a part-of-speech tagger? It's for use with C++. If Hunspell can't do it, we'll use LanguageTool, but that involves a JVM.
VNourdin
  • 99
  • 10
3
votes
0 answers

OpenNLP Parser tree result

I use OpenNLP to parse some medical reports, but one of the parse tree results drew my attention. The original line is as follows: "They are replaced by tumour tissue, which show glandular differentiation." The parse tree looks like this: (TOP (S…
3
votes
1 answer

NLTK Perceptron Tagger - What does it recognize as FW (foreign word)?

Relatively new to NLP and working on tagging sentences that contain foreign words using NLTK's PerceptronTagger (in Python) - but it continues to tag the tokenized foreign word by position in the syntax rather than as a 'FW'. Does the whole…
Ksofiac
  • 382
  • 1
  • 6
  • 21
3
votes
2 answers

Does anyone know of a good quick and dirty text / grammar parser?

I have a "mad lib" scenario in which I want to a) determine the parts of speech of every word (or most words) in a sentence and b) have the user select alternatives to those words, or replace them computationally with equivalent words. I looked at the…
Dave Edelhart
  • 1,051
  • 1
  • 9
  • 13
3
votes
2 answers

Where in the CoreNLP code are the Penn Treebank part-of-speech symbols themselves actually represented?

I'm looking specifically for some data structure, enum, or generative process through which the different parts of speech are represented internally. I've spent a long time scanning the Javadoc and the source code and can't find what I'm…
David Kriz
  • 55
  • 6
3
votes
0 answers

Title (Mr., Mrs., etc.) Inconsistencies with Stanford NER Tagger

I have been working with Stanford's Named Entity Recognition (NER) tagger (http://nlp.stanford.edu/software/CRF-NER.shtml) in Java and Python, and I've stumbled on an inconsistency that I cannot solve. Here is the sentence I'm using as an…
user1895076
  • 709
  • 8
  • 19
3
votes
2 answers

How to keep only the noun words in a wordlist? python NLTK

I have a wordlist which consists of many subjects. The subjects were automatically extracted from sentences. I would like to keep only the nouns from the subjects. As you can see, some of the subjects have adjectives, which I want to delete…
bob90937
  • 553
  • 1
  • 5
  • 18