Questions tagged [pos-tagger]

A part-of-speech tagger, or POS tagger, is a concrete implementation of algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags, for example identifying words as nouns, verbs, adjectives, adverbs, and so on. It is often based on machine learning (ML) techniques.

In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children as the identification of words as nouns, verbs, adjectives, adverbs, etc.

Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS taggers, employs rule-based algorithms.
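
To make the rule-based versus stochastic distinction concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the tiny training corpus, the suffix rules, and the function names are invented, and the rule-based half is a toy, not Brill's algorithm. The stochastic half is the simplest possible statistical tagger, one that picks each word's most frequent tag in the training data.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for illustration only.
TRAINING = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("runs", "VERB"), ("quickly", "ADV")],
]


def rule_based_tag(word):
    """Assign a tag from hand-written rules (hypothetical, deliberately crude)."""
    lower = word.lower()
    if lower in {"the", "a", "an"}:
        return "DET"
    if lower.endswith("ly"):
        return "ADV"
    if lower.endswith("ing") or lower.endswith("ed"):
        return "VERB"
    return "NOUN"  # fallback guess when no rule fires


def train_unigram(tagged_sentences):
    """Learn the most frequent tag for each word seen in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def stochastic_tag(word, model, default="NOUN"):
    """Look up the learned tag; unseen words get the default tag."""
    return model.get(word.lower(), default)


MODEL = train_unigram(TRAINING)


def tag_sentence(words):
    """Return (word, rule_based_tag, unigram_tag) triples for comparison."""
    return [(w, rule_based_tag(w), stochastic_tag(w, MODEL)) for w in words]


if __name__ == "__main__":
    for row in tag_sentence(["The", "dog", "barks"]):
        print(row)
```

On "The dog barks" the two approaches disagree on "barks": no hand-written rule fires, so the rule-based function falls back to NOUN, while the unigram model has seen the word tagged as VERB. Real taggers of either kind also use context, which this sketch ignores.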

586 questions
5
votes
4 answers

spaCy Alternatives in Java

I currently use spaCy to traverse the dependency tree, and generate entities. nlp = get_spacy_model(detect_lang(unicode_text)) doc = nlp(unicode_text) entities = set() for sentence in doc.sents: # traverse tree picking up entities for token…
vin
5
votes
2 answers

How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

I'm using Stanford's CoreNLP Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. The problem is that my code tokenizes the text beforehand, and then I need to NER and POS tag each token. However, I was only able to…
Jack Twain
5
votes
1 answer

How to convert text file to CoNLL format for malt parser?

I'm trying to use Malt Parser with the pre-made English model. However, I do not know how to convert a text corpus of English sentences into the CoNLL format that Malt Parser needs to operate on. I could not find any documentation on the…
jeffrey
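
For the CoNLL question above, a minimal sketch of the target layout may help. It assumes the model expects the ten tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and that the sentences have already been tokenized and POS-tagged by some other tool. The function name `to_conllx` and the sample sentence are hypothetical, and reusing the same tag for both the coarse and fine POS columns is a simplification.

```python
def to_conllx(tagged_sentences):
    """Render pre-tagged sentences as CoNLL-X text.

    Each sentence is a list of (token, pos_tag) pairs. Columns that are
    unknown before parsing (lemma, features, head, relation) are "_".
    """
    blocks = []
    for sentence in tagged_sentences:
        lines = []
        for index, (token, tag) in enumerate(sentence, start=1):
            columns = [
                str(index),  # ID: 1-based token position in the sentence
                token,       # FORM
                "_",         # LEMMA (unknown here)
                tag,         # CPOSTAG (assumption: same as POSTAG)
                tag,         # POSTAG
                "_",         # FEATS
                "_",         # HEAD (left for the parser to predict)
                "_",         # DEPREL
                "_",         # PHEAD
                "_",         # PDEPREL
            ]
            lines.append("\t".join(columns))
        blocks.append("\n".join(lines))
    # Sentences are separated by one blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [[("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]]
    print(to_conllx(sample), end="")
```

Whether a given parser version accepts "_" in the HEAD and DEPREL columns at parse time is worth checking against its own documentation.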
5
votes
3 answers

POS tagging German texts using NLTK

I want to use NLTK to POS tag German texts. I found some references on the web, but most of them are outdated. Some reference, for example, a "EUROPARL" thesaurus, but it looks like only "EUROPARL_raw" is still available. And that one is not POS…
Achim
5
votes
0 answers

Error in Parts of Speech Tagging using openNLP

I have an Ubuntu Quantal 12.10 Server 64-bit instance. I am using openNLP for POS tagging of sentences, with a “Parallel Lapply setup”. It runs fine in the RStudio environment, but in the Ubuntu environment it is…
Siddharth
5
votes
1 answer

Stanford POS Tagger not tagging Chinese text

I'm using Stanford POS Tagger (for the first time) and while it tags English correctly, it does not seem to recognize (Simplified) Chinese even when changing the model parameter. Have I overlooked something? I've downloaded and unpacked the latest…
Ryan Rapp
5
votes
3 answers

TreeTagger installation successful but cannot open .par file

Does anyone know how to resolve this file reading error in TreeTagger, a common Natural Language Processing tool used to POS tag, lemmatize and chunk sentences? alvas@ikoma:~/treetagger$ echo 'Hello world!' | cmd/tree-tagger-english …
alvas
4
votes
1 answer

Identify Location Within the Sentence where the Missing Word Belongs

I have the code below: import nltk exampleArray = ['The dog barking'] def processLanguage(): for item in exampleArray: tokenized = nltk.word_tokenize(item) tagged = nltk.pos_tag(tokenized) …
alyssaeliyah
4
votes
1 answer

How to retrieve the main intent of a sentence using spacy or nltk?

I have a use case where I want to extract the main meaningful part of a sentence using spacy, nltk, or any other NLP library. Example sentence1: "How Can I raise my voice against harassment" Intent would be: "raise voice against harassment" Example…
4
votes
2 answers

Extracting only nouns from list of lists pos_tag sequence?

I am trying to extract only nouns, using nltk.pos_tag(), from a list-of-lists text sequence. I am able to extract all the nouns from the nltk.pos_tag() list, but without preserving the list-of-lists sequence. How can I achieve this while preserving the…
M S
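
For the noun-extraction question above, one way to keep the nesting is a nested list comprehension that filters inside each sentence rather than flattening first. The sketch below works on data already shaped like the output of nltk.pos_tag() (a list of (word, tag) tuples per sentence) and assumes Penn Treebank tags, where noun tags start with "NN". The sample data and the function name `nouns_per_sentence` are invented for illustration.

```python
def nouns_per_sentence(tagged_sentences):
    """Keep only nouns, returning one list per input sentence.

    A sentence with no nouns yields an empty list, so the outer list
    keeps the same length and order as the input.
    """
    return [
        [word for word, tag in sentence if tag.startswith("NN")]
        for sentence in tagged_sentences
    ]


if __name__ == "__main__":
    # Hand-written sample in the shape nltk.pos_tag() returns per sentence.
    tagged = [
        [("The", "DT"), ("dog", "NN"), ("chased", "VBD"), ("cats", "NNS")],
        [("It", "PRP"), ("rained", "VBD")],
        [("London", "NNP"), ("is", "VBZ"), ("big", "JJ")],
    ]
    print(nouns_per_sentence(tagged))
```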
4
votes
1 answer

Does anyone know how to configure the hunpos wrapper class on nltk?

I've tried the following code and installed english-wsj-1.0 and hunpos-1.0-linux.tgz from http://code.google.com/p/hunpos/downloads/list. I extracted the files into the '~/' directory, and when I tried the following Python code: import nltk from…
alvas
4
votes
3 answers

Error while loading a tagger model (probably missing model file)

I am trying to implement the below code: import java.util.Properties; import edu.stanford.nlp.coref.CorefCoreAnnotations; import edu.stanford.nlp.coref.CorefCoreAnnotations; import edu.stanford.nlp.coref.data.CorefChain; import…
user3251664
4
votes
1 answer

NLTK pos_tag module returns LookupError

The details are in the title above. I run it in a Jupyter notebook and get the error message.
xixixixi
4
votes
1 answer

Modern dependency parser for Russian

Is there any modern part-of-speech tagger + dependency parser for the Russian language? I need a tool or service that will be able to process plain text and output: division into sentences; division into tokens; part-of-speech tags (fine-grained MSD tags…
adam.ra
4
votes
2 answers

What is the fastest and most accurate POS tagger in Python (with a commercial license)?

Which POS tagger is fast and accurate and has a license that allows commercial use? For testing, I used the Stanford POS tagger, which works well, but it is slow and I have a license problem.
Regina