Questions tagged [part-of-speech]

Linguistic category of words

In grammar, a part of speech (also a word class, a lexical class, or a lexical category) is a linguistic category of words (or more precisely lexical items), which is generally defined by the syntactic or morphological behaviour of the lexical item in question.

From http://en.wikipedia.org/wiki/Parts_of_speech

194 questions
1 vote, 1 answer

Distance of Noun from Verb

Is there a way to get the distance of a Noun from the Verb from multiple sentences in a csv file using NLTK and Python? Example of sentences in a .csv file: video shows adam stabbing the bystander. woman quickly ran from the police after the…
Beginner
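One way to read "distance" here is the number of tokens between each noun and its nearest verb. The sketch below assumes the sentences have already been tagged with Penn Treebank tags (the format `nltk.pos_tag` returns); the function name `noun_verb_distances` and the hand-tagged example are illustrative assumptions, not part of the question.

```python
def noun_verb_distances(tagged):
    """For each noun, return (noun, nearest_verb, distance_in_tokens).

    `tagged` is a list of (word, tag) pairs with Penn Treebank tags,
    where noun tags start with "NN" and verb tags start with "VB".
    """
    verb_positions = [i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB")]
    results = []
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("NN") and verb_positions:
            # Ties go to the verb that appears first in the sentence.
            nearest = min(verb_positions, key=lambda v: abs(v - i))
            results.append((word, tagged[nearest][0], abs(nearest - i)))
    return results


# Hand-tagged example (tags assigned manually for illustration).
tagged_sentence = [
    ("video", "NN"), ("shows", "VBZ"), ("adam", "NN"),
    ("stabbing", "VBG"), ("the", "DT"), ("bystander", "NN"),
]
distances = noun_verb_distances(tagged_sentence)
```

To process a CSV file, each row's sentence would be tokenized and tagged first, then passed to the function one sentence at a time.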
1 vote, 0 answers

POS Tagger for declension of German words in Java

The RFTagger is a Part-Of-Speech Tagger with very detailed tags for German words. According to their website, output looks like this: word part of…
MK2112
1 vote, 1 answer

Why Does NLTK's WordNet Lemmatizer Not Lemmatize Adverbs and Adjectives?

As I understand it, lemmatization works better if we identify the PoS tag of each token and pass it as an argument, so that not only verbs and nouns but also adjectives and adverbs are lemmatized. So I've had these lines of…
Todd
1 vote, 1 answer

How to generate sequences from a grammar in (pure) R?

I would like to generate a list of part of speech sequences from a (context-free) grammar written in Backus-Naur form (https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form). I tested the package gramEvol and the functions 'CreateGrammar' and…
1 vote, 1 answer

How to change the output of lemmatized words that are presented as list comprehension?

I am struggling with my lemmatization approach because the output provides the following: Row Lemmatized 1 [i, , b, e, , o, k, a, y, , ... While I wanted the output to look like the following so that I can extract into a CSV file and…
Louise
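Output like `[i, , b, e, , o, k, a, y, ...` is what Python produces when a string is iterated character by character instead of token by token. Whether that is the cause in this question cannot be confirmed from the excerpt; the sketch below only illustrates the pitfall, using a hypothetical `toy_lemma` function as a stand-in for a real lemmatizer.

```python
def toy_lemma(token):
    # Hypothetical stand-in for a real lemmatizer; maps a few forms to "be".
    return {"am": "be", "is": "be", "are": "be"}.get(token, token)


text = "i am okay"

# Iterating over the string itself visits single characters, spaces included.
per_char = [toy_lemma(ch) for ch in text]

# Splitting first gives whole tokens, which can be joined back into one string.
per_token = [toy_lemma(tok) for tok in text.split()]
joined = " ".join(per_token)
```

The `joined` string is a single value per row, which is the shape that writes cleanly to a CSV column.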
1 vote, 0 answers

Finding X number of most frequent Nouns in part-of-speech (PoS) column in dataframe

I have a PoS column that has labelled words as nouns, adjectives or verbs. My current code extracts all the noun words and stores them in a new column of the dataframe: import pandas as pd from nltk.tag import pos_tag data = {'comments':['Daniel is…
RDTJr
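Counting the most frequent nouns can be sketched with `collections.Counter`, assuming each row already holds (word, tag) pairs with Penn Treebank tags. The function name `most_common_nouns` and the sample rows are invented for illustration and are not the asker's data.

```python
from collections import Counter


def most_common_nouns(tagged_rows, n):
    """Return the n most frequent nouns across all rows as (noun, count) pairs."""
    counts = Counter(
        word.lower()
        for row in tagged_rows
        for word, tag in row
        if tag.startswith("NN")  # NN, NNS, NNP, NNPS
    )
    return counts.most_common(n)


# Invented, hand-tagged rows standing in for the dataframe's PoS column.
rows = [
    [("Daniel", "NNP"), ("likes", "VBZ"), ("pizza", "NN")],
    [("pizza", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("Daniel", "NNP"), ("ate", "VBD"), ("pizza", "NN"), ("and", "CC"), ("pasta", "NN")],
]
top_two = most_common_nouns(rows, 2)
```

With a pandas dataframe, the column of tagged rows could be passed in as a list, for example via `df["pos"].tolist()` if the column is named `pos` (an assumed name).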
1 vote, 2 answers

Join multiple values into the same cell in R

I have a data frame with pos values for each document split down into single tokens. How can I merge the individual pos values into one single cell separated by a comma? So now I have something like doc_id sentence_id token_id token pos…
Mary
1 vote, 1 answer

POS-tagging a sentence using NLTK

I would like to POS-tag a sentence using the NLTK library in Python. I am using the following couple of lines of code and it works fine: >>> text = word_tokenize("And now for something completely different") >>> nltk.pos_tag(text) [('And', 'CC'),…
1 vote, 1 answer

Log probability in the Viterbi algorithm (handling zero probabilities)

I am coding a probabilistic part of speech tagger in Python using the Viterbi algorithm. In this context, the Viterbi probability at time t is the product of the Viterbi path probability from the previous time step t-1, the transition probability…
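A common way to handle this is to replace the product of probabilities with a sum of log probabilities and to map a zero probability to negative infinity, so that impossible paths never win the maximization. The sketch below is a minimal illustration under that assumption; the function names, the dictionary layout, and the toy model are invented here and are not taken from the question.

```python
import math


def safe_log(p):
    # log(0) is undefined, so a zero probability becomes -inf.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for a non-empty observation list."""
    # scores[t][s]: best log probability of any path ending in state s at time t.
    scores = [{
        s: safe_log(start_p.get(s, 0)) + safe_log(emit_p[s].get(observations[0], 0))
        for s in states
    }]
    back = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            # Sums of logs replace the products of probabilities.
            best_prev = max(
                states,
                key=lambda r: scores[t - 1][r] + safe_log(trans_p[r].get(s, 0)),
            )
            scores[t][s] = (
                scores[t - 1][best_prev]
                + safe_log(trans_p[best_prev].get(s, 0))
                + safe_log(emit_p[s].get(observations[t], 0))
            )
            back[t][s] = best_prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Toy model with several zero probabilities (missing keys count as zero).
states = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.2, "VB": 0.0}
trans_p = {
    "DT": {"NN": 1.0},
    "NN": {"VB": 0.7, "NN": 0.3},
    "VB": {"DT": 0.6, "NN": 0.4},
}
emit_p = {
    "DT": {"the": 1.0},
    "NN": {"dog": 0.6, "barks": 0.4},
    "VB": {"barks": 1.0},
}
best_path, best_log_prob = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
```

Adding `-inf` to any finite value or to another `-inf` stays `-inf`, so no special-casing is needed inside the loop. Smoothing the probabilities so that none are exactly zero is an alternative, with different trade-offs.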
1 vote, 1 answer

How to get the infinitive form of the verb using "stanza"?

How can I find the infinitive verbs in a sentence using stanza? Example: doc = "I need you to find the verbes in this sentence" en_nlp = stanza.Pipeline('en', processors='tokenize,lemma,mwt,pos,depparse', verbose=False, use_gpu=False) processed =…
Belkacem Thiziri
1 vote, 0 answers

Extract (verb-noun) pairs from a list

I want to extract the (verb-noun) pairs from each row, then add another column and put all the pairs there. I'm using the Enron dataset. I have finished the first part, preprocessing (removing numbers, punctuation...), and now I want to detect the (verb-noun) pairs. Any help…
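One simple heuristic, sketched below, pairs each verb with the next noun that follows it. It assumes the tokens are already tagged with Penn Treebank tags; the function name `verb_noun_pairs` and the example sentence are invented for illustration, and a dependency parser would give more reliable verb-object pairs.

```python
def verb_noun_pairs(tagged):
    """Pair each verb with the first noun that follows it."""
    pairs = []
    current_verb = None
    for word, tag in tagged:
        if tag.startswith("VB"):
            # A later verb replaces an earlier one that found no noun.
            current_verb = word
        elif tag.startswith("NN") and current_verb is not None:
            pairs.append((current_verb, word))
            current_verb = None
    return pairs


# Invented, hand-tagged example row.
tagged_row = [
    ("please", "UH"), ("send", "VB"), ("the", "DT"), ("report", "NN"),
    ("and", "CC"), ("schedule", "VB"), ("a", "DT"), ("meeting", "NN"),
]
pairs = verb_noun_pairs(tagged_row)
```

The resulting list can be stored per row in a new column.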
1 vote, 0 answers

Hidden Markov Model (HMM) vs Maximum Entropy Markov Model (MEMM)

What are the advantages and weaknesses of both systems in part-of-speech tagging? Since the HMM uses an exact formula to calculate the likelihood and the probability that a certain tag appears, can it be considered more efficient?
1 vote, 1 answer

How to extract subject, verb, and object for every sentence using NLP in Java?

I want to find a subject, verb, and object for each sentence, which will then be passed to the natural language generation library SimpleNLG to form a sentence. I tried multiple libraries like CoreNLP, OpenNLP, and the Stanford parsers. But I can not find them…
1 vote, 0 answers

Can I get subject, verb, and object in string format using NLP in Java?

This is sample code: LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz"); String[] sent = { "This", "is", "an", "easy", "sentence", "." }; Tree parse = (Tree)…
Arjun
1 vote, 1 answer

How to apply nltk.pos_tag on a PySpark DataFrame

I'm trying to apply POS tagging to one of my tokenized columns, called "removed", in a PySpark DataFrame. I'm trying nltk.pos_tag(df_removed.select("removed")), but all I get is a ValueError: ValueError: Cannot apply 'in' operator against a column:…
milva