- Hi all, I'm new to NLP in R. I would like to extract (verb, noun) pairs from a PDF, but I'm stuck at the word-frequency stage. Example text: "Represent clients in criminal and civil litigation and other legal proceedings, draw up legal documents, or manage or advise clients on legal transactions. May specialize in a single area or may practice broadly in many areas of law."
- I would like to extract the verb-noun pairs from this text. How can I do that?
waka
1 Answer
> library(udpipe)
> docs <- "Represent clients in criminal and civil litigation and other legal proceedings, draw up legal documents, or manage or advise clients on legal transactions. May specialize in a single area or may practice broadly in many areas of law."
> docs <- setNames(docs, "doc1")
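> # Tokenise, tag and dependency-parse the text with the English model
> anno <- udpipe(docs, object = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
> # Add the parent (head) token and its part-of-speech tag to every row
> anno <- cbind_dependencies(anno, type = "parent")
> # Keep nouns and verbs whose parent is also a noun or a verb
> subset(anno, upos_parent %in% c("NOUN", "VERB") & upos %in% c("NOUN", "VERB"),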
+ select = c("doc_id", "paragraph_id", "sentence_id", "token", "token_parent", "dep_rel", "upos", "upos_parent"))
   doc_id paragraph_id sentence_id        token token_parent dep_rel upos upos_parent
2    doc1            1           1      clients    Represent     obj NOUN        VERB
7    doc1            1           1   litigation    Represent     obl NOUN        VERB
11   doc1            1           1  proceedings   litigation    conj NOUN        NOUN
13   doc1            1           1         draw    Represent    conj VERB        VERB
16   doc1            1           1    documents         draw     obj NOUN        VERB
19   doc1            1           1       manage    documents    conj NOUN        NOUN
21   doc1            1           1       advise      clients    conj NOUN        NOUN
22   doc1            1           1      clients    Represent     obj NOUN        VERB
25   doc1            1           1 transactions      clients    nmod NOUN        NOUN
32   doc1            1           2         area   specialize     obl NOUN        VERB
35   doc1            1           2     practice   specialize    conj VERB        VERB
39   doc1            1           2        areas     practice     obl NOUN        VERB
41   doc1            1           2          law        areas    nmod NOUN        NOUN
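The subset above still mixes noun-noun and verb-verb links with the verb-noun ones. If only verb-noun pairs are wanted, one possible follow-up is to keep the rows where the token is a NOUN and its parent is a VERB, then count the pairs. The sketch below is only an illustration: to stay runnable without downloading a model, it rebuilds the relevant columns by hand from the output shown above, and the names `anno_pairs`, `verb_noun` and `pair_freq` are made up for this example. On a real annotation the same `subset()` call could be applied to `anno` directly, and using lemmas instead of raw tokens would probably give cleaner counts.

```r
# Rows copied by hand from the output above (only the columns needed here).
# On real data, use the `anno` data frame returned by cbind_dependencies().
anno_pairs <- data.frame(
  token        = c("clients", "litigation", "proceedings", "draw", "documents",
                   "manage", "advise", "clients", "transactions", "area",
                   "practice", "areas", "law"),
  token_parent = c("Represent", "Represent", "litigation", "Represent", "draw",
                   "documents", "clients", "Represent", "clients", "specialize",
                   "specialize", "practice", "areas"),
  dep_rel      = c("obj", "obl", "conj", "conj", "obj", "conj", "conj", "obj",
                   "nmod", "obl", "conj", "obl", "nmod"),
  upos         = c("NOUN", "NOUN", "NOUN", "VERB", "NOUN", "NOUN", "NOUN",
                   "NOUN", "NOUN", "NOUN", "VERB", "NOUN", "NOUN"),
  upos_parent  = c("VERB", "VERB", "NOUN", "VERB", "VERB", "NOUN", "NOUN",
                   "VERB", "NOUN", "VERB", "VERB", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep only nouns whose parent (head) is a verb.
verb_noun <- subset(anno_pairs, upos == "NOUN" & upos_parent == "VERB")

# Build a "verb noun" label and count how often each pair occurs.
verb_noun$pair <- paste(verb_noun$token_parent, verb_noun$token)
pair_freq <- sort(table(verb_noun$pair), decreasing = TRUE)
print(pair_freq)

# Optionally restrict to direct objects only (dep_rel == "obj").
direct_objects <- subset(verb_noun, dep_rel == "obj")
print(direct_objects$pair)
```

Note that the result can only be as good as the tagging: in the output above "manage" and "advise" were tagged as NOUN, so pairs such as "advise clients" are not found by this filter.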