When analysing text data, it can be useful to identify the names of people.
Objects prepackaged in tidytext include:
- English negators, modals, and adverbs (nma_words)
- Parts of speech (parts_of_speech)
- Sentiments (sentiments), and
- Stop words (see: ?stop_words)
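For anyone wanting to check what ships with the package, the bundled datasets can be listed with base R's data(). This is a minimal sketch, assuming tidytext is installed; the exact set of objects may differ between package versions, and the variable names pkg_data and bundled are only illustrative.

```r
# List the datasets bundled with tidytext (assumes the package is installed)
pkg_data <- data(package = "tidytext")

# `results` is a character matrix; its "Item" column holds the object names
bundled <- pkg_data$results[, "Item"]
bundled
```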
Is there a similar object in R (or in an accessible format elsewhere) containing a canonical list of names?
For reference, here are the existing data.frames that are supplied with tidytext:
nma_words
# # A tibble: 44 x 2
#    word      modifier
#    <chr>     <chr>
#  1 cannot    negator
#  2 could not negator
#  3 did not   negator
#  4 does not  negator
#  5 had no    negator
#  6 have no   negator
#  7 may not   negator
#  8 never     negator
#  9 no        negator
# 10 not       negator
# # … with 34 more rows
parts_of_speech
# # A tibble: 208,259 x 2
#    word    pos
#    <chr>   <chr>
#  1 3-d     Adjective
#  2 3-d     Noun
#  3 4-f     Noun
#  4 4-h'er  Noun
#  5 4-h     Adjective
#  6 a'      Adjective
#  7 a-1     Noun
#  8 a-axis  Noun
#  9 a-bomb  Noun
# 10 a-frame Noun
# # … with 208,249 more rows
sentiments
# # A tibble: 6,786 x 2
#    word        sentiment
#    <chr>       <chr>
#  1 2-faces     negative
#  2 abnormal    negative
#  3 abolish     negative
#  4 abominable  negative
#  5 abominably  negative
#  6 abominate   negative
#  7 abomination negative
#  8 abort       negative
#  9 aborted     negative
# 10 aborts      negative
# # … with 6,776 more rows
stop_words
# # A tibble: 1,149 x 2
#    word        lexicon
#    <chr>       <chr>
#  1 a           SMART
#  2 a's         SMART
#  3 able        SMART
#  4 about       SMART
#  5 above       SMART
#  6 according   SMART
#  7 accordingly SMART
#  8 across      SMART
#  9 actually    SMART
# 10 after       SMART
# # … with 1,139 more rows
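
To make the intended use concrete, here is a rough sketch of how such a list could be used if it were available in the same one-word-per-row shape as the objects above. The person_names tibble is a hypothetical stand-in with three made-up entries, not an object that ships with tidytext or any other package; docs and its text are likewise invented for illustration. The tokenising and filtering rely on unnest_tokens() from tidytext and semi_join() from dplyr.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the names list being asked about;
# this is NOT an object supplied with tidytext.
person_names <- tibble(word = c("alice", "bob", "carol"))

# Made-up example documents
docs <- tibble(
  id   = 1:2,
  text = c("Alice met Bob at the station.", "The train was late again.")
)

# One token per row; unnest_tokens() lower-cases tokens by default
tokens <- docs %>%
  unnest_tokens(word, text)

# Keep only the tokens that appear in the names list
found_names <- tokens %>%
  semi_join(person_names, by = "word")

found_names
```

Swapping semi_join() for anti_join() would instead drop the names, in the same way stop_words is typically used to drop stop words.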