I am trying to tokenize a sentence as follows.
Section <- "If an infusion reaction occurs, interrupt the infusion."
df <- data.frame(Section, stringsAsFactors = FALSE)
When I tokenize it with the code below (which uses stringr, purrr and tidyr rather than tidytext),
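library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

# Split on runs of non-whitespace and record each token's start/end position
AA <- df %>%
  mutate(tokens = str_extract_all(Section, "[^\\s]+"),
         locations = str_locate_all(Section, "[^\\s]+"),
         locations = map(locations, as.data.frame)) %>%
  select(-Section) %>%
  unnest(c(tokens, locations))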
it gives me one row per whitespace-separated token, with start and end columns giving each token's position.
How do I get the comma and the period as separate tokens, rather than as part of 'occurs,' and 'infusion.' respectively, using tidytext? My tokens should be:
If
an
infusion
reaction
occurs
,
interrupt
the
infusion
.
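A minimal sketch of two ways to get that output, assuming current versions of tidytext, tokenizers, stringr and tidyr. The first relies on unnest_tokens() forwarding extra arguments to tokenizers::tokenize_words(), whose strip_punct = FALSE keeps punctuation as separate tokens; to_lower = FALSE keeps "If" capitalised. The second stays with the stringr approach from the question and only changes the regex (the pattern is my own choice, not taken from the question), so the start/end columns are preserved, which unnest_tokens() does not provide.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)
library(tidytext)

Section <- "If an infusion reaction occurs, interrupt the infusion."
df <- data.frame(Section, stringsAsFactors = FALSE)

# Option 1 (tidytext): strip_punct = FALSE is passed through to
# tokenizers::tokenize_words(), so "," and "." become their own tokens.
tidy_tokens <- df %>%
  unnest_tokens(tokens, Section, token = "words",
                strip_punct = FALSE, to_lower = FALSE)

# Option 2 (stringr): match either a run of word characters or a single
# character that is neither a word character nor whitespace, so "occurs,"
# yields "occurs" and "," as two separate matches.
pattern <- "\\w+|[^\\w\\s]"
located_tokens <- df %>%
  mutate(tokens = str_extract_all(Section, pattern),
         locations = map(str_locate_all(Section, pattern), as.data.frame)) %>%
  select(-Section) %>%
  unnest(c(tokens, locations))

print(tidy_tokens$tokens)
print(located_tokens)
```

Note that the regex in option 2 treats every punctuation character as a one-character token, so something like "don't" would be split into "don", "'" and "t"; the word-boundary rules behind option 1 keep such contractions together.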