I'm working with text that contains character combinations like "3/8" and "5/8", which refer to the sizes of things, and I'm building bigrams to help analyze it. The tokenizer strips the "/" character, and I haven't found a way to keep it. Here is an example:
library(tidyverse)
library(tidytext)
tibble(text = "My example is 3/8 pipe and 5/8 wrench") %>%
  unnest_tokens(bigrams, text, token = "ngrams", n = 2)
Here is the output:
# A tibble: 9 x 1
  bigrams
  <chr>
1 my example
2 example is
3 is 3
4 3 8
5 8 pipe
6 pipe and
7 and 5
8 5 8
9 8 wrench
Thank you for your input.
Edit: I've found one way around this, but it is crude, and I would love to hear more elegant solutions.
library(tidyverse)
library(tidytext)
library(stringr)
tibble(text = "My example is 3/8 pipe and 5/8 wrench") %>%
  # Swap "/" for a placeholder word so the tokenizer does not drop it
  mutate(text = str_replace_all(text, "\\/", "forwardslash")) %>%
  unnest_tokens(bigrams, text, token = "ngrams", n = 2) %>%
  # Restore the "/" once the bigrams have been built
  mutate(bigrams = str_replace_all(bigrams, "forwardslash", "/"))
Output:
# A tibble: 7 x 1
  bigrams
  <chr>
1 my example
2 example is
3 is 3/8
4 3/8 pipe
5 pipe and
6 and 5/8
7 5/8 wrench
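Another possibility, as a rough sketch rather than a definitive answer: split on whitespace only with `token = "regex"`, so that "3/8" stays a single token, and then pair each word with the one after it using `lead()`. The column names `word` and `next_word` are just illustrative, and this assumes that splitting on whitespace alone is acceptable for the text in question.

```r
library(dplyr)
library(tidytext)

# Split on runs of whitespace only, so "3/8" is kept as one token.
words <- tibble(text = "My example is 3/8 pipe and 5/8 wrench") %>%
  unnest_tokens(word, text, token = "regex", pattern = "\\s+")

# Pair each word with its successor; the last word has none and is dropped.
bigrams <- words %>%
  mutate(next_word = lead(word)) %>%
  filter(!is.na(next_word)) %>%
  transmute(bigrams = paste(word, next_word))

bigrams
```

Two caveats with this sketch: any other punctuation attached to a word (a trailing period, for example) is kept as well, and with more than one document the `lead()` step would need a `group_by()` on a document id so that bigrams do not span documents.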