I'm using `tidytext` (and the `tidyverse`) to analyze some text data (as in *Tidy Text Mining with R*).
My input text file, `myfile.txt`, looks like this:

```
# Section 1 Name
Lorem ipsum dolor
sit amet ... (et cetera)
# Section 2 Name
<multiple lines here again>
```

with 60 or so sections.
I would like to generate a column `section_name` whose values are the header strings, such as "Section 1 Name" or "Section 2 Name", for the lines that fall under each header. So far I have:
```r
library(tidyverse)
library(tidytext)
library(stringr)

fname <- "myfile.txt"
all_text <- readLines(fname)
all_lines <- tibble(text = all_text)

tidiedtext <- all_lines %>%
  mutate(linenumber = row_number(),
         # every "#" header line bumps the running section counter
         section_id = cumsum(str_detect(text, "^#"))) %>%
  # drop the header lines themselves
  filter(!str_detect(text, "^#"))
```
This adds a column, `section_id`, to `tidiedtext` holding the section number of each line.
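To see why `section_id` works, here is a toy run of the same `cumsum(str_detect(...))` idiom on a hypothetical five-line input (the header text mirrors the sample file; the body lines are made up). `str_detect()` returns `TRUE` on header lines, and `cumsum()` counts how many headers have been seen so far:

```r
library(stringr)

# Hypothetical toy input standing in for readLines("myfile.txt")
text <- c("# Section 1 Name", "Lorem ipsum dolor", "sit amet",
          "# Section 2 Name", "consectetur adipiscing")

# TRUE on header lines, FALSE elsewhere
is_header <- str_detect(text, "^#")

# Running count of headers: each line gets the number of the most recent header
section_id <- cumsum(is_header)
section_id
#> [1] 1 1 1 2 2
```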
Is it possible to add a single line to the `mutate()` call that creates the corresponding `section_name` column? Or is there another approach I ought to be using?
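One possible approach, sketched under assumptions: it uses a hypothetical toy input in place of `myfile.txt`, assumes header lines are exactly those starting with `#`, and swaps in `tidyr::fill()` (attached by the tidyverse) to carry each header's name down onto the lines beneath it. It is one option, not necessarily the best one.

```r
library(tidyverse)

# Hypothetical toy input; in the question this comes from readLines("myfile.txt")
all_lines <- tibble(text = c("# Section 1 Name", "Lorem ipsum dolor", "sit amet",
                             "# Section 2 Name", "consectetur adipiscing"))

tidiedtext <- all_lines %>%
  mutate(
    linenumber = row_number(),
    is_header = str_detect(text, "^#"),
    section_id = cumsum(is_header),
    # Header lines keep their text minus the leading "#" and spaces; other lines get NA
    section_name = if_else(is_header, str_remove(text, "^#\\s*"), NA_character_)
  ) %>%
  # Copy each header's name down onto the lines that follow it
  fill(section_name, .direction = "down") %>%
  # Drop the header lines and the helper column
  filter(!is_header) %>%
  select(-is_header)

tidiedtext
```

If every file is guaranteed to start with a header, a single extra line in the original `mutate()`, `section_name = str_remove(text[str_detect(text, "^#")], "^#\\s*")[section_id]`, should give the same column by indexing the header names with `section_id`. Lines before the first header would have `section_id == 0` and make that version fail, whereas the `fill()` version simply leaves them `NA`.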