I am trying to import the contents of multiple Word documents into a single object in R. I am following Julia Silge and David Robinson's guide (see here: https://www.tidytextmining.com/usenet.html).
I cannot figure out how to get the "text" column encoded correctly during import.
Here is the code I am using:
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# Define a function to read all files from a folder into a data frame
read_folder <- function(infolder) {
  tibble(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(text)
}

# Use unnest() and map() to apply read_folder to each subfolder
# (training_folder holds the path to the parent directory)
raw_text <- tibble(folder = dir(training_folder, full.names = TRUE)) %>%
  mutate(folder_out = map(folder, read_folder)) %>%
  unnest(cols = c(folder_out)) %>%
  transmute(newsgroup = basename(folder), id, text)
Here is an example of the resulting text column:
<f7><e5><95><e3><a9>O<af><a5><fa> PK
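The "PK" in that sample may be a clue: .docx files are ZIP containers, and ZIP files start with the bytes "PK" followed by 0x03 0x04. Below is a minimal sketch for checking whether a file starts with that signature. The helper name is_zip_container is hypothetical (it is not from the guide), and the two temporary files only stand in for my real documents.

```r
# Hypothetical helper (not from the guide): returns TRUE if a file starts
# with the ZIP signature "PK" 0x03 0x04, which is how .docx files begin.
is_zip_container <- function(path) {
  magic <- readBin(path, what = "raw", n = 4)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Self-contained demonstration on two temporary files
zip_like <- tempfile(fileext = ".docx")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00)), zip_like)

plain <- tempfile(fileext = ".txt")
writeLines("just some plain text", plain)

is_zip_container(zip_like)
is_zip_container(plain)
```

If this returns TRUE for my Word files, then read_lines() is presumably reading compressed bytes rather than text, which would suggest the problem is not the encoding as such.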
Will I have to change the encoding after importing the data?