When I read a docx file with read_docx, the line breaks (soft returns) within paragraphs are not read; they simply disappear. Is it possible to read the docx file and preserve the line breaks?
1 Answer
I was able to read the content of a Word file with the line breaks preserved, using RDCOMClient:
library(RDCOMClient)

# Start Word through COM (Windows only; Word must be installed).
wordApp <- COMCreate("Word.Application")
wordApp[["Visible"]] <- TRUE
wordApp[["DisplayAlerts"]] <- FALSE

path_To_Word_File <- "D:\\text.docx"
doc <- wordApp[["Documents"]]$Open(normalizePath(path_To_Word_File), ConfirmConversions = FALSE)
doc_Selection <- wordApp$Selection()

# Select each rectangle (block of text) on page 1 in turn and collect its text.
# Only the first 40 rectangles of the first page are visited.
list_Text <- list()
for (i in 1:40) {
  # Select() fails once i runs past the last rectangle. Stop before reading
  # the selection, so the previous text is not stored a second time.
  selected <- tryCatch({
    wordApp[["ActiveDocument"]]$ActiveWindow()$Panes(1)$Pages(1)$Rectangles(i)$Range()$Select()
    TRUE
  }, error = function(e) FALSE)
  if (!selected) break
  list_Text[[i]] <- tryCatch(doc_Selection$Range()$Text(), error = function(e) NA)
}

list_Text

[[1]]
[1] "hi\r"
[[2]]
[1] "\r"
[[3]]
[1] "this is a good text\r"
[[4]]
[1] "\r"
[[5]]
[1] "\r"
[[6]]
[1] "\r"
[[7]]
[1] "here is a word document\r"
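
The RDCOMClient approach needs Windows with Word installed. Where that is not available, one possible alternative is to read the document XML directly. This is a minimal sketch, not a tested drop-in replacement for read_docx: it assumes the xml2 package is installed and that soft returns are stored as w:br elements inside word/document.xml. The helper names paragraph_text and read_docx_keep_breaks are hypothetical. Note that page and column breaks are also w:br elements, so they become "\n" here as well.

```r
library(xml2)

# Hypothetical helper: collapse one <w:p> node into a single string,
# turning <w:br/> (soft return) into "\n" and <w:tab/> into "\t".
paragraph_text <- function(p, ns) {
  # A single step with a predicate keeps the nodes in document order.
  nodes <- xml_find_all(p, ".//*[self::w:t or self::w:br or self::w:tab]", ns)
  kind <- xml_name(nodes)
  pieces <- ifelse(kind == "br", "\n",
            ifelse(kind == "tab", "\t", xml_text(nodes)))
  paste(pieces, collapse = "")
}

# Hypothetical helper: return one string per paragraph of a .docx file.
read_docx_keep_breaks <- function(path) {
  tmp <- tempfile()
  dir.create(tmp)
  on.exit(unlink(tmp, recursive = TRUE))
  # A .docx file is a zip archive; the body text lives in word/document.xml.
  unzip(path, files = "word/document.xml", exdir = tmp)
  doc <- read_xml(file.path(tmp, "word", "document.xml"))
  ns <- xml_ns(doc)
  paragraphs <- xml_find_all(doc, "//w:p", ns)
  vapply(paragraphs, paragraph_text, character(1), ns = ns)
}
```

With these definitions, read_docx_keep_breaks("D:\\text.docx") would return a character vector with one element per paragraph, with "\n" at each soft return.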

Emmanuel Hamel