
When reading a .docx file with read_docx, the line breaks (soft returns) within paragraphs are not read, i.e. they disappear. Is it possible to read the docx file and preserve the line breaks?

1 Answer


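I have been able to read the content of a Word file, line breaks included, with RDCOMClient. It drives an installed copy of Microsoft Word through COM, so it only works on Windows: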

library(RDCOMClient)

# Start Word through COM and open the document
wordApp <- COMCreate("Word.Application")
wordApp[["Visible"]] <- TRUE
wordApp[["DisplayAlerts"]] <- FALSE
path_To_Word_File <- "D:\\text.docx"
doc <- wordApp[["Documents"]]$Open(normalizePath(path_To_Word_File), ConfirmConversions = FALSE)
doc_Selection <- wordApp$Selection()

list_Text <- list()

# Select each text rectangle of page 1 in turn (at most 40) and store its text
for (i in 1:40)
{
  # Select() returns NULL on success; tryCatch turns the error raised
  # when rectangle i does not exist into NA
  error_Term <- tryCatch(wordApp[["ActiveDocument"]]$ActiveWindow()$Panes(1)$Pages(1)$Rectangles(i)$Range()$Select(),
                         error = function(e) NA)

  # Stop before reading the selection: after a failed Select() it still
  # holds the previous rectangle, which would otherwise be stored twice
  if (!is.null(error_Term))
  {
    break
  }

  list_Text[[i]] <- tryCatch(doc_Selection$Range()$Text(), error = function(e) NA)
}

list_Text

[[1]]
[1] "hi\r"

[[2]]
[1] "\r"

[[3]]
[1] "this is a good text\r"

[[4]]
[1] "\r"

[[5]]
[1] "\r"

[[6]]
[1] "\r"

[[7]]
[1] "here is a word document\r"
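The strings in list_Text keep Word's own control characters: "\r" is a paragraph mark, and a soft return (Shift+Enter) comes back from Range.Text as a vertical tab, "\v" (character 11). The following is a minimal sketch, assuming the chunks have already been collected into list_Text, that rebuilds one string per paragraph and turns soft returns into "\n". The sample list is hypothetical; the "\v" in it only illustrates a soft return.

```r
# Hypothetical sample shaped like list_Text above; the "\v" stands in for a
# soft return (Shift+Enter) inside a paragraph
list_Text <- list("hi\r", "\r", "this is a good text\r", "first line\vsecond line\r")

# Join the chunks back into the full document text
full_Text <- paste(unlist(list_Text), collapse = "")

# One element per paragraph; strsplit drops the empty string after the final "\r"
paragraphs <- strsplit(full_Text, "\r", fixed = TRUE)[[1]]

# Keep the soft returns inside each paragraph as ordinary newlines
paragraphs <- gsub("\v", "\n", paragraphs, fixed = TRUE)

cat(paragraphs, sep = "\n")
```

Splitting on "\r" first and only then replacing "\v" keeps the paragraph structure separate from the line breaks inside a paragraph.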

Emmanuel Hamel