I'm working on code that generates MS Word documents as output. I generated a few documents with revised content, which I'd like to use as reference documents in my unit tests. The idea is: generate a document from input data X, then compare it with the reference document for the same data X. I know a .docx file is a ZIP archive of XML files under the hood, so I thought I could look for differences in the XML content. I have something like this:
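# Extract the main document XML from a .docx file and write it to a temp file.
convert_to_xml <- function(x) {
  doc <- officer::read_docx(x)
  # doc_obj holds the main document part; get() returns it as an xml2 document
  xml <- doc$doc_obj$get()
  xml_out <- tempfile(fileext = ".xml")
  xml2::write_xml(xml, file = xml_out)
  # Return the path of the written XML file
  xml_out
}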
test_that("xml content is correct", {
  output <- generate_document()
  out_xml <- convert_to_xml(output)
  ref <- get_reference_xml()
  # Snapshot the extracted XML file rather than the .docx archive itself
  expect_snapshot_file(out_xml, ref)
})
Basically, it works fine, but from merge to merge I see the test fail. A detailed examination of the failures shows that the content is the same but the binary structure of the file has changed. How can I eliminate this? Is there a way you would recommend to compare XML files? Should I compare the files differently, without using expect_snapshot_file?
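One alternative I have been considering is to keep expect_snapshot_file but pass it a custom function through its `compare` argument, so that the files are compared after parsing instead of byte by byte. Below is a minimal sketch under that assumption; `compare_file_xml` is a hypothetical helper name of my own, not something provided by testthat, and I have not checked that it removes every source of spurious differences.

```r
library(xml2)

# Hypothetical helper (not part of testthat): returns TRUE when both files
# serialise to the same string after being parsed by xml2, FALSE otherwise.
compare_file_xml <- function(old, new) {
  serialise <- function(path) as.character(read_xml(path))
  identical(serialise(old), serialise(new))
}

# Possible usage inside a test, via the `compare` argument:
# expect_snapshot_file(out_xml, ref, compare = compare_file_xml)
```

As far as I understand, testthat also ships compare_file_text(), which compares files line by line rather than as raw bytes, so that might be a simpler option if the differences turn out to be only line endings.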