I want to read specific nodes from multiple XML files recursively (currently 214 files in 186 subfolders, and will continuously change as more folders and files are added) and then convert them to a data frame.

The original question posed is here with the solution provided.

I am now getting two different encoding errors (and I don't know which of the 214 files contain them):

  • Error: 1: Input is not proper UTF-8, indicate encoding ! Bytes: 0x92 0x64 0x61 0x6D
  • Error: 1: Start tag expected, '<' not found

Is there a way to circumvent or, better, prevent these errors when converting the values to a data frame with ldply() using the function below?
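For the first error, byte 0x92 is the right single quotation mark in Windows-1252, so the affected files may have been saved in that encoding rather than UTF-8. That is an assumption, not something confirmed for these files. Under that assumption, a minimal base-R sketch for spotting files that are not valid UTF-8 and re-encoding their text could look like this; `is_valid_utf8_file` and `to_utf8` are hypothetical helper names, and the demo file is made up.

```r
# Hypothetical helpers; assumes the problem files are Windows-1252 encoded.

# TRUE if every line of the file is valid UTF-8.
is_valid_utf8_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

# Re-encode a character vector from an assumed source encoding to UTF-8.
to_utf8 <- function(lines, from = "windows-1252") {
  iconv(lines, from = from, to = "UTF-8")
}

# Demo on a made-up file containing the stray 0x92 byte.
tmp <- tempfile(fileext = ".xml")
writeBin(c(charToRaw("<a>Adam"), as.raw(0x92), charToRaw("s</a>")), tmp)

is_valid_utf8_file(tmp)                       # flags the file as not UTF-8
fixed <- to_utf8(readLines(tmp, warn = FALSE))
validUTF8(fixed)                              # re-encoded text is valid UTF-8
```

If the parser in use accepts an encoding argument (for example, `xml2::read_xml()` has an `encoding` parameter), passing the assumed encoding there may be simpler than rewriting the files.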

Also, is there a way I can get a list of the files (folder names) with the encoding issues in R?
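To illustrate what I am after, here is a rough sketch that wraps the parse step in `tryCatch()` and collects the files that fail instead of stopping at the first error. `find_failing_files` and `toy_parse` are hypothetical names, and the temporary folder is made up for the demo. In real use, `parse_fun` would be whatever parser the reading function already calls.

```r
# Hypothetical helper: try to parse each file and record the error message
# for those that fail. parse_fun is supplied by the caller.
find_failing_files <- function(files, parse_fun) {
  msgs <- vapply(files, function(f) {
    tryCatch({
      parse_fun(f)
      NA_character_                      # NA means the file parsed cleanly
    }, error = function(e) conditionMessage(e))
  }, character(1))
  failed <- !is.na(msgs)
  data.frame(file   = files[failed],
             folder = dirname(files[failed]),
             error  = unname(msgs[failed]),
             stringsAsFactors = FALSE)
}

# Stand-in parser for the demo only: fails if the file does not start with "<".
toy_parse <- function(path) {
  txt <- readLines(path, warn = FALSE)
  if (!startsWith(trimws(txt[1]), "<")) {
    stop("Start tag expected, '<' not found")
  }
  invisible(txt)
}

# Demo on a made-up folder tree with one good and one bad file.
root <- tempfile("xmldemo")
dir.create(file.path(root, "sub"), recursive = TRUE)
writeLines("<a>ok</a>", file.path(root, "good.xml"))
writeLines("not xml", file.path(root, "sub", "bad.xml"))

files <- list.files(root, pattern = "\\.xml$", recursive = TRUE,
                    full.names = TRUE)
bad <- find_failing_files(files, toy_parse)
print(bad)
```

The returned data frame has one row per failing file, with its folder and the error message, so the two error types can be told apart.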

Pharny
  • Add `print(FileName)` as the first line of your "RI_ID" function; the last line printed is the file with the error. Once we have a sample of that file, we will be better able to help solve your problem. – Dave2e Apr 02 '21 at 16:26
  • Thank you Dave2e, this is very helpful! Turned out it was only 6 files I had to manually edit. – Pharny Apr 03 '21 at 11:03