I am trying to parse multiple XML documents stored in my data and combine them into a single data frame.

The XMLResponse column is of type character.

A sample XML document looks like this:

<ApplicationResponse>
    <Service Name="AlternativeCreditAttributes">
      <Categories>
        <Category Name="Default">
          <Attributes>
            <Attribute Name="ACA_ACH_NSF_12M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_18M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_24M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_3M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_6M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_9M" Value="0" />
            <Attribute Name="ACA_ACH_NSF_AMT_12M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_18M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_24M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_3M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_6M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_9M" Value="" />
            <Attribute Name="ACA_ACH_NSF_AMT_EVER" Value="600" />
            <Attribute Name="ACA_ACH_NSF_EVER" Value="2" />
            <Attribute Name="ACA_ACH_NSF_MONTHS_SINCE_NEWEST" Value="41" />
            <Attribute Name="ACA_ACH_NSF_MONTHS_SINCE_OLDEST" Value="41" />
          </Attributes>
        </Category>
      </Categories>
    </Service>
</ApplicationResponse>

I have successfully parsed a single document with the following code:

doc <- read_xml(Data$XMLResponse[1])
cols <- xml_attr(xml_find_all(doc, "//Attribute"), "Name")
rows <- xml_attr(xml_find_all(doc, "//Attribute"), "Value")
out <- data.frame(rows, row.names = cols)
out

But when I tried to use lapply to process multiple documents, based on this answer, I got an error about the working directory:

Error: 'NA' does not exist in current working directory

Below is the code I used. Please let me know if you can spot the issue or if you need more details. Thanks in advance.

df_list <- lapply(Data$XMLResponse, function(f) {
  doc <- read_xml(f)
  setNames(data.frame(
    xml_attr(xml_find_all(doc, "//Attribute"), "Name"),
    xml_attr(xml_find_all(doc, "//Attribute"), "Value")
  ),c("Name", f))
})
Parfait
Universe
  • If you're running into problems with lapply, you can step through the problem more easily by seeing what happens when you run the same code as a loop. Create an empty list first, store your doc variable at each index of the list, then if the loop throws an error, check the size of your new list. Often you'll find that one of the elements of the vector you were working on gives an unexpected result. – Allan Cameron Jan 16 '20 at 19:54
  • Thanks @AllanCameron. Creating an empty list helped me find where the loop stopped. It's probably because of an illegal XML character in the file. – Universe Jan 17 '20 at 16:23

1 Answer

Here is an approach that uses a for() loop to collect the values from every XML document stored in Data$XMLResponse. The code assumes that every document contains the same number of Attribute elements, in the same order.

library(xml2)

# Create an empty list to hold the values from each document
datalist <- list()

# Loop through the column of XML responses and extract the Value attributes
for (i in seq_along(Data$XMLResponse)) {
  temp_vals <- read_xml(Data$XMLResponse[i])
  temp_vals <- xml_attr(xml_find_all(temp_vals, "//Attribute"), "Value")
  # Store these values in the list
  datalist[[i]] <- temp_vals
}

# Bind the values from all documents together, one row per document
your_data <- do.call(rbind, datalist)
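
If some entries of the column are missing or malformed, the loop stops at the first bad one. As far as I can tell, read_xml() treats a string that does not look like XML as a file path, so an NA entry is one plausible source of the "'NA' does not exist in current working directory" message; the asker's own guess was an illegal XML character. The sketch below skips such entries and records which ones failed. The helper name extract_values and the small responses vector (a stand-in for Data$XMLResponse) are hypothetical and not part of the original code.

```r
library(xml2)

# Hypothetical helper: returns the Value attributes of one XML string,
# or NULL when the entry is missing, empty, or cannot be parsed.
extract_values <- function(x) {
  if (is.na(x) || !nzchar(trimws(x))) return(NULL)
  doc <- tryCatch(read_xml(x), error = function(e) NULL)
  if (is.null(doc)) return(NULL)
  xml_attr(xml_find_all(doc, "//Attribute"), "Value")
}

# Small stand-in for Data$XMLResponse (assumed data, for illustration only)
responses <- c(
  '<R><Attribute Name="A" Value="1" /><Attribute Name="B" Value="2" /></R>',
  NA,
  '<R><Attribute Name="A" Value="3" /><Attribute Name="B" Value="4" /></R>'
)

datalist <- lapply(responses, extract_values)

# TRUE for entries that parsed; which(!ok) gives the positions to inspect
ok <- !vapply(datalist, is.null, logical(1))
skipped <- which(!ok)

# Bind only the entries that parsed, one row per document
your_data <- do.call(rbind, datalist[ok])
```

Inspecting Data$XMLResponse at the positions in skipped should show whether the problem is a missing value or broken XML.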

Then get column names:

# xml_find_all() needs a parsed document, so read the first response first
your_column_names <- xml_attr(xml_find_all(read_xml(Data$XMLResponse[1]), "//Attribute"), "Name")
doc <- setNames(data.frame(matrix(ncol = length(your_column_names), nrow = 0)), your_column_names)

And then use rbind() to bind your data with your column names

rbind(doc,your_data)
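
As an alternative sketch, the names can be assigned directly to the bound values, which avoids relying on how rbind() handles a zero-row data frame. This is not the method above, just one way to reach the same shape; the responses vector is a hypothetical stand-in for Data$XMLResponse, and it keeps the same assumption that every document lists the same attributes in the same order.

```r
library(xml2)

# Small stand-in for Data$XMLResponse (assumed data, for illustration only)
responses <- c(
  '<R><Attribute Name="A" Value="1" /><Attribute Name="B" Value="2" /></R>',
  '<R><Attribute Name="A" Value="3" /><Attribute Name="B" Value="4" /></R>'
)

# Column names come from the Name attributes of the first document
first_doc <- read_xml(responses[1])
your_column_names <- xml_attr(xml_find_all(first_doc, "//Attribute"), "Name")

# One character vector of Value attributes per document
value_rows <- lapply(responses, function(x) {
  xml_attr(xml_find_all(read_xml(x), "//Attribute"), "Value")
})

# Bind into a matrix, convert to a data frame, and name the columns
your_data <- do.call(rbind, value_rows)
result <- setNames(
  as.data.frame(your_data, stringsAsFactors = FALSE),
  your_column_names
)
```

All columns in result are character, because attribute values are read as text; convert them afterwards if numeric columns are needed.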
SEAnalyst
  • Thanks, @SEAnalyst. I wonder how you define `temp_xmp`. I put the code `temp_xmp<-read_xml(Data$XMLResponse[i])` into your `for` loop, then the same working directory error comes again. – Universe Jan 16 '20 at 21:48
  • I've corrected the error (removing object by the name of `temp_xmp`) and added a line `temp_vals<-read_xml(Data$XMLResponse[i])` – SEAnalyst Jan 16 '20 at 22:21