I have the following XML file. The root node is omitted because the editor would not let me post it; please assume one wraps the snippet below:
<Indvls>
  <Indvl>
    <Info lastNm="HANSON" firstNm="LAURIE"/>
    <CrntEmps>
      <CrntEmp orgNm="ABC INCORPORATED" str1="FOURTY FOUR BRYANT PARK" city="NEW YORK" state="NY" cntry="UNITED STATES" postlCd="10036">
        <BrnchOfLocs>
          <BrnchOfLoc str1="833 NE 55TH ST" city="BELLEVUE" state="WA" cntry="UNITED STATES" postlCd="98004"/>
        </BrnchOfLocs>
      </CrntEmp>
    </CrntEmps>
  </Indvl>
  <Indvl>
    <Info lastNm="JACKSON" firstNm="SHERRY"/>
    <CrntEmps>
      <CrntEmp orgNm="XYZ INCORPORATED" str1="3411 GEORGE STREET" city="SAN FRANCISCO" state="CA" cntry="UNITED STATES" postlCd="94105">
        <BrnchOfLocs>
        </BrnchOfLocs>
      </CrntEmp>
    </CrntEmps>
  </Indvl>
</Indvls>
Using R, I want to extract the following columns into a table, one row per Indvl: (a) lastNm and firstNm from the Info node, which is always present with values; (b) orgNm from the CrntEmps/CrntEmp node, which is always present with values; and (c) str1, city and state from the CrntEmps/CrntEmp/BrnchOfLocs/BrnchOfLoc node, which may or may not be present (in my example the second Indvl does NOT have a branch office address).
My challenge is that many Indvl nodes will not have a BrnchOfLoc node. I want a row for every Indvl, with NA in the branch columns when the node is missing; otherwise the columns have unequal lengths and I get an error when building the data frame.
Any thoughts or suggestions? I appreciate any input.
Addendum: Here is my code:
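library(XML)  # provides getNodeSet() and xmlGetAttr()

# Return attribute `attr` of every node matching `xp` under `n`,
# or `default` when no node matches.
xmlGetNodeAttr <- function(n, xp, attr, default = NA) {
  ns <- getNodeSet(n, xp)
  if (length(ns) < 1) {
    return(default)
  }
  sapply(ns, xmlGetAttr, attr, default)
}

# `doc` is the parsed document. Paths start with "./" so that each lookup
# stays inside the current <Indvl>; a leading "//" searches the whole document.
do.call(rbind, lapply(getNodeSet(doc, "//Indvl"), function(x) {
  data.frame(
    fname   = xmlGetNodeAttr(x, "./Info", "firstNm", NA),
    lname   = xmlGetNodeAttr(x, "./Info", "lastNm", NA),
    orgname = xmlGetNodeAttr(x, "./CrntEmps/CrntEmp[1]", "orgNm", NA),
    city    = xmlGetNodeAttr(x, "./CrntEmps/CrntEmp[1]/BrnchOfLocs/BrnchOfLoc[1]", "city", NA)
  )
}))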
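To make the goal concrete, here is a minimal self-contained sketch of the table I am after, run against a trimmed copy of the sample above. It is only an illustration under some assumptions: it uses the XML package, wraps the sample in a hypothetical <Root> element, and the helper name first_attr is made up for this example. It takes only the first CrntEmp and the first BrnchOfLoc per Indvl.

```r
library(XML)

# Sample data, wrapped in a hypothetical <Root> element (assumption).
xml_text <- '<Root><Indvls>
  <Indvl>
    <Info lastNm="HANSON" firstNm="LAURIE"/>
    <CrntEmps>
      <CrntEmp orgNm="ABC INCORPORATED" city="NEW YORK" state="NY">
        <BrnchOfLocs>
          <BrnchOfLoc str1="833 NE 55TH ST" city="BELLEVUE" state="WA"/>
        </BrnchOfLocs>
      </CrntEmp>
    </CrntEmps>
  </Indvl>
  <Indvl>
    <Info lastNm="JACKSON" firstNm="SHERRY"/>
    <CrntEmps>
      <CrntEmp orgNm="XYZ INCORPORATED" city="SAN FRANCISCO" state="CA">
        <BrnchOfLocs>
        </BrnchOfLocs>
      </CrntEmp>
    </CrntEmps>
  </Indvl>
</Indvls></Root>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: attribute of the first node matching `xpath`
# relative to `node`, or NA when the node or the attribute is missing.
first_attr <- function(node, xpath, attr) {
  hits <- getNodeSet(node, xpath)
  if (length(hits) == 0) {
    return(NA_character_)
  }
  xmlGetAttr(hits[[1]], attr, default = NA_character_)
}

branch <- "./CrntEmps/CrntEmp[1]/BrnchOfLocs/BrnchOfLoc[1]"

# One single-row data frame per <Indvl>, then stack them.
result <- do.call(rbind, lapply(getNodeSet(doc, "//Indvl"), function(x) {
  data.frame(
    lastNm  = first_attr(x, "./Info", "lastNm"),
    firstNm = first_attr(x, "./Info", "firstNm"),
    orgNm   = first_attr(x, "./CrntEmps/CrntEmp[1]", "orgNm"),
    str1    = first_attr(x, branch, "str1"),
    city    = first_attr(x, branch, "city"),
    state   = first_attr(x, branch, "state"),
    stringsAsFactors = FALSE
  )
}))

print(result)
```

Because the helper always returns exactly one value, every Indvl contributes one row, and the branch columns are NA for the second Indvl instead of being dropped.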