
I am taking a course that includes learning how to read XML files into R. I'm trying to read this XML document into an R object:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

When I follow the instructions I was given, it creates an R object, but when I try to index it, it shows the whole document instead of a list of the names of the levels below the root node. When I try to index it further, RStudio (Posit) Cloud crashes. I am extremely new to R and XML, so I have no clue what's wrong.

This is what the XML file looks like:

[Screenshot of the XML file]

I did the following:

library(XML)
fileUrl <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)
xmlName(rootNode)

(Note that I removed the "s" from "https". I don't know why, but the call fails when the URL uses https.)

It returned "response" as expected.

Then I did:

names(rootNode)

And all that returned was

row
"row"

So I tried indexing the first element, hoping it would give me the first chunk (i.e. the row with _id = "1"):

rootNode[[1]]

and it gave me literally the entire document.

Then (DO NOT TRY THIS) I tried:

rootNode[[1]][[1]]

I was hoping it would give me the first individual section, i.e.:

<row _id="1" _uuid="93CACF6F-C8C2-4B87-95A8-8177806D5A6F" _position="1" _address="http://data.baltimorecity.gov/resource/k5ry-ef3g/1">
<name>410</name>
<zipcode>21206</zipcode>
<neighborhood>Frankford</neighborhood>
<councildistrict>2</councildistrict>
<policedistrict>NORTHEASTERN</policedistrict>
<location_1 human_address="{"address":"4509 BELAIR ROAD","city":"Baltimore","state":"MD","zip":""}" needs_recoding="true"/>
</row>

But it crashed the project on RStudio Cloud, and it still won't open to this day.

Before I delete the project, I need to understand what I did wrong, and how I can index the tree parsed document to give me individual sections like the one I pasted above.
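
To make this reproducible without downloading the file, here is a minimal sketch that uses a small inline XML string instead. It assumes the real file nests the individual rows inside a single wrapper row, which is what names(rootNode) returning only one "row" suggests but which I have not confirmed. The first row's values come from the excerpt above; the second row is made up for illustration. On this stand-in, the same double indexing returns one individual section, so the sketch only shows the behaviour I expected, not why the real file crashes.

```r
library(XML)

# Stand-in for the real file (assumed structure): <response> holds one
# wrapper <row>, which holds the individual <row> elements.
# The second row's values are made up for illustration.
xmlText <- paste0(
  '<response><row>',
  '<row _id="1"><name>410</name><zipcode>21206</zipcode></row>',
  '<row _id="2"><name>Example Cafe</name><zipcode>00000</zipcode></row>',
  '</row></response>'
)

# asText = TRUE parses the string itself instead of treating it as a file name
doc <- xmlTreeParse(xmlText, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

wrapper <- rootNode[[1]]   # the single child of <response>
xmlSize(wrapper)           # how many individual rows it contains

first <- wrapper[[1]]      # same node as rootNode[[1]][[1]]
xmlGetAttr(first, "_id")   # attribute of the first individual row
xmlValue(first[["name"]])  # text inside its <name> child

# Collect one field from every individual row with XPath
xpathSApply(doc, "/response/row/row/name", xmlValue)
```

Storing the nodes in variables (wrapper, first) rather than letting them print at the console also avoids dumping a large subtree to the screen, which may or may not be related to the crash.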

kswp