I’m trying to get data from the web into a data frame. It comes as XML, but I cannot convert it as usual. It seems to be an "S3: list" (?), which I don’t know how to convert into something else.

data_xml = read_xml(url)

Display of Variable in RStudio

View(data_xml)

Variable Content

Unfortunately, none of the unlist and parse commands I tried lead to useful output. Does anyone know how to handle this?

Thanks in advance, Dave

tryhard

1 Answer

XML typically does not translate directly into a rectangular data frame; think of it as a tree structure with (possibly) deeply nested branches. It is hard to recommend any particular strategy without access to that particular file, but with a simple structure (e.g. cd_catalog.xml) something like this might work:

library(xml2)
library(dplyr)
library(tidyr)

data_xml <- read_xml("https://www.w3schools.com/xml/cd_catalog.xml")

# find all `CD` elements,
# convert to list,
# convert to tibble with bind_rows(),
# unnest all columns
xml_find_all(data_xml, "//CD") |> 
  as_list() |> 
  bind_rows() |>
  unnest(everything())
#> # A tibble: 26 × 6
#>    TITLE                    ARTIST          COUNTRY COMPANY        PRICE YEAR 
#>    <chr>                    <chr>           <chr>   <chr>          <chr> <chr>
#>  1 Empire Burlesque         Bob Dylan       USA     Columbia       10.90 1985 
#>  2 Hide your heart          Bonnie Tyler    UK      CBS Records    9.90  1988 
#>  3 Greatest Hits            Dolly Parton    USA     RCA            9.90  1982 
#>  4 Still got the blues      Gary Moore      UK      Virgin records 10.20 1990 
#>  5 Eros                     Eros Ramazzotti EU      BMG            9.90  1997 
#>  6 One night only           Bee Gees        UK      Polydor        10.90 1998 
#>  7 Sylvias Mother           Dr.Hook         UK      CBS            8.10  1973 
#>  8 Maggie May               Rod Stewart     UK      Pickwick       8.50  1990 
#>  9 Romanza                  Andrea Bocelli  EU      Polydor        10.80 1996 
#> 10 When a man loves a woman Percy Sledge    USA     Atlantic       8.70  1987 
#> # ℹ 16 more rows

Created on 2023-09-01 with reprex v2.0.2
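
If the records are less regular (for example, some elements are missing a child), pulling out each field explicitly may be more robust than converting the whole node set to a list. The following is only a sketch under assumptions: the inline XML is a made-up two-record excerpt in the style of cd_catalog.xml, and `child_text()` is a hypothetical helper, not part of xml2. It relies on `xml_find_first()` returning a missing node when nothing matches, which `xml_text()` turns into `NA`.

```r
library(xml2)

# Hypothetical inline document so the example runs without network access;
# the second CD deliberately has no PRICE element.
doc <- read_xml(paste0(
  "<CATALOG>",
  "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST>",
  "<PRICE>10.90</PRICE><YEAR>1985</YEAR></CD>",
  "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST>",
  "<YEAR>1988</YEAR></CD>",
  "</CATALOG>"
))

# one node per record
cds <- xml_find_all(doc, "//CD")

# hypothetical helper: text of the first child named `name` for each record,
# NA where the child does not exist
child_text <- function(nodes, name) {
  xml_text(xml_find_first(nodes, paste0("./", name)))
}

# build the data frame column by column, converting types explicitly
cd_df <- data.frame(
  title  = child_text(cds, "TITLE"),
  artist = child_text(cds, "ARTIST"),
  price  = as.numeric(child_text(cds, "PRICE")),
  year   = as.integer(child_text(cds, "YEAR")),
  stringsAsFactors = FALSE
)

cd_df
```

This gives one row per `CD` node regardless of which children are present, and the numeric columns are already typed instead of being left as character.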

margusl