I have a complicated XML file whose items are the first-level child nodes. The items can have different structures, and some attributes are missing in some of them. I need to store each item (node set) in one tibble row, so that I can keep track of missing attributes and write a function that handles all variants.

I found a solution to the first step by Felix Ebert: https://stackoverflow.com/questions/49253021/how-to-extract-xml-attr-and-xml-text-on-different-levels-with-xml2-and-purrr

I copy part of the code here:

xml <- xml2::read_xml("input/example.xml")
rows <- xml %>% xml_find_all("//xmlsubsubnode")
rows_df <- data_frame(node = rows)

The function data_frame() has been deprecated, and I get error messages if I replace it with any of the following:

tibble()
as_tibble()
data.frame()

With tibble() I get the following error:

df_articles <- tibble(item = xml_articles)
Error:
! All columns in a tibble must be vectors.
✖ Column `item` is a `xml_nodeset` object.
Backtrace:
1. tibble::tibble(item = xml_articles)
2. tibble:::tibble_quos(xs, .rows, .name_repair)
3. tibble:::check_valid_col(res, col_names[[j]], j)
4. tibble:::check_valid_cols(set_names(list(x), name))

I would be grateful if anybody could update the solution from the original post.

M--
ReCodeRa
  • If you can post a sample `xml` file it will be a lot easier to help you. It appears you need to convert `rows` to a vector. Maybe use `xml_attr`? `rows_df <- rows %>% map_df(xml_attr)` – pgcudahy Nov 02 '22 at 09:26
  • Dear @pgcudahy, the structure of the XML is as shown below. There is a root node "items" containing two different types of items (p_item or e_item). I want to split the large XML into a data frame in which each item occupies a single row. Later I want to apply a function that extracts the info for each item into the corresponding column. ' id1 id2 id3 id3 ' – ReCodeRa Nov 03 '22 at 14:49
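The conversion hinted at in the comments can be sketched roughly as follows. The error says that an `xml_nodeset` is not a vector as far as tibble is concerned, so one possible workaround is to turn the node set into a plain list of `xml_node` objects and store that as a list column. The inline XML below is a hypothetical miniature of the structure described above (the real file was not posted), and the attribute names `id` and `name` are assumptions made only for illustration.

```r
library(xml2)
library(tibble)

# Hypothetical stand-in for the real file: a root "items" node with two item types.
doc <- read_xml('<items><p_item id="id1" name="a"/><e_item id="id2"/></items>')
rows <- xml_find_all(doc, "/items/*")

# lapply() returns a plain list of xml_node objects, which tibble accepts
# as a list column (the classed xml_nodeset itself is rejected).
rows_df <- tibble(node = lapply(rows, identity))

# One value per row; xml_attr() returns NA when an attribute is missing.
rows_df$type <- vapply(rows_df$node, xml_name, character(1))
rows_df$id   <- vapply(rows_df$node, xml_attr, character(1), attr = "id")
rows_df$name <- vapply(rows_df$node, xml_attr, character(1), attr = "name")

print(rows_df)
```

Because each row keeps its own node in the `node` column, a later function can be applied row by row to pull out whatever each item variant contains, with missing attributes showing up as `NA` rather than shifting values between rows.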

0 Answers