I'm currently working to extract data in JSON format into an R dataframe.
The data provided is of the following format:
Sample code to create test data:
test_input_data <- data.frame(date.x=c("2017-08-17", "2017-07-26", "2017-10-04"), properties.x=c("{\"gender\": \"Male\", \"nationality\": \"NZL\", \"document_type\": \"passport\", \"date_of_expiry\": \"2018-07-05\", \"issuing_country\": \"NZL\"}", "{\"gender\": \"Female\", \"nationality\": \"NLD\", \"document_type\": \"national_identity_card\", \"date_of_expiry\": \"2026-10-07\", \"issuing_country\": \"NLD\"}" , "{\"issuing_date\": \"2015-05-18\", \"document_type\": \"driving_licence\", \"date_of_expiry\": \"2017-05-05\", \"issuing_country\": \"IRL\"}"), stringsAsFactors = FALSE)
What I would like to do is to create a dataframe as follows:
I'm currently using the RJSONIO::fromJSON() function to map properties.x into a list and then unnesting it:
properties_doc_reports <- test_data %>%
mutate(properties.x = map(properties.x, ~ RJSONIO::fromJSON(.))) %>%
dplyr::filter(purrr::map_lgl(properties.x, ~!rlang::is_empty(.x))) %>% ##this is optional as it deletes all rows with empty lists
as_tibble %>%
unnest(properties.x)
However, this gets rid of the 'key' in properties.x which is what I also need. For reference, the output of the R code gives me the following:
However, each row in the input data does not have a consistent set of key value pairs so its not possible for me to infer the key from the row number. For example, 'gender' is missing from row 3) in the input dataframe
Any ideas?