-1

I'm currently working to extract data in JSON format into an R dataframe.

The data provided is of the following format:

enter image description here

Sample code to create test data:

test_input_data <- data.frame(date.x=c("2017-08-17", "2017-07-26", "2017-10-04"), properties.x=c("{\"gender\": \"Male\", \"nationality\": \"NZL\", \"document_type\": \"passport\", \"date_of_expiry\": \"2018-07-05\", \"issuing_country\": \"NZL\"}", "{\"gender\": \"Female\", \"nationality\": \"NLD\", \"document_type\": \"national_identity_card\", \"date_of_expiry\": \"2026-10-07\", \"issuing_country\": \"NLD\"}" , "{\"issuing_date\": \"2015-05-18\", \"document_type\": \"driving_licence\", \"date_of_expiry\": \"2017-05-05\", \"issuing_country\": \"IRL\"}"), stringsAsFactors = FALSE)

What I would like to do is to create a dataframe as follows:

enter image description here

I'm currently using the RJSONIO::fromJSON() function to map properties.x into a list and then unnesting it:

properties_doc_reports <- test_data %>% 
  mutate(properties.x = map(properties.x, ~ RJSONIO::fromJSON(.))) %>% 
  dplyr::filter(purrr::map_lgl(properties.x, ~!rlang::is_empty(.x))) %>% ##this is optional as it deletes all rows with empty lists
  as_tibble %>% 
  unnest(properties.x)

However, this gets rid of the 'key' in properties.x which is what I also need. For reference, the output of the R code gives me the following:

enter image description here

However, each row in the input data does not have a consistent set of key value pairs so its not possible for me to infer the key from the row number. For example, 'gender' is missing from row 3) in the input dataframe

Any ideas?

1 Answers1

1

Hi here is a quick solution. I am using the fact that the each json only contains one row. map_df from the package purrr then automatically converts all rows to a single data.frame. Since map_df keeps the row order is it just to bind the resulting df with the date column.

test_input_data <- data.frame(date.x=c("2017-08-17", "2017-07-26", "2017-10-04"), properties.x=c("{\"gender\": \"Male\", \"nationality\": \"NZL\", \"document_type\": \"passport\", \"date_of_expiry\": \"2018-07-05\", \"issuing_country\": \"NZL\"}", "{\"gender\": \"Female\", \"nationality\": \"NLD\", \"document_type\": \"national_identity_card\", \"date_of_expiry\": \"2026-10-07\", \"issuing_country\": \"NLD\"}" , "{\"issuing_date\": \"2015-05-18\", \"document_type\": \"driving_licence\", \"date_of_expiry\": \"2017-05-05\", \"issuing_country\": \"IRL\"}"), stringsAsFactors = FALSE)

library(tidyverse)
df <- bind_cols(
  test_input_data %>% 
    select(date.x),
  test_input_data$properties.x %>% 
    map_df(jsonlite::fromJSON)
)

Hope this helps!!

Bertil Baron
  • 4,923
  • 1
  • 15
  • 24
  • Thanks Bertil, it does! I used the melt function after to 'unpivot' the dataframe but this was the step i needed – Hamza Juzer Mar 25 '20 at 11:04