1

I would like to convert a dataframe into a nested json object, and determine where to create a nested json object based on the column names.

I have made a toy example to explain the problem. Given this dataframe:

df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender
x,alice,no,yes,175,female
y,bob,yes,yes,180,male"))

Or in a more readable format:

    id  name allergies.pollen allergies.pet attributes.height attributes.gender
  1  x alice               no           yes               175            female
  2  y   bob              yes           yes               180              male

Then I would like the following json object:

'[
  {
    "id": "x",
    "name": "alice",
    "allergies":
    {
      "pollen": "no",
      "pet": "yes"
    },
    "attributes": 
    {
      "height": "175",
      "gender": "female"
    }
  },
  {
    "id": "y",
    "name": "bob",
    "allergies":
    {
      "pollen": "yes",
      "pet": "yes"
    },
    "attributes":
    {
      "height": "180",
      "gender": "male"
    }
  }
]'

So it should automatically group the columns at a fixed separator ".".

Ideally it should be able to handle nested-nested objects too, e.g. allergies.pet.cat and allergies.pet.dog.

My best idea on solving this problem is making a function, which recursive call jsonlite::toJSON and extracts the categories using stringr::str_extract("^[^.]*"), but I have not been able to make this work.

Thomas
  • 13
  • 2
  • Please say what you expect to happen with collisions, such as having both `allergies.pet` and `allergies.pet.cat`. In general, it would help if your sample data had a second nesting, for completeness in answers. – r2evans Jul 08 '21 at 13:55
  • I wanted sub-categories to be nested too, i.e. `{"allergies" : {"pet": {"dog": "yes", "cat": "yes"}}}`. – Thomas Jul 09 '21 at 06:47
  • For collisions I hadn't given it any thought, so dealers choice. But I think the way your answer handles it is sensible. – Thomas Jul 09 '21 at 07:02

1 Answers1

0

Here's a function that seems to work. The only glitch is if there is a possible collision, e.g., allergies.pet and allergies.pet.car; while it does not error, it may be non-standard.

New data:

df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender,allergies.pet.cat
x,alice,no,yes,175,female,quux
y,bob,yes,yes,180,male,unk"))

The function:

func <- function(x) {
  grps <- split(names(x), gsub("[.].*", "", names(x)))
  for (nm in names(grps)) {
    if (length(grps[[nm]]) > 1 || !nm %in% grps[[nm]]) {
      x[[nm]] <- setNames(subset(x, select = grps[[nm]]),
                          gsub("^[^.]+[.]", "", grps[[nm]]))
      x[,setdiff(grps[[nm]], nm)] <- NULL
    }
  }
  for (nm in names(x)) {
    if (is.data.frame(x[[nm]])) {
      x[[nm]] <- func(x[[nm]])
    }
  }
  if (any(grepl("[.]", names(x)))) func(x) else x
}

See how this nests all .-delimited columns into frames:

str(df)
# 'data.frame': 2 obs. of  7 variables:
#  $ id               : chr  "x" "y"
#  $ name             : chr  "alice" "bob"
#  $ allergies.pollen : chr  "no" "yes"
#  $ allergies.pet    : chr  "yes" "yes"
#  $ attributes.height: int  175 180
#  $ attributes.gender: chr  "female" "male"
#  $ allergies.pet.cat: chr  "quux" "unk"
newdf <- func(df)
str(newdf)
# 'data.frame': 2 obs. of  4 variables:
#  $ id        : chr  "x" "y"
#  $ name      : chr  "alice" "bob"
#  $ allergies :'data.frame':   2 obs. of  2 variables:
#   ..$ pollen: chr  "no" "yes"
#   ..$ pet   :'data.frame':    2 obs. of  2 variables:
#   .. ..$ pet: chr  "yes" "yes"
#   .. ..$ cat: chr  "quux" "unk"
#  $ attributes:'data.frame':   2 obs. of  2 variables:
#   ..$ height: int  175 180
#   ..$ gender: chr  "female" "male"

From here, it's straight-forward to jsonify:

jsonlite::toJSON(newdf, pretty = TRUE)
# [
#   {
#     "id": "x",
#     "name": "alice",
#     "allergies": {
#       "pollen": "no",
#       "pet": {
#         "pet": "yes",
#         "cat": "quux"
#       }
#     },
#     "attributes": {
#       "height": 175,
#       "gender": "female"
#     }
#   },
#   {
#     "id": "y",
#     "name": "bob",
#     "allergies": {
#       "pollen": "yes",
#       "pet": {
#         "pet": "yes",
#         "cat": "unk"
#       }
#     },
#     "attributes": {
#       "height": 180,
#       "gender": "male"
#     }
#   }
# ] 
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • It works exactly as desired - thanks a lot! With regards to collisions, then my data set doesn't have any occasions where the tag should both be both nested and non-nested (e.g. both `{"pet": "yes", "pet": {"dog": "yes"}})`, but if it did, then I think handling it the way this does with `{"pet": {"pet": "yes", "dog": "yes"}}` makes sense. – Thomas Jul 09 '21 at 06:58