0

In general I feel there is a need to make JSON objects by folding multiple columns. There is no direct way to do this afaik. Please point it out if there is ..

I have data of this from

A B C
1 a x
1 a y
1 c z
2 d p
2 f q
2 f r

How do I write a json which looks like

{'query':'1', 'type':[{'name':'a', 'values':[{'value':'x'}, {'value':'y'}]}, {'name':'c', 'values':[{'value':'z'}]}]}

and similarly for 'query':'2'

I am looking to spit them in the mongo import/export individual json lines format. Any pointers are also appreciated..

ajkl
  • 1,016
  • 1
  • 7
  • 27

1 Answers1

1

You've got a little "non-standard" thing going with two keys of "value" (I don't know if this is legal json), as you can see here:

(js <- jsonlite::fromJSON('{"query":"1", "type":[{"name":"a", "values":[{"value":"x"}, {"value":"y"}]}, {"name":"c", "values":[{"value":"z"}]}]}'))
## $query
## [1] "1"
## 
## $type
##   name values
## 1    a   x, y
## 2    c      z

... with a data.frame cell containing a list of data.frames:

js$type$values[[1]]
##   value
## 1     x
## 2     y
class(js$type$values[[1]])
## [1] "data.frame"

If you can accept your "type" variable containing a vector instead of a named-list, then perhaps the following code will suffice:

jsonlite::toJSON(lapply(unique(dat[, 'A']), function(a1) {
    list(query = a1, 
         type = lapply(unique(dat[dat$A == a1, 'B']),  function(b2) {
             list(name = b2,
                  values = dat[(dat$A == a1) & (dat$B == b2), 'C'])
         }))
}))
## [{"query":[1],"type":[{"name":["a"],"values":["x","y"]},{"name":["c"],"values":["z"]}]},{"query":[2],"type":[{"name":["d"],"values":["p"]},{"name":["f"],"values":["q","r"]}]}] 
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • did you mean the 'values' variable being a vector instead of a named-list? And yes I can live with that. Thanks for the answer – ajkl Nov 18 '14 at 08:30
  • one solution that kinda worked for me was to roll-up the C column into a vector and make the A,B pairs unique. Is there a solution which does this incrementally for multiple columns till you get one primary key column ? – ajkl Nov 18 '14 at 16:49
  • I'm not certain what you mean. If you combine A and B with something like `data.frame(AB=paste0(dat$A, dat$B), C=dat$C)`, it will simplify the code a little (just one `lapply`) but will not be broken out by query and type. (If you need to combine all but the last column, you could do something like: `data.frame(KEY=apply(dat[,-ncol(dat)], 1, paste0, collapse=''), VAL=dat[,ncol(dat)])`.) – r2evans Nov 18 '14 at 18:02