I'm trying to process some data in JSON format. rjson::fromJSON imports the data successfully, but returns it as a rather unwieldy nested list.

library(rjson)
y <- fromJSON(file="http://api.lmiforall.org.uk/api/v1/wf/predict/breakdown/region?soc=6145&minYear=2014&maxYear=2020")
str(y)
List of 3
 $ soc                : num 6145
 $ breakdown          : chr "region"
 $ predictedEmployment:List of 7
  ..$ :List of 2
  .. ..$ year     : num 2014
  .. ..$ breakdown:List of 12
  .. .. ..$ :List of 3
  .. .. .. ..$ code      : num 1
  .. .. .. ..$ name      : chr "London"
  .. .. .. ..$ employment: num 74910
  .. .. ..$ :List of 3
  .. .. .. ..$ code      : num 7
  .. .. .. ..$ name      : chr "Yorkshire and the Humber"
  .. .. .. ..$ employment: num 61132
  ...

However, as this is essentially tabular data, I would like it in a succinct data.frame. After much trial and error I arrived at the following:

y.p <- do.call(rbind,lapply(y[[3]], function(p) cbind(p$year,do.call(rbind,lapply(p$breakdown, function(q) data.frame(q$name,q$employment,stringsAsFactors=FALSE))))))
head(y.p)
  p$year                   q.name q.employment
1   2014                   London     74909.59
2   2014 Yorkshire and the Humber     61131.62
3   2014     South West (England)     65833.57
4   2014                    Wales     33002.64
5   2014  West Midlands (England)     68695.34
6   2014     South East (England)     98407.36

But the command seems overly fiddly and complex. Is there a simpler way of doing this?

James

2 Answers

Here I recover the geometry of the list

ni <- seq_along(y[[3]])
nj <- seq_along(y[[c(3, 1, 2)]])
nij <- as.matrix(expand.grid(3, ni=ni, 2, nj=nj))

then extract the relevant variable information using the rows of nij as an index into the nested list

data <- apply(nij, 1, function(ij) y[[ij]])
year <- apply(cbind(nij[,1:2], 1), 1, function(ij) y[[ij]])

and make it into a more friendly structure

data.frame(year, do.call(rbind, data))
   year code                     name employment
1  2014    1                   London   74909.59
2  2015    5  West Midlands (England)   69132.34
3  2016   12         Northern Ireland   24313.94
4  2017    5  West Midlands (England)    71723.4
5  2018    9     North East (England)   27199.99
6  2019    4     South West (England)   71219.51
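
The indexing step above relies on `[[` accepting a vector and applying it one level at a time. A minimal sketch of that behaviour, assuming a small hypothetical stand-in for `y` (the values are made up; only the nesting mirrors the real data):

```r
# Hypothetical stand-in for the parsed JSON (made-up values, same nesting)
y <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 100),
      list(code = 7, name = "Wales",  employment = 50))),
    list(year = 2015, breakdown = list(
      list(code = 1, name = "London", employment = 110),
      list(code = 7, name = "Wales",  employment = 55)))
  )
)

# A vector inside [[ ]] descends one level per element
a <- y[[c(3, 2, 1)]]          # same as y[[3]][[2]][[1]]: the second year
b <- y[[c(3, 2, 2, 1)]]$name  # first region entry of the second year

identical(a, y[[3]][[2]][[1]])
```

Each row of `nij` is such an index vector, which is why `apply(nij, 1, function(ij) y[[ij]])` can walk the whole structure.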
Martin Morgan
  • Hi @agstudy, I tried to apply this method to my case, but it still fails. My case is here: http://stackoverflow.com/questions/27227208/how-to-get-nested-value-rjson. I hope someone can help. – user46543 Dec 01 '14 at 12:46

I am not sure it is simpler, but the result is more complete and, I think, easier to read. The idea is to use Map: for each (year, breakdown) pair, combine the breakdown entries into a single table and then attach the year.

dat <- y[[3]]
res <- Map(function(x, y) data.frame(year = y,
                                     do.call(rbind, lapply(x, as.data.frame))),
           lapply(dat, '[[', 'breakdown'),
           lapply(dat, '[[', 'year'))
## transform the list to a big data.frame
do.call(rbind,res)
   year code                     name employment
1  2014    1                   London   74909.59
2  2014    7 Yorkshire and the Humber   61131.62
3  2014    4     South West (England)   65833.57
4  2014   10                    Wales   33002.64
5  2014    5  West Midlands (England)   68695.34
6  2014    2     South East (England)   98407.36
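
For anyone who cannot reach the API, here is a self-contained sketch of the same Map approach. The `dat` below is a hypothetical miniature of `y[[3]]` with made-up numbers, and the argument names `regions` and `yr` are my own, chosen to avoid shadowing the global `y`:

```r
# Hypothetical miniature of y[[3]] (made-up numbers, same shape)
dat <- list(
  list(year = 2014, breakdown = list(
    list(code = 1,  name = "London", employment = 100),
    list(code = 10, name = "Wales",  employment = 50))),
  list(year = 2015, breakdown = list(
    list(code = 1,  name = "London", employment = 110),
    list(code = 10, name = "Wales",  employment = 55)))
)

# One data.frame per year: stack the region rows, then attach the year
per_year <- Map(
  function(regions, yr) {
    rows <- lapply(regions, as.data.frame, stringsAsFactors = FALSE)
    data.frame(year = yr, do.call(rbind, rows))
  },
  lapply(dat, `[[`, "breakdown"),
  lapply(dat, `[[`, "year")
)

# Stack the per-year tables into one data.frame
res <- do.call(rbind, per_year)
```

With this toy input, `res` has one row per (year, region) pair and the columns year, code, name and employment.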
agstudy
  • I've accepted this answer, as it is easier to read what is going on. – James Jul 17 '13 at 13:16
  • When I run this, e.g., `lapply(dat, "[[", "breakdown")` with `dat = y` I get `Error in FUN(X[[1L]], ...) : subscript out of bounds` ?? – Martin Morgan Jul 17 '13 at 19:58
  • @MartinMorgan Good catch. I have edited my answer; I forgot to mention that `dat <- y[[3]]`. – agstudy Jul 17 '13 at 21:42
  • @agstudy Thanks for your excellent answer! May I ask, I have never seen lapply used in the way you have here. Can you explain what's happening? I am used to lapply being of the form: lapply(, ), but you seem to be pulling out some portion of the object. How does this work? The object, dat, does not even have 'breakdown' or 'year' at their highest level. I cannot seem to find anything within the documentation to explain this use case. Thanks so much!! – Mike Williamson Sep 27 '13 at 20:14
  • @MikeWilliamson As `?lapply` shows, the signature is `lapply(X, FUN, ...)`, where `...` are `optional arguments to FUN`. Here my FUN is `[[`, and I pass the element name (breakdown or year) to it as the optional argument. – agstudy Sep 27 '13 at 21:25
  • @agstudy Thanks for your reply! Yes, I understand that you can send a function to lapply. I just don't understand how `[[` is captured as a function. For instance, I couldn't say `[[(dat, 'breakdown')` and expect any response. I see what it's doing: it's grabbing the subgroup within dat called 'breakdown'. In effect, it's doing `dat$breakdown`, or `dat[["breakdown"]]`. But I've never seen this usage and still don't quite get how / why it works. – Mike Williamson Oct 07 '13 at 21:17
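
On the last question: in R, `[[` is an ordinary function, and it can be called by name once it is quoted with backticks. A minimal sketch, using a hypothetical toy list rather than the real data:

```r
x <- list(year = 2014, breakdown = "region")

# Backticks let the operator be called like any other function
v1 <- `[[`(x, "year")   # same as x[["year"]]
v2 <- x[["year"]]

# lapply(X, FUN, ...) forwards "year" to FUN, so each call is
# effectively `[[`(element, "year")
dat <- list(list(year = 2014), list(year = 2015))
years <- lapply(dat, "[[", "year")
```

`lapply` looks up its `FUN` argument with `match.fun`, which is why the plain string `"[["` works there even though `[[(dat, 'breakdown')` typed without backticks is a syntax error.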