Creating new rows for listed substrings

Question

My goal is to create a wordcloud in R, but I'm working with nested JSON data (which also happens to be incredibly messy).

There's a nice explanation here for how to create a wordcloud of phrases rather than single words. I also know melt() from reshape2 can create new rows out of entire columns. Is there a way in R to perform a melt-like operation over nested substrings?

Example:

N        Group     String
1        A         c("a", "b", "c")
2        A         character(0)
3        B         a
4        B         c("b", "d")
5        B         d

...should become:

N        Group     String
1        A         a
2        A         b
3        A         c
4        A         character(0)
5        B         a
6        B         b
7        B         d
8        B         d

...where each substring gets its own row. In my actual data, the pattern c("x", "y") is consistent, but the substrings are too varied to know a priori.

If there's no great way to do this, too bad... just thought I'd ask the experts!

Would it be possible to load your json data into a `list` rather than a `data.frame`? If so, then you can use `unlist` to separate all the elements. — Gaurav Bansal, Jan 11 '17 at 21:17
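A minimal base-R sketch of that suggestion, assuming the data is already held as a list with one character vector per row (the object names `groups` and `strings` are hypothetical, not from the question). Empty vectors are swapped for NA first so those rows are not silently dropped by `unlist`:

```r
# Hypothetical list version of the question's example data
groups  <- c("A", "A", "B", "B", "B")
strings <- list(c("a", "b", "c"), character(0), "a", c("b", "d"), "d")

# unlist() drops zero-length elements, so give empty entries a placeholder
strings[lengths(strings) == 0] <- list(NA_character_)

# Repeat each group once per substring, then flatten the list
long <- data.frame(
  Group  = rep(groups, lengths(strings)),
  String = unlist(strings),
  stringsAsFactors = FALSE
)

print(long)
```

Whether this fits depends on how the JSON was parsed; if the column is a plain string such as 'c("a", "b")' rather than a real list, splitting the text is needed instead.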

score 1 · Accepted Answer · answered Jan 11 '17 at 21:26

You can use separate_rows from the tidyr package:

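library(tidyverse)

data %>%
  separate_rows(listcites, sep = ",\\s*") %>% # split on commas (and any space after them)
  # clean up the quotations and parens; mutate() is used here because
  # dmap_at() was later moved out of purrr into purrrlyr
  mutate(listcites = gsub("^c\\(\"|\")$|\"", "", listcites))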
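To try this on the toy data from the question, here is a self-contained sketch. It assumes each cell of `String` is a plain character string (for example the literal text 'c("a", "b", "c")'), which may not match how the real JSON was parsed. The cleanup uses `mutate()` and the separator also swallows whitespace after each comma; both are choices made for this sketch.

```r
library(dplyr)
library(tidyr)

# Toy data mirroring the question; storing the vectors as text is an assumption
data <- tibble(
  Group  = c("A", "A", "B", "B", "B"),
  String = c('c("a", "b", "c")', "character(0)", "a", 'c("b", "d")', "d")
)

result <- data %>%
  separate_rows(String, sep = ",\\s*") %>%              # one row per comma-separated piece
  mutate(String = gsub("^c\\(\"|\")$|\"", "", String))  # strip c(", ") and stray quotes

print(result)
# String column: a, b, c, character(0), a, b, d, d
```

Rows containing character(0) pass through unchanged, so they can be filtered out before building the wordcloud if they are not wanted.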

Mark Timms

It was brought to my attention that this question is a duplicate. However, none of the answers on the linked page solved my problem. Yours did. +1 – beddotcom Jan 11 '17 at 22:24
