How to reformat a table in R
from:
Fruit | Location |
---|---|
Apple | US, UK |
Pear | PL, DE, FR, AR |
Banana | CO |
to:
Fruit | Location |
---|---|
Apple | US |
Apple | UK |
Pear | PL |
Pear | DE |
Pear | FR |
Pear | AR |
Banana | CO |
... assuming it comes in a file TSV file
How to reformat a table in R
from:
Fruit | Location |
---|---|
Apple | US, UK |
Pear | PL, DE, FR, AR |
Banana | CO |
to:
Fruit | Location |
---|---|
Apple | US |
Apple | UK |
Pear | PL |
Pear | DE |
Pear | FR |
Pear | AR |
Banana | CO |
... assuming it comes in a file TSV file
The key is to use tidyr::separate_rows(table, colname, separator)
.
A wrapper:
STRINGdb.reformat.ann.table.per.gene <- function(path_of_tsv = '/Users/.../enrichment.DISEASES.tsv'
, column = 'matching proteins in your network (labels)'
, sep = ',') {
annotation_tsv <- CodeAndRoll2::read.simple.tsv(path_of_tsv)
stopifnot(column %in% colnames(annotation_tsv))
(tbl_split <- tidyr::separate_rows(data = annotation_tsv, column, sep = sep ))
CodeAndRoll2::write.simple.tsv(tbl_split, ManualName = ppp((path_of_tsv), "per.gene.tsv"))
return(tbl_split)
}
You may have a couple of ways to make it:
stack
with(df, rev(stack(setNames(strsplit(Location, ", "), Fruit))))
data.table
optionsetDT(df)[, .(Location = unlist(strsplit(Location, ", "))), Fruit]
tidyr
option using unnest
df %>%
mutate(Location = strsplit(Location, ", ")) %>%
unnest(Location)