I have a list of data.frames that needs to be categorized into different sets. I found some posts about how to manipulate a list of data.frames, but when I tried the solutions given on SO I couldn't generate a stacked bar plot with ggplot2. I've read the ggplot2 vignette and learned how to use its basic features. The problem is that when I split each data.frame in the list by its pos.score column, the result is a nested list, and working with a nested list in R is not what I want.

Is there an easier, more elegant way to categorize the data.frames in the list? How can I create a stacked bar plot with one bar per file (that is, per data.frame in the list) after splitting? How can I make the plot data available to the ggplot function? This is my first post, so if I made a mistake in my question, please let me know. Thanks a lot.

Simulated data:

dfList <- list(
  hotan = data.frame( begin=seq(1, by=6, len=25), end=seq(4, by=6, len=25), pos.score=sample(30, 25)),
  aksu = data.frame( begin=seq(3, by=9, len=30), end=seq(6, by=9, len=30), pos.score=sample(45, 30)),
  korla = data.frame( begin=seq(6, by=8, len=45), end=seq(11, by=8, len=45), pos.score=sample(52, 45))
)

Categorizing each data.frame:

# Split every data.frame in dfList into "valid" / "invalid" rows by pos.score
catg <- lapply(dfList, function(elm) {
  split(elm, ifelse(elm$pos.score >= 16, "valid", "invalid"))
})

Doing it this way, I get a nested list, which is not suitable for generating a bar plot. I am looking for a more elegant solution, for example using the tidyr package. I am quite new to these packages. How can I make this work? Any ideas?

This is a nasty way to get rid of the nested list. Is there a cleaner solution?

unlist(lapply(catg, unlist))

Edit

I intend to get a list of data.frames like this:

$hotan.valid
$hotan.invalid
$aksu.valid
$aksu.invalid
$korla.valid
$korla.invalid

and then generate a stacked bar plot with one bar per file (each data.frame). How can I do this easily? This is a mockup of the desired bar plot:

desired stack bar plot

I am stuck on how to generate the stacked bar plot after removing the nested list. How can I achieve the desired plot with one bar per file? How can I make categorizing each data.frame in the list easier?
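For the flat list described in the edit, one possible base R sketch is to split on a factor with fixed levels and then flatten exactly one level with unlist(recursive = FALSE), which keeps every element a data.frame. The set.seed() call and the status variable are assumptions added for illustration; they are not part of the original code.

```r
set.seed(1)  # assumption: fixed seed so the simulated scores are reproducible

dfList <- list(
  hotan = data.frame(begin = seq(1, by = 6, length.out = 25),
                     end = seq(4, by = 6, length.out = 25),
                     pos.score = sample(30, 25)),
  aksu = data.frame(begin = seq(3, by = 9, length.out = 30),
                    end = seq(6, by = 9, length.out = 30),
                    pos.score = sample(45, 30)),
  korla = data.frame(begin = seq(6, by = 8, length.out = 45),
                     end = seq(11, by = 8, length.out = 45),
                     pos.score = sample(52, 45))
)

# Split on a factor with fixed levels so that both groups always exist
# (possibly with zero rows) and always appear in the same order.
catg <- lapply(dfList, function(elm) {
  status <- factor(ifelse(elm$pos.score >= 16, "valid", "invalid"),
                   levels = c("valid", "invalid"))
  split(elm, status)
})

# Flatten one level only: outer and inner names are joined with a dot.
flat <- unlist(catg, recursive = FALSE)
names(flat)
```

Each element of flat, for example flat$hotan.valid, is still an ordinary data.frame, unlike the result of unlist(lapply(catg, unlist)), which collapses everything into a single vector.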

Jerry07

1 Answer

Here is one way using dplyr (and ggplot2):

EDIT: Here is a way to process dfList using the plyr package:

dfList <- list(
    hotan = data.frame( begin=seq(1, by=6, len=25), end=seq(4, by=6, len=25), pos.score=sample(30, 25)),
    aksu = data.frame( begin=seq(3, by=9, len=30), end=seq(6, by=9, len=30), pos.score=sample(45, 30)),
    korla = data.frame( begin=seq(6, by=8, len=45), end=seq(11, by=8, len=45), pos.score=sample(52, 45))
)

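library(dplyr)
library(ggplot2)

# Stack the list into one data.frame; ldply() stores the list names in `.id`.
# Calling it as plyr::ldply() avoids attaching plyr after dplyr, which would
# mask dplyr's mutate() and count().
df <- plyr::ldply(dfList)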

df_plot <-
    df %>% 
    mutate(valid = factor(ifelse(pos.score >= 16, 1, 0))) %>%  # if pos.score is greater than or equal to 16, valid = 1, else, valid = 0
    count(.id, valid) 

ggplot(df_plot, aes(x = .id, y = n, fill = valid)) +
    geom_col(position = "dodge")

ggsave("group_valid.png", width = 4, height = 4)

The key is to put all the data into one data.frame, then count the observations for each combination of .id (the name of the list element) and valid.
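The same idea can be sketched with dplyr alone, assuming dplyr::bind_rows() with its .id argument is an acceptable replacement for ldply(). The column names site and status and the set.seed() call are hypothetical choices made for this example. Because geom_col() stacks by default, leaving out position = "dodge" should give the stacked layout the question asks for.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the simulated scores are reproducible

dfList <- list(
  hotan = data.frame(begin = seq(1, by = 6, length.out = 25),
                     end = seq(4, by = 6, length.out = 25),
                     pos.score = sample(30, 25)),
  aksu = data.frame(begin = seq(3, by = 9, length.out = 30),
                    end = seq(6, by = 9, length.out = 30),
                    pos.score = sample(45, 30)),
  korla = data.frame(begin = seq(6, by = 8, length.out = 45),
                     end = seq(11, by = 8, length.out = 45),
                     pos.score = sample(52, 45))
)

# bind_rows() stacks the list and records each element's name in `site`.
df_plot <- bind_rows(dfList, .id = "site") %>%
  mutate(status = ifelse(pos.score >= 16, "valid", "invalid")) %>%
  count(site, status)  # one row per site/status pair, counts in column n

# One bar per data.frame, stacked by status (the geom_col() default).
p <- ggplot(df_plot, aes(x = site, y = n, fill = status)) +
  geom_col()
p
```

This works for any number of data.frames in dfList, since nothing refers to an individual element by name.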

ggplot-output

Joshua Rosenberg
  • I would like to keep dfList as it is, so is there a dynamic solution for that? If dfList has more data.frames that need to be categorized, taking each data.frame out of the list is not practical. Any chance of making your solution more programmatic? Thanks a lot :) – Jerry07 Dec 15 '16 at 17:15
  • just added one solution, uses the `plyr` package which may not be desirable, but it should work fine – Joshua Rosenberg Dec 15 '16 at 19:23
  • Why did you say that using the plyr package is not desirable? Also, how can I change the x and y axis labels to something else? Thank you – Jerry07 Dec 15 '16 at 19:38
  • Just because it's another package - there may be a solution using base R (or even dplyr). But it works fine – Joshua Rosenberg Dec 15 '16 at 19:39
  • http://stackoverflow.com/questions/10438752/adding-x-and-y-axis-labels-in-ggplot2 – Joshua Rosenberg Dec 15 '16 at 19:40
  • Thanks. I intend to change the x and y labels to "sample" and "output". I can't work out how to make this change in your solution. How can I do that? Sorry for this simple question. – Jerry07 Dec 15 '16 at 19:41
  • Check out that link above and let me know if you have any trouble – Joshua Rosenberg Dec 15 '16 at 19:43
  • Thank you, this link helped. I also want to show the number of observations using geom_text(aes(label=n), position = position_stack(vjust = .5), hjust = 0.5), but the layout is a bit of a mess. How can I fix this? – Jerry07 Dec 15 '16 at 19:59
  • try `geom_text(aes(label=n), position = position_dodge(.9))` – Joshua Rosenberg Dec 15 '16 at 21:18
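Putting the suggestions from these comments together, a minimal sketch might look like the following. It assumes the dodged layout from the answer, uses dplyr::bind_rows() in place of ldply() so that plyr is not needed, and adds a set.seed() call and a vjust value that are illustrative choices, not part of the original thread.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the simulated scores are reproducible

dfList <- list(
  hotan = data.frame(begin = seq(1, by = 6, length.out = 25),
                     end = seq(4, by = 6, length.out = 25),
                     pos.score = sample(30, 25)),
  aksu = data.frame(begin = seq(3, by = 9, length.out = 30),
                    end = seq(6, by = 9, length.out = 30),
                    pos.score = sample(45, 30)),
  korla = data.frame(begin = seq(6, by = 8, length.out = 45),
                     end = seq(11, by = 8, length.out = 45),
                     pos.score = sample(52, 45))
)

df_plot <- bind_rows(dfList, .id = ".id") %>%
  mutate(valid = factor(ifelse(pos.score >= 16, 1, 0))) %>%
  count(.id, valid)

p <- ggplot(df_plot, aes(x = .id, y = n, fill = valid)) +
  geom_col(position = "dodge") +
  # Dodge the labels by the same width as the bars (0.9 is the default bar
  # width) so each count sits above its own bar.
  geom_text(aes(label = n), position = position_dodge(width = 0.9), vjust = -0.5) +
  # Axis titles requested in the comments.
  labs(x = "sample", y = "output")
p
```

For stacked bars, the label position would need to match instead, for example position_stack(vjust = 0.5) on geom_text() together with the default stacking of geom_col().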