
I am trying to use the gganimate package in R to create an animation of histograms, where each frame shows the histogram of one image. I have ~400 images, so ~400 columns. My data looks like this:

| bins.left | bins.right | hist1 | hist2 | ... | histn |

As you can see, I need each column to supply the Y values of the histogram in one frame. In other words, my animation should iterate over the columns.

But all the examples I have found online seem to use a single column as the frame identifier. For instance, in this example:

mapping <- aes(x = gdpPercap, y = lifeExp, 
           size = pop, color = continent,
           frame = year) 
p <- ggplot(gapminder, mapping = mapping) +
  geom_point() +
  scale_x_log10()

the year column is used as the iterator. The data looks like this:

  country continent  year lifeExp      pop gdpPercap
   <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>
1 Afghanistan      Asia  1952  28.801  8425333  779.4453
2 Afghanistan      Asia  1957  30.332  9240934  820.8530
3 Afghanistan      Asia  1962  31.997 10267083  853.1007
4 Afghanistan      Asia  1967  34.020 11537966  836.1971
5 Afghanistan      Asia  1972  36.088 13079460  739.9811
6 Afghanistan      Asia  1977  38.438 14880372  786.1134

I don't want to reshape my data to fit this pattern because, if I keep all the histograms in one column, the data frame becomes extremely long (length = ~16000 * 400) and difficult to handle. It is also not intuitive to store the data that way. I believe there must be an easy solution to my problem. Any suggestions are highly appreciated.

Azim
  • You say you have ~400 images. Do you mean 400 data frames that you want to turn into histograms? Please post an actual sample of your data using `dput` (for example, paste into your question the output of `dput(my_data[1:10, ])`). – eipi10 Mar 13 '18 at 03:50
  • Most tools in R assume your data is *long* rather than *wide*, and therefore data is generally *easier* to work with in long format. Either way you have the same total number of cells, so I'm not sure I understand your concerns about reshaping the data. – Marius Mar 13 '18 at 04:08
  • @Marius, data in that form is harder to inspect, which makes mistakes harder to catch. It also seems natural to keep each time series in its own column. I don't want to reshape my data just because I don't know how to make the wide format work in gganimate. But if it is not possible, then I will have to reshape it. – Azim Mar 13 '18 at 04:58

1 Answer


As @Marius said, you can make this work if your data is in long format. Below I create some fake data and then make the animated plot.

library(tidyverse)
theme_set(theme_classic())
library(gganimate)

Here's fake data with 10 columns of values we want to turn into histograms.

set.seed(2)
dat = replicate(10, runif(100)) %>% as.data.frame

The data is in wide format, so first we'll convert it to long format with the gather function:

d = dat %>% gather(key, value)
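In current versions of tidyr, gather is superseded by pivot_longer. Here is a minimal sketch of the same reshape, assuming tidyr 1.0 or later. The column names key and value are kept only to match the rest of this answer:

```r
library(tidyr)

set.seed(2)
# Same fake data as above: 100 rows, columns V1..V10
dat <- as.data.frame(replicate(10, runif(100)))

# Stack all ten columns into two: `key` (source column name) and `value`
d <- pivot_longer(dat, cols = everything(), names_to = "key", values_to = "value")

dim(d)
```

The row order differs from gather (rows are interleaved rather than stacked column by column), which makes no difference to the plots here.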

In the new long format, the key column tells us which histogram column the data originally came from. We'll use this as our frame and run geom_histogram:

p = ggplot(d, aes(value, frame=key)) +
  geom_histogram() 

gganimate(p)

This is not what we want. ggplot generated a single histogram from all the data, and the animation just shows, in succession, the part of each stack that came from each value of key.


We need a way to get ggplot to create separate histograms and animate them. We can do that by pre-binning the data and using geom_rect to create the histogram bars:

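# Reshape to long, assign each value to a 0.1-wide bin labelled by its midpoint,
# then count the values per column (key) and bin
d = dat %>% gather(key, value) %>% 
  mutate(bins = cut(value, breaks=seq(0,1,0.1), 
                    labels=seq(0,0.9,0.1) + 0.05, include.lowest=TRUE),
         bins = as.numeric(as.character(bins))) %>% 
  group_by(key, bins) %>% 
  tally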

p = ggplot(d, aes(xmin=bins - 0.048, xmax=bins + 0.048, ymin=0, ymax=n, frame=key)) +
  geom_rect() +
  scale_y_continuous(limits=c(0, max(d$n)))

gganimate(p)
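The frame aesthetic and the gganimate() function used above belong to the gganimate API as it was when this answer was written. Later releases (1.0 and up) replaced them with transition functions. Here is a hedged sketch of the same pre-binned animation, assuming gganimate >= 1.0 and an installed renderer such as gifski. transition_manual shows one frame per value of key, with no tweening between frames:

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(gganimate)

set.seed(2)
dat <- as.data.frame(replicate(10, runif(100)))

# Pre-bin as above: count values per column (key) and per 0.1-wide bin
d <- dat %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  mutate(bins = cut(value, breaks = seq(0, 1, 0.1),
                    labels = seq(0, 0.9, 0.1) + 0.05, include.lowest = TRUE),
         bins = as.numeric(as.character(bins))) %>%
  count(key, bins)

anim <- ggplot(d, aes(xmin = bins - 0.048, xmax = bins + 0.048, ymin = 0, ymax = n)) +
  geom_rect() +
  scale_y_continuous(limits = c(0, max(d$n))) +
  labs(title = "{current_frame}") +  # label variable provided by transition_manual
  transition_manual(key)             # one frame per column; replaces frame = key

# Rendering depends on an installed renderer, so it only runs interactively here
if (interactive()) {
  animate(anim, fps = 2)
  # anim_save("histograms.gif")  # hypothetical output file name
}
```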


In response to your comment, I don't think you can use gganimate with data in wide format. gganimate requires a single frame column, which requires data in long format. However, gganimate is a wrapper around the animation package, and you can create an animated GIF file directly with a for loop and the saveGIF function:

library(animation)

saveGIF({
  for (i in names(dat)) {
    p = ggplot(dat, aes_string(i)) +
      geom_histogram(breaks=seq(0,1,0.1))
    print(p)
  }
}, movie.name="test.gif")
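If the data is already binned, as in the question (bins.left, bins.right, then one count column per image), the same loop can draw the bars straight from the wide data frame. This is a sketch under assumptions: the toy data frame wide, the column names hist1 to hist5, and the helper make_frame are all made up for illustration. saveGIF also needs ImageMagick or GraphicsMagick installed.

```r
library(ggplot2)

# Hypothetical wide, pre-binned data: 10 bins, 5 images (stand-in for ~400)
set.seed(2)
edges <- seq(0, 1, 0.1)
wide <- data.frame(bins.left = head(edges, -1), bins.right = tail(edges, -1))
for (j in 1:5) {
  wide[[paste0("hist", j)]] <- hist(runif(100), breaks = edges, plot = FALSE)$counts
}

hist_cols <- setdiff(names(wide), c("bins.left", "bins.right"))
y_max <- max(as.matrix(wide[hist_cols]))  # shared y limit so frames are comparable

# Build the plot for one count column; .data[[col]] looks the column up by name
make_frame <- function(col) {
  ggplot(wide, aes(xmin = bins.left, xmax = bins.right,
                   ymin = 0, ymax = .data[[col]])) +
    geom_rect(colour = "white") +
    coord_cartesian(ylim = c(0, y_max)) +
    ggtitle(col)
}

# Writing the GIF is guarded because it depends on external tools
if (interactive() && requireNamespace("animation", quietly = TRUE)) {
  animation::saveGIF({
    for (col in hist_cols) print(make_frame(col))
  }, movie.name = "hist_wide.gif", interval = 0.2)
}
```

Fixing the y limit across frames keeps the bars comparable from one image to the next. Without it, each frame is rescaled to its own maximum.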


eipi10
  • eipi10, thank you very much. This definitely resolves my issue. Before I mark this as the answer, allow me to wait for a while to see if someone could suggest a way that works without putting all the time series end to end in one column. Thanks again. – Azim Mar 13 '18 at 04:45
  • See update to my answer for a different method that allows you to keep your data in wide format. – eipi10 Mar 13 '18 at 04:57
  • Thanks for your thorough answer. – Azim Mar 13 '18 at 05:11