
I have just discovered the function facet_grid in ggplot2, and it's awesome. My question: I have data on flights from 6 countries of origin (column HC) to destinations all around the world. My data look like this:

           HC Reason Destination  freq       Perc
        <chr>  <chr>       <chr> <int>      <dbl>
 1    Germany  Study     Germany     9  0.3651116
 2    Germany   Work     Germany     3  0.1488095
 3    Germany Others     Germany     3  0.4901961
 4    Hungary  Study     Germany   105 21.4285714
 5    Hungary   Work     Germany   118 17.6382661
 6    Hungary Others     Germany    24  5.0955414
 7 Luxembourg  Study     Germany   362 31.5056571

Is there a way to show only the top ten destinations for each country while using facet_grid? I'm trying to make a scatter plot this way:

Geograp %>% 
  gather(key=Destination, value=freq, -Reason, -Qcountry) %>%
  rename(HC = Qcountry) %>%
  group_by(HC,Reason) %>%
  mutate(Perc=freq*100/sum(freq)) %>%
  ggplot(aes(x=Perc, y=reorder(Destination,Perc))) +
  geom_point(size=3) +
  theme_bw() +
  facet_grid(HC~Reason) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed"))

This produces a graph with heavy overplotting on the y-axis, which I want to avoid. Thanks in advance!

Tito Sanz

2 Answers


You could create a variable indicating the rank of each destination by country and then in the ggplot call select rows with ranking <= 10, e.g.

ggplot(data = mydata[mydata$rank <= 10, ], ....)

PS: Currently you create the data and plot it in a single pipeline. I would separate the data creation and plotting steps.
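
A minimal sketch of the rank-variable idea, with data preparation and plotting kept as separate steps. The data frame `mydata` below is made up for illustration (it is not the asker's data), and `dest_rank` is a hypothetical column name. The sketch ranks within each HC/Reason combination, which is only one possible reading of "by country":

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the question's data: 2 origin countries (HC),
# 3 reasons, 15 destinations each, with made-up distinct frequencies.
set.seed(1)
mydata <- expand.grid(
  HC = c("Germany", "Hungary"),
  Reason = c("Study", "Work", "Others"),
  Destination = paste0("Dest", 1:15),
  stringsAsFactors = FALSE
)
mydata$freq <- sample(1:500, nrow(mydata))

# Step 1: data preparation. Compute the percentage and rank the
# destinations within each HC/Reason group (rank 1 = highest freq).
ranked <- mydata %>%
  group_by(HC, Reason) %>%
  mutate(Perc = freq * 100 / sum(freq),
         dest_rank = min_rank(desc(freq))) %>%
  ungroup()

# Step 2: plotting, using only the rows ranked in the top 10.
top10 <- filter(ranked, dest_rank <= 10)

p <- ggplot(top10, aes(x = Perc, y = reorder(Destination, Perc))) +
  geom_point(size = 3) +
  facet_grid(HC ~ Reason) +
  theme_bw()
```

Keeping `ranked` as its own object makes it easy to inspect the ranks before plotting, and the cutoff can be changed without rerunning the reshaping.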

Richard

As you have not posted your data in a reproducible format (check out dput()), I have used sample data. Using the dplyr package, I grouped by the grp variable (group_by(grp); in your case this is the country), selected the top 10 rows per group (top_n(n = 10, ...)) ranked by the x variable (wt = x; in your case this will be freq), and then plotted the result (a scatter plot in this case):

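library(dplyr)
library(ggplot2)  # needed for ggplot() and geom_point()

set.seed(123)
# Sample data: 3 groups of 30 random values each
d <- data.frame(x = runif(90), grp = gl(3, 30))

d %>%
  group_by(grp) %>%             # in your case: the country
  top_n(n = 10, wt = x) %>%     # keep the 10 largest x values per group
  ggplot(aes(x = x, y = grp)) +
  geom_point()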
Mal_a
  • Thanks for your answer! My problem is that each country (column HC) has its own "Top 10 destinations". Is there a simple way to combine the function top_n with facet_grid? – Tito Sanz Aug 01 '17 at 07:52
  • I do not really understand what you mean. What does it mean that column HC has its own top 10 destinations? Where is that shown in your table? Which top 10 destinations do you want to show? – Mal_a Aug 01 '17 at 07:58
  • I mean that HC is the country of origin of the trip and Destination is the destination country, so each "country of origin" has its own "Top 10 destinations". However, your answer gives me a good starting point! Thank you! – Tito Sanz Aug 01 '17 at 09:19
  • Exactly, that's why the code I posted has a `group_by`, which groups your data by country and then chooses the top 10 destinations for each of them separately – Mal_a Aug 01 '17 at 09:23
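
One way the per-country selection discussed in these comments might be combined with facet_grid is sketched below. The data frame `flights` and the column `total` are made up for illustration, and the choice to pick each country's top 10 by total frequency across all reasons is an assumption, not something stated in the thread. Passing `scales = "free_y"` to facet_grid lets each row of panels (each country) show only its own destinations on the y-axis, which should reduce the overplotting:

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: 3 origin countries, 3 reasons, 20 destinations each.
set.seed(42)
flights <- expand.grid(
  HC = c("Germany", "Hungary", "Luxembourg"),
  Reason = c("Study", "Work", "Others"),
  Destination = paste0("Dest", 1:20),
  stringsAsFactors = FALSE
)
flights$freq <- sample(1:1000, nrow(flights))

# Top 10 destinations per origin country, by total freq over all reasons.
# Note: top_n() keeps ties, so a group can return more than 10 rows.
top_dest <- flights %>%
  group_by(HC, Destination) %>%
  summarise(total = sum(freq), .groups = "drop") %>%
  group_by(HC) %>%
  top_n(n = 10, wt = total) %>%
  ungroup()

# Percentages are computed on the full data first (as in the question),
# then only each country's top destinations are kept.
plot_data <- flights %>%
  group_by(HC, Reason) %>%
  mutate(Perc = freq * 100 / sum(freq)) %>%
  ungroup() %>%
  semi_join(top_dest, by = c("HC", "Destination"))

p <- ggplot(plot_data, aes(x = Perc, y = reorder(Destination, Perc))) +
  geom_point(size = 3) +
  facet_grid(HC ~ Reason, scales = "free_y") +  # y-axis varies per country
  theme_bw()
```

Note that `reorder()` sorts destinations across the whole data set rather than within each panel, so the ordering inside an individual facet may not be strictly by Perc.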