73

I am having trouble plotting a subset of a data frame with ggplot2. My data frame looks like this:

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

How can I now plot Value1 vs Value2 only for IDs 'P1' and 'P3'? For example I tried:

ggplot(subset(df,ID=="P1 & P3") +
  geom_line(aes(Value1, Value2, group=ID, colour=ID)))

but I always receive an error.

Heikki
matteo

10 Answers

81

Here are two options for subsetting:

Using subset from base R:

library(ggplot2)
ggplot(subset(df, ID %in% c("P1", "P3"))) +
  geom_line(aes(Value1, Value2, group = ID, colour = ID))

Using the subset argument of geom_line (note that I am using the plyr package for its special . function):

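library(plyr)  # provides the . function used below
# Note: the subset argument only works in older ggplot2 releases (see the comments).
ggplot(data = df) +
  geom_line(aes(Value1, Value2, group = ID, colour = ID),
            subset = .(ID %in% c("P1", "P3")))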

You can also use the complementary subsetting:

subset(df, ID != "P2")
agstudy
  • It may be worth adding that, following the deprecation of the `subset` argument, comparable results can be obtained using `geom_line(data=dat[dat$ID %in% c("P1" , "P3"),], ...)` as discussed [here](http://stackoverflow.com/a/34588455/1655567). In effect this works on the same basis as [the answer](http://stackoverflow.com/a/18165706/1655567) below. The minor difference is that the subsetted data is used inside the geom call. – Konrad Aug 30 '16 at 13:21
  • @agstudy @konrad-rudolph defining `data=function(x) {...}` may work in place of `subset`. – Dave Oct 31 '16 at 17:38
  • @agstudy My data frame contains 3 columns (Year, rain, temp). So, when I am trying to plot only for the selected year I used `ggplot(subset(data=aggdata, Year %in% c("1901" , "1910")), aes(x=Year, y=tem, color=factor(Year)))` and it showing error `Error in Year %in% c("1901", "1910") : object 'Year' not found`. Could you tell me what I have to do? – mostafiz67 Sep 29 '20 at 08:56
  • is it possible to subset more than one column within ggplot? say I wanted to subset Year == 2022 & Age == 12 or something like that...with two columns to subset – FishyFishies Jul 31 '22 at 17:18
29

There's another solution that I find useful, especially when I want to plot multiple subsets of the same object:

myplot<-ggplot(df)+geom_line(aes(Value1, Value2, group=ID, colour=ID))
myplot %+% subset(df, ID %in% c("P1","P3"))
myplot %+% subset(df, ID %in% c("P2"))
Nick Isaac
  • @Nick Yes, your code is working fine (creating plot) but in my case not showing the line! Could you tell me what I have to do? [https://i.postimg.cc/85VgpMKz/Screenshot-from-2020-09-29-15-21-54.png] – mostafiz67 Sep 29 '20 at 09:23
  • You've specified Year as both grouping variable and colour. Lines are drawn between data points of the same group. Setting up the plot in this way means you have only one observation per group. So the solution is to remove "group=Year" – Nick Isaac Sep 30 '20 at 11:15
  • Is there any way to ensure that this keeps the colours the same? I.e. there are four lines (red, green, blue, purple); if you subset to keep items 1 and 4, the colours should stay red and purple rather than become red and green. – Aaron Walton Jun 02 '21 at 13:13
15

@agstudy's answer didn't work for me with the latest version of ggplot2, but this did, using magrittr pipes:

library(ggplot2)
library(dplyr)  # filter() and the %>% pipe
ggplot(data = df) +
  geom_line(aes(Value1, Value2, group = ID, colour = ID),
            data = . %>% filter(ID %in% c("P1", "P3")))

It works because if geom_line sees that data is a function, it will call that function with the inherited version of data and use the output of that function as data.
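The mechanism described above can be checked with a small sketch, reusing the df from the question. The helper name keep_p1_p3 is made up for illustration, and layer_data() is used here only to inspect the data the layer ends up with; treat this as one way to verify the behaviour, not the only one.

```r
library(ggplot2)

df <- data.frame(ID = c("P1", "P1", "P2", "P2", "P3", "P3"),
                 Value1 = c(100, 120, 300, 400, 130, 140),
                 Value2 = c(12, 13, 11, 16, 15, 12))

# Hypothetical helper: receives the plot's data, returns the rows to draw.
keep_p1_p3 <- function(d) d[d$ID %in% c("P1", "P3"), ]

p <- ggplot(df, aes(Value1, Value2, group = ID, colour = ID)) +
  geom_line(data = keep_p1_p3)  # a function is passed, not a data frame

# Build the plot and look at the data used by the first (only) layer.
built <- layer_data(p, 1)
nrow(built)  # counts the rows that survived the function
```

Only the P1 and P3 rows should reach the layer, while the data frame attached to the plot itself stays unfiltered for any other layers.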

nicolaskruchten
  • Does this still work? Not sure, whether they changed the `.` to `.x`. Haven't found anything in the NEWS though. See also my answer. BTW, they recently changed a lot. – andschar Nov 06 '20 at 11:07
  • @andschar Definitely still works. Both `.` and `.x` are fine. – Maurits Evers Jul 04 '22 at 06:29
14

With option 2 in @agstudy's answer now deprecated, defining data with a function can be handy.

library(ggplot2)
ggplot(data = df) +
  geom_line(aes(Value1, Value2, group = ID, colour = ID),
            data = function(x) { x[x$ID %in% c("P1", "P3"), ] })

This approach comes in handy if you wish to reuse a dataset in the same plot, e.g. you don't want to specify a new column in the data.frame, or you want to explicitly plot one dataset in a layer above the other:

library(ggplot2)
ggplot(data = df, aes(Value1, Value2, group = ID, colour = ID)) +
  # Rows that are not P1 or P3, drawn faded in the lower layer
  geom_line(data = function(x) { x[!x$ID %in% c("P1", "P3"), ] }, alpha = 0.5) +
  # P1 and P3 drawn on top at full opacity
  geom_line(data = function(x) { x[x$ID %in% c("P1", "P3"), ] })
Dave
8

Are you looking for the following plot:

library(ggplot2)
l <- df[df$ID %in% c("P1", "P3"), ]
myplot <- ggplot(l) + geom_line(aes(Value1, Value2, group = ID, colour = ID))
myplot  # print the plot

Metrics
4

Your formulation is almost correct. You want:

subset(df, ID == "P1" | ID == "P3")

Here | (the vertical bar, or 'pipe') means 'or'. Your attempt, ID=="P1 & P3", looks for rows where ID is literally the string "P1 & P3".
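To make the difference concrete, here is a small base R sketch using the df from the question. The variable names literal, either and with_in are only illustrative.

```r
df <- data.frame(ID = c("P1", "P1", "P2", "P2", "P3", "P3"),
                 Value1 = c(100, 120, 300, 400, 130, 140),
                 Value2 = c(12, 13, 11, 16, 15, 12))

# Compares every ID with the single string "P1 & P3", which never occurs.
literal <- df$ID == "P1 & P3"
any(literal)  # FALSE

# Two equivalent ways to say "ID is P1 or P3".
either  <- df$ID == "P1" | df$ID == "P3"
with_in <- df$ID %in% c("P1", "P3")
identical(either, with_in)  # TRUE

nrow(subset(df, ID == "P1 & P3"))          # 0 rows, nothing to plot
nrow(subset(df, ID == "P1" | ID == "P3"))  # 4 rows
```

The %in% form tends to scale better once more than two IDs are involved.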

Drew Steen
2

Try filter from dplyr to keep only the rows for P1 and P3:

library(dplyr)
df2 <- filter(df, ID == "P1" | ID == "P3")

Then you can plot Value1 vs. Value2 using df2.

CSV
2

You can use ~subset(., ...) - this is a way to do what Dave above suggests, which also

  • works with current {ggplot2} (3.4.2)
  • does not require the {magrittr} pipe - for those who switched to R pipe
  • references the data as it was input to the data param of the ggplot() function, e.g. when the data was piped in
  • is a bit more concise and easier to understand than defining a function

library(ggplot2)
ggplot(mtcars, aes(hp, disp)) +
  geom_point() +
  geom_point(data = ~subset(., cyl == 4), color = "red")

It also works when the data was piped in, for example:

library(dplyr)  # filter() here is dplyr::filter, not stats::filter
mtcars |>
  filter(gear > 3) |>
  ggplot(aes(hp, disp)) +
  geom_point() +
  geom_point(data = ~subset(., cyl == 4), color = "red")

Petr
0

Use subset within ggplot:

ggplot(data = subset(df, ID == "P1" | ID == "P3")) +
  aes(Value1, Value2, group = ID, colour = ID) +
  geom_line()
shiny
hizjamali
0

Similar to @nicolaskruchten's answer, you could do the following:

require(ggplot2)

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

ggplot(df) + 
  geom_line(data = ~.x[.x$ID %in% c("P1" , "P3"), ],
            aes(Value1, Value2, group = ID, colour = ID))
andschar