0

I have the following data frame.

Value Type Year
10     car 1995
23     car 1995
2      car 1997
10     car 2000
11     bus 1997
23     bus 1995
2      bus 1997
10     bus 2000
12     car 1997
13     bus 1995
14     jeep 2000
15     jeep 1995
23     jeep 1995
2      jeep 1997
10     jeep 2000
8      car  2000
9      bus  2000
1      jeep 1997

I want to sum the rows first based on the column type and then year. I want the following output.

Value Type Year
   33    car  1995
   14    car  1997
   18    car  2000  
   36    bus  1995
   13    bus  1997
   19    bus  2000
   38    jeep 1995
    3    jeep 1997
   24    jeep 2000

Can anyone tell me how to get this?

Ranger
  • 151
  • 1
  • 11

2 Answers2

5

We can use one of the aggregating function by group. If we are interested only in base R, aggregate is a useful, compact function.

aggregate(Value ~ Year + Type, df1, FUN=sum)
# Year Type Value
#1 1995  car    33
#2 1997  car    14
#3 2000  car    18
#4 1995  bus    36
#5 1997  bus    13
#6 2000  bus    19
#7 1995 jeep    38
#8 1997 jeep     3
#9 2000 jeep    24

Or we can try dplyr

library(dplyr)
df1 %>%
   group_by(Type, Year) %>%
   summarise(Value=sum(Value))
#   Type Year Value
#1  car 1995    33
#2  car 1997    14
#3  car 2000    18
#4  bus 1995    36
#5  bus 1997    13
#6  bus 2000    19
#7 jeep 1995    38
#8 jeep 1997     3
#9 jeep 2000    24

Or another compact and fast option is data.table

library(data.table)#v1.9.5+
setDT(df1)[, list(Value=sum(Value)), .(Type, Year)]
#Type Year Value
#1:  car 1995    33
#2:  car 1997    14
#3:  car 2000    18
#4:  bus 1997    13
#5:  bus 1995    36
#6:  bus 2000    19
#7: jeep 2000    24
#8: jeep 1995    38
#9: jeep 1997     3

Or a solution based on sqldf

library(sqldf)
sqldf('select Type, Year,
        sum(Value) as Value 
        from df1 
        group by Type, Year')

Update

If we want to plot ,

 library(ggplot2)
 df1 %>%
   group_by(Type, Year) %>%
   summarise(Value=sum(Value))  %>%
   ggplot(., aes(x=Year, y=Value))+
          geom_line() + 
          facet_wrap(~Type)

data

  df1 <- structure(list(Value = c(10L, 23L, 2L, 10L, 11L, 23L, 2L, 10L, 
  12L, 13L, 14L, 15L, 23L, 2L, 10L, 8L, 9L, 1L), Type = c("car", 
  "car", "car", "car", "bus", "bus", "bus", "bus", "car", "bus", 
  "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"), 
  Year = c(1995L, 1995L, 1997L, 2000L, 1997L, 1995L, 1997L, 
  2000L, 1997L, 1995L, 2000L, 1995L, 1995L, 1997L, 2000L, 2000L, 
  2000L, 1997L)), .Names = c("Value", "Type", "Year"), 
  class =    "data.frame", row.names = c(NA, -18L))

   df1$Type <- factor(df1$Type, levels=unique(df1$Type))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • @Ranger I updated with a possible plot option. In the future, please ask questions separately. – akrun Jul 25 '15 at 15:55
0

I know it is not the desired output (which honestly I do not like because it is not consistent with the principles of tidy data) but here is another different solution. You might take it into consideration.

 tapply(df$Value, list(df$Type, df$Year), sum)
     1995 1997 2000
bus    36   13   19
car    33   14   18
jeep   38    3   24
SabDeM
  • 7,050
  • 2
  • 25
  • 38
  • I want to plot Value Vs Year plot separately for each type but in a single plot. How can I do it? I want to do it using ggplot2 plotting system – Ranger Jul 25 '15 at 15:48
  • Thank you for the down vote without explaining why. – SabDeM Jul 25 '15 at 17:22