-1

I have the following data:

col1 = c(rep("a",4),rep("b",8),rep("c",6), rep("d",2))
col2 = sample(-100:250, 20)
col3 = cumsum(col2)
data = data.table(col1, col2, col3)


and data.table:

    col1 col2 col3
 1:    a   56   56
 2:    a   90  146
 3:    a   85  231
 4:    a  214  445
 5:    b  -39  406
 6:    b  116  522
 7:    b   42  564
 8:    b  131  695
 9:    b  161  856
10:    b   54  910
11:    b   15  925
12:    b  229 1154
13:    c  166 1320
14:    c  224 1544
15:    c  -53 1491
16:    c   87 1578
17:    c -100 1478
18:    c  -11 1467
19:    d   28 1495
20:    d  143 1638

As you see it's just grouped by col1. I'd like to make some calculation (like cumsum, count if, etc) based on groups in col1.

In the end I'd would like to have:

col1 colsum countif>0  countif<0
a    445    4          0
b    709    7          1
c    313    3          3
d    171    2          0


@commentators Guys! Please ... I did two solutions, the first very unsightly (no sense to put it here, but is based on making a list and loop with calculation for each element of list) and second this is:

a1 = aggregate (col2 ~ col1, sum, date = date)
a2 = aggregate (col2> 0 ~ col1, sum, date = date)
a3 = aggregate (col2 <0 ~ col1, sum, date = date)
cbind (a1, a2 counfif_1 = [2], counfif_2 = a3 [2])

I'm looking just for something nice and cool.

ekad
  • 14,436
  • 26
  • 44
  • 46
MarekW
  • 41
  • 5
  • 1
    Okay, and after reading the package vignettes where exactly are you stuck? – Roland Jun 11 '14 at 10:44
  • 1
    Your question amounts to saying 'do this for me'. That is not what this site is for. – asb Jun 11 '14 at 10:52
  • Please see "[What types of questions should I avoid asking?](http://stackoverflow.com/help/dont-ask)". – hrbrmstr Jun 11 '14 at 11:04
  • Guys! Please ... I did two solutions, the first very unsightly and second this: a1 = aggregate (col2 ~ col1, sum, date = date) a2 = aggregate (col2> 0 ~ col1, sum, date = date) a3 = aggregate (col2 <0 ~ col1, sum, date = date) cbind (a1, a2 counfif_1 = [2], counfif_2 = a3 [2]) I'm looking just for something nice and cool. – MarekW Jun 11 '14 at 11:13

3 Answers3

1

You can use dplyr to achieve something similar

library(dplyr)

set.seed(1)
col1 <- c(rep("a", 4), rep("b", 8), rep("c", 6), rep("d",2))
col2 <- sample(-100:250, 20)
data <- tbl_df(data.frame(col1, col2))
str(data)
## Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    20 obs. of  3 variables:
##  $ col1: Factor w/ 4 levels "a","b","c","d": 1 1 1 1 2 2 2 2 2 2 ...
##  $ col2: int  -7 30 99 216 -31 210 225 127 115 -79 ...


data %>%
    group_by(col1) %>%
        summarise(colsum = sum(col2),
                  countifpos = sum(col2 > 0),
                  countifneg = sum(col2 < 0))
## Source: local data frame [4 x 4]

##   col1 colsum countifpos countifneg
## 1    a    338          3          1
## 2    b    497          4          4
## 3    c    758          6          0
## 4    d    184          2          0
dickoa
  • 18,217
  • 3
  • 36
  • 50
1
data[, list(colsum = sum(col2),
            `countif>0` = sum(col2 > 0),
            `countif<0` = sum(col2 < 0)), by = col1]

##     col1 colsum countif>0 countif<0
## 1:    a    445         4          0
## 2:    b    709         7          1
## 3:    c    313         3          3
## 4:    d    171         2          0
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
0

You can use tapply to get summaries by group for instance:

this is where you define the metrics you are calculating

metrics =  function(x) { c(sum(x), length(x[x<0]) , length(x[x>0]) )}

the you use the metrics function to calculate your metrics by group via a tapply function

tapply (data$col2, data$col1, metrics)
$a
[1] 241 -50 291

$b
[1] 526 -86 612

$c
[1] 483 -94 577

$d
[1] -88 -88   0

You can then convert this output into a data frame as requested

shecode
  • 1,716
  • 6
  • 32
  • 50