
I have this data.frame:

> d
   x  y
1  6  1
2  2 -1
3  3 -1
4  2 -1
5  3  1
6  1 -1
7  4  1
8  7 -1
9  3 -1
10 4 -1
11 8  1
12 4 -1
13 2 -1
14 9 -1
15 5  1
16 7  1
17 6 -1
18 7 -1
19 3 -1
20 2 -1

I want to find rows that share the same value in column 1 (x) where none of them has a +1 in column 2 (y). In this case, for example, the rows with x=2 have no y=1, so I want to remove them. The same thing happens for the rows with x=9 and x=1.

In other words, if we split the data into subsets so that within each subset all the x values are the same, then any subset that doesn't contain y=1 should be discarded.

Do you have any suggestions? If it is not clear, I will try to elaborate further!
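
For reference (in case someone wants to run the suggestions below), the data frame shown above can be rebuilt with:

d <- data.frame(
  x = c(6, 2, 3, 2, 3, 1, 4, 7, 3, 4, 8, 4, 2, 9, 5, 7, 6, 7, 3, 2),
  y = c(1, -1, -1, -1, 1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, 1, -1, -1, -1, -1)
)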

Vahid Mirjalili
  • Can you please show us what you have tried? ["Questions asking for code must include attempted solutions, why they didn't work, and the expected results."](http://stackoverflow.com/help/on-topic) – Henrik Oct 16 '13 at 15:46
  • 1
    Give us data we can use and we can show you, you just have to group per x value, this is a job for `data.table` or `plyr` – statquant Oct 16 '13 at 15:50

4 Answers


I think this is what you want:

d[d$x %in% subset(aggregate(y ~ x, d, max), y == 1)$x, ]

For each unique x, get the maximum value:

aggregate(y ~ x, d, max)

Just return those x for which the maximum value of y is one.

subset(aggregate(y ~ x, d, max), y == 1)$x

And now pull out the rows where x is in that group of x values.

d[d$x %in% subset(aggregate(y ~ x, d, max), y == 1)$x, ]
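
Not part of the answer above, but the same group-maximum idea can also be written in a single line with base R's `ave()`, if that reads more easily:

d[ave(d$y, d$x, FUN = max) == 1, ]

Here `ave()` replaces each y with the maximum y of its x group, so the comparison keeps exactly the rows whose group contains a 1.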
Ciarán Tobin

Here is a simple solution:

> df[with(df, x %in% unique(x[y == 1])), ]
   x  y
1  6  1
3  3 -1
5  3  1
7  4  1
8  7 -1
9  3 -1
10 4 -1
11 8  1
12 4 -1
15 5  1
16 7  1
17 6 -1
18 7 -1
19 3 -1

Or, equivalently: `df[df$x %in% unique(df$x[df$y==1]),]`
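
If you happen to be using dplyr instead (not used in this answer), the same keep-any-group-with-a-1 logic could look roughly like this:

library(dplyr)
df %>% group_by(x) %>% filter(any(y == 1)) %>% ungroup()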

mrip

This tests whether d$y contains a value of 1 in any of the rows where d$x==2:

 any( d[d$x==2, "y"] == 1 )

If it does, the following returns a data frame with all the d$x==2 rows removed, using a bit of Boolean algebra and logical indexing:

 d[ !as.logical( d$x == 2 * any( d[d$x==2, "y"] == 1 ) ) , ]

(Note: the value of 2 did not meet the narrow condition you set.)

If you wanted to apply that rule to all the unique values of d$x, with a more general exclusionary condition of no y > 0:

 lmat <- t( sapply( unique(d$x) , function(val)
           as.logical( d[["x"]] == val * any( d[d[["x"]]==val, "y"] > 0 ) ) ) )
 # Now use `any` to determine which rows to exclude.
 # One does need to transpose that matrix of excluded logicals.
 d[ ! apply( t(lmat) , 1, any), ]
   x  y
2  2 -1
4  2 -1
6  1 -1
13 2 -1
14 9 -1
20 2 -1
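
As a side note (my addition, not from this answer), the same set of excluded rows can be obtained a bit more compactly with `tapply`:

 drop_x <- names(which(tapply(d$y, d$x, function(v) all(v != 1))))
 d[d$x %in% as.numeric(drop_x), ]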
IRTFM

Wouldn't it be simpler to first find `y==1` and then check for duplicated `x`?

library(data.table)  # load data.table first
as.data.table(d)[y==1][x %in% x[duplicated(x)]]
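
Note that this returns only the `y==1` rows themselves. If the aim is to keep every row of the groups that contain a `y==1`, a grouped data.table sketch (mine, not the answer's) could be, with data.table loaded as above:

as.data.table(d)[, .SD[any(y == 1)], by = x]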
Ricardo Saporta