Count values in a data set that exceed a threshold in R

Question

I have 2 data sets. The first data set has a vector of p-values from 0.5 to 0.001, and the corresponding threshold for each p-value. For example, for 0.05 the threshold is 13: any value greater than 13 has a p-value of <0.05. This data set contains all the thresholds that I'm interested in, like so:

     V1       V2
1 0.500       10
2 0.200       11
3 0.100       12
4 0.050       13
5 0.010       14
6 0.001       15

The 2nd data set is just one long list of values. I need to write an R script that counts the number of values in this set that exceed each threshold. For example, count how many values in the 2nd data set exceed 13, and therefore have a p-value of <0.05, and do this for each threshold value.

Here are the first 15 values of the 2nd data set (1000 total):

1    11.100816
2     8.779858
3    10.510090
4     9.503772
5     9.392222
6    10.285920
7     8.317523
8    10.007738
9    11.021283
10    9.964725
11    9.081947
12   11.253643
13   10.896120
14   10.272814
15   10.282408

possible duplicate of [calculating the occurrences of numbers in the subsets of a dataframe in \[R\]](http://stackoverflow.com/questions/5337013/calculating-the-occurrences-of-numbers-in-the-subsets-of-a-dataframe-in-r) — llrs, May 26 '14 at 14:28

score 9 · Answer 1 · answered May 26 '14 at 14:17

An expression that will help you:

length(which(data$V1 > 3 & data$V2 < 0.05))
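Applied to the question's data, the same counting idea might look like the sketch below. The vector name `values` is an assumption, and the data are only the first 15 values quoted in the question, not the full 1000.

```r
# First 15 values from the question's second data set (illustrative subset only)
values <- c(11.100816, 8.779858, 10.510090, 9.503772, 9.392222,
            10.285920, 8.317523, 10.007738, 11.021283, 9.964725,
            9.081947, 11.253643, 10.896120, 10.272814, 10.282408)

# A comparison yields a logical vector; summing it counts the TRUEs
n_above_13 <- sum(values > 13)

# Same count via which(), which returns the indices of the TRUE elements
n_above_10 <- length(which(values > 10))

print(c(above_13 = n_above_13, above_10 = n_above_10))
```

One difference worth knowing: if the data contain NA, `sum(values > 13)` returns NA unless `na.rm = TRUE` is passed, whereas `which()` silently drops the NA positions.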

Pop

flodel · Accepted Answer · 2014-05-26T14:32:30.500

2

Assuming dat1 and dat2 both have a V2 column, something like this:

colSums(outer(dat2$V2, setNames(dat1$V2, dat1$V2), ">"))

# 10 11 12 13 14 15 
#  9  3  0  0  0  0

(reads as follows: 9 items have a value greater than 10, 3 items have a value greater than 11, etc.)
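To make the one-liner reproducible, here is a minimal sketch that rebuilds both data sets from the question and checks the `outer()` result against a plain `sapply()` loop. The names `dat1` and `dat2` follow the answer; the `V2` column name for the second data set, and the helper names `exceeds`, `counts_outer` and `counts_sapply`, are assumptions. Only the 15 values shown in the question are used.

```r
# Thresholds table from the question: V1 = p-value, V2 = threshold
dat1 <- data.frame(V1 = c(0.5, 0.2, 0.1, 0.05, 0.01, 0.001),
                   V2 = c(10, 11, 12, 13, 14, 15))

# First 15 values of the second data set (column name V2 is assumed)
dat2 <- data.frame(V2 = c(11.100816, 8.779858, 10.510090, 9.503772, 9.392222,
                          10.285920, 8.317523, 10.007738, 11.021283, 9.964725,
                          9.081947, 11.253643, 10.896120, 10.272814, 10.282408))

# outer() builds a 15 x 6 logical matrix: one row per value, one column per
# threshold; setNames() labels each column with its threshold
exceeds <- outer(dat2$V2, setNames(dat1$V2, dat1$V2), ">")
counts_outer <- colSums(exceeds)

# Equivalent, more explicit version: one count per threshold
counts_sapply <- sapply(setNames(dat1$V2, dat1$V2),
                        function(th) sum(dat2$V2 > th))

# Attach the counts to the thresholds table, next to the p-values
dat1$count <- unname(counts_outer)
print(dat1)
```

The `outer()` version allocates a full values-by-thresholds matrix, which is fine for 1000 values and 6 thresholds; the `sapply()` version avoids that matrix if either input grows very large.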

edited May 26 '14 at 14:32

answered May 26 '14 at 14:27

Perfect. I had to tailor it to my column names, but it's great. Thank you! – jstewartmitchel May 26 '14 at 14:36
