I have two data sets. The first contains a vector of p-values ranging from 0.5 to 0.001, along with the threshold that corresponds to each p-value. For example, the threshold for 0.05 is 13: any value greater than 13 has a p-value of < 0.05. This data set contains all the thresholds I'm interested in:

     V1       V2
1 0.500       10
2 0.200       11
3 0.100       12
4 0.050       13
5 0.010       14
6 0.001       15

The second data set is just one long list of values. I need to write an R script that counts the number of values in this set that exceed each threshold. For example, count how many values in the second data set exceed 13 (and therefore have a p-value of < 0.05), and do this for each threshold value.

Here are the first 15 values of the 2nd data set (1000 total):

1    11.100816
2     8.779858
3    10.510090
4     9.503772
5     9.392222
6    10.285920
7     8.317523
8    10.007738
9    11.021283
10    9.964725
11    9.081947
12   11.253643
13   10.896120
14   10.272814
15   10.282408
jstewartmitchel

2 Answers

This pattern counts the rows that satisfy a condition (adapt the column names and cut-offs to your own data):

length(which(data$V1 > 3 & data$V2 < 0.05))
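Applied to the question's data, the same idea can be repeated for each threshold. The following is only a minimal sketch: the names `thresholds` and `values` are made up for illustration (the question does not name its objects), and `values` holds just the 15 sample values quoted in the question rather than the full 1000.

```r
# Hypothetical object names; the question does not say what its data are called.
thresholds <- data.frame(
  V1 = c(0.5, 0.2, 0.1, 0.05, 0.01, 0.001),  # p-values
  V2 = c(10, 11, 12, 13, 14, 15)             # matching thresholds
)

# The 15 sample values shown in the question
values <- c(11.100816, 8.779858, 10.510090, 9.503772, 9.392222,
            10.285920, 8.317523, 10.007738, 11.021283, 9.964725,
            9.081947, 11.253643, 10.896120, 10.272814, 10.282408)

# For each threshold, count the values strictly greater than it.
# sum() on a logical vector counts the TRUEs, which matches
# length(which(...)) as long as there are no NAs in `values`.
thresholds$count <- sapply(thresholds$V2, function(th) sum(values > th))

print(thresholds)
```

Storing the counts as a new column keeps each count next to its p-value and threshold, which may be easier to read than a separate vector.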

Pop
Assuming dat1 and dat2 both have a V2 column, something like this:

colSums(outer(dat2$V2, setNames(dat1$V2, dat1$V2), ">"))

# 10 11 12 13 14 15 
#  9  3  0  0  0  0 

(Read this as: of the 15 sample values shown in the question, 9 are greater than 10, 3 are greater than 11, and so on.)
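To see why this works, it may help to look at the intermediate matrix on a toy input. In the sketch below, `vals` and `thr` are invented stand-ins for dat2$V2 and dat1$V2, not data from the question.

```r
# Toy stand-ins (assumed for illustration only)
vals <- c(9.5, 10.5, 11.2)  # plays the role of dat2$V2
thr  <- c(10, 11)           # plays the role of dat1$V2

# outer() compares every value with every threshold, giving a logical
# matrix with one row per value and one column per threshold.
# setNames() labels the columns with the threshold values themselves.
m <- outer(vals, setNames(thr, thr), ">")
print(m)

# colSums() treats TRUE as 1, so each column sum is the number of
# values exceeding that column's threshold.
counts <- colSums(m)
print(counts)
```

Because the full comparison matrix is built in memory, this approach is convenient for a handful of thresholds and a few thousand values; for much larger inputs a per-threshold loop may use less memory.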

flodel