Create a vector in a for loop from a condition on a dataframe

Question

I have a dataframe, similar to the example below, but larger (15000 rows):

df.example <-structure(list(Date = structure(c(3287, 3386, 4286, 5286, 6286), class = "Date"),v1 = c(1L, 1L, 1L, 1L, 1L), v2 = c(0.60378, 12.82581, 3.55357, 4.96079, 0.0422),perc = c(0.598, 0.598, 0.609, 1, 0.609), v3 = c(-99, -99, 5.83509031198686, 4.96079,0.0692939244663383)), .Names = c("Date", "v1", "v2", "perc", "v3"), row.names = c(1L, 100L, 1000L, 2000L, 3000L), class = "data.frame")

df.example:

       Date     v1       v2  perc           v3
1    1979-01-01  1  0.60378 0.598 -99.00000000
100  1979-04-10  1 12.82581 0.598 -99.00000000
1000 1981-09-26  1  3.55357 0.609   5.83509031
2000 1984-06-22  1  4.96079 1.000   4.96079000
3000 1987-03-19  1  0.04220 0.609   0.06929392

What I would like to do is calculate the percentage of rows that are at or below a "certain threshold value" in column "perc". I would like to do this for multiple "certain threshold values", given below:

### "certain threshold values":
seq(from =0, to = 1, by = 0.1)


### formula to be repeated/iterated/looped: (the i stands for "certain value")
100*sum(df.example$perc<=i)/nrow(df.example)

I would like the outcome to be a vector called "vector1", like the example below:

vector1 <- c(0,0,0,0,0,0,0.2,0.6,0.6,0.6,1.0)

This is what I have so far, but it is not working:

### create vector to store calculated values in
vector1=c()
vector1[1]=3

### loop calculation of percentage of rows that are below "certain threshold value" in column df.example$perc
for(i in seq(0,1, by=0.1)){
vector1[i]=sum(df.example$perc<=i)/nrow(df.example)
}

I only get one value, which I would expect to be the last one of my vector1.

I already looked at similar topics on SO, such as "R create a vector with loop structure" and "How to make a vector using a for loop".

Any suggestions?

By the way: please comment if the dput() output I used doesn't create the data to work with; it's the first time I have used dput().

You may need `s1 <- seq(0, 1, 0.1); for(i in seq_along(s1)){vector1[i]=sum(df.example$perc<=s1[i])/nrow(df.example) }`. Also, initialize `vector1 <- numeric(length(s1))` – akrun, Nov 07 '16 at 14:58
The difference between `for(i in seq_along(seq(0,1, by=0.1))){print(i)}` and `for(i in seq(0,1, by=0.1)){print(i)}` should explain the solution – joel.wilson, Nov 07 '16 at 14:59

score 1 · Answer 1 · edited Nov 21 '16 at 14:19

There is no need to compute the number of rows on every iteration; you can assign it to a variable once. Then you can use sapply:

nrow_df <- nrow(df.example)
sapply(seq(from =0, to = 1, by = 0.1), function(x) sum(df.example$perc<=x)/nrow_df)
# [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0
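
As a possible variation on the sapply call above, here is a minimal sketch that uses mean() on the logical comparison, since the mean of a logical vector is the share of TRUE values. The names `perc`, `thresholds` and `prop` are hypothetical and only used for this illustration; `perc` holds the values copied from df.example. Multiplying by 100 gives percentages, as in the formula from the question.

```r
# perc values copied from df.example in the question
perc <- c(0.598, 0.598, 0.609, 1, 0.609)
thresholds <- seq(from = 0, to = 1, by = 0.1)

# mean() of a logical vector is the share of TRUE values;
# vapply() also checks that each result is a single number
prop <- vapply(thresholds, function(x) mean(perc <= x), numeric(1))

prop        # proportions between 0 and 1
100 * prop  # the same values expressed as percentages
```

Whether vapply is preferable to sapply here is a matter of taste; it only adds a type check on each result.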

Or (vectorized)

indx <- seq(0, 1, by=0.1)
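# byrow = TRUE puts one threshold in each column, so perc is recycled down the columns
colSums(df.example$perc <= matrix(indx, nrow(df.example), length(indx), byrow = TRUE)) / nrow(df.example)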
## [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

score 0 · Answer 2 · answered Nov 07 '16 at 15:02

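We need to initialize vector1 and loop over the positions of the sequence (with seq_along) rather than over its values, because the values 0, 0.1, ... cannot serve as vector indices.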

s1 <- seq(0, 1, 0.1)
# one slot per threshold, not per row of the data
vector1 <- numeric(length(s1))
for(i in seq_along(s1)){
  vector1[i] <- sum(df.example$perc <= s1[i])/nrow(df.example)
}
vector1
#[1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0
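
To see why the loop in the question stores only one value, here is a minimal sketch. The vector `x` is a hypothetical stand-in for vector1. It relies on R truncating fractional subscripts toward zero, so every threshold below 1 becomes index 0, and assigning to index 0 changes nothing.

```r
# Thresholds used in the question
s1 <- seq(0, 1, by = 0.1)

# Looping over the values: 0, 0.1, ..., 0.9 are all truncated to index 0,
# so only the final iteration (i = 1) actually stores something.
x <- numeric(0)
for (i in s1) {
  x[i] <- i
}
length(x)

# Looping over positions instead gives the integer indices 1, 2, ..., length(s1)
seq_along(s1)
```

This is the difference the comments under the question point at: seq_along(s1) supplies valid positions, and s1[i] recovers the threshold for each position.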

Or a vectorized approach would be

rowSums(outer(s1, df.example$perc, FUN = `>=`))/nrow(df.example)
#[1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

answered Nov 07 '16 at 15:02

akrun

Your second vectorized approach also worked on the larger dataset. The first approach did not. Thanks for the help! – T. BruceLee Nov 07 '16 at 16:29

score 0 · Accepted Answer · answered Nov 07 '16 at 15:06

Here is a fourth method using outer and colSums:

colSums(outer(df.example$perc, seq(from=0, to=1, by=0.1), "<=")) / nrow(df.example)
# [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

outer creates a logical matrix that holds the result of the threshold test for each element-threshold pair, with one row per element of perc and one column per threshold. The "successes" are summed down each column with colSums, and this count is divided by the number of elements tested.
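
The shape of that logical matrix can be checked with a small sketch. The shortened threshold set and the names `perc`, `thresholds` and `m` are assumptions made for this illustration only; `perc` holds the values copied from df.example.

```r
# perc values copied from df.example in the question
perc <- c(0.598, 0.598, 0.609, 1, 0.609)
# a shortened, hypothetical set of thresholds to keep the matrix small
thresholds <- c(0.5, 0.6, 1)

# One row per perc value, one column per threshold
m <- outer(perc, thresholds, "<=")
dim(m)

# Column sums count the TRUE entries per threshold; dividing by the
# number of values gives the share at or below each threshold
colSums(m) / length(perc)
```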
