
I noticed that similar questions have been asked, but I am having difficulty troubleshooting this function because it is not working. I am trying to create a countif function in R. I have some data about earthquake magnitudes, and I have created data bins (a sequence from 2 to 8 in increments of 0.1). I want to see how many earthquakes have a magnitude greater than or equal to each bin value.

Here is my data: earthquake magnitudes. I call this qdta$mag in my function because it is a variable from a larger data frame. I made this snippet so you can test it.

qdta = sample(seq(0,8,.05),500, replace = T)

Here are my "data bins". The purpose of my function is to count how many earthquakes are greater than or equal to each bin value (2, 2.1, 2.2, 2.3, 2.4, etc.). I also created the value column to store the counts.

L = as.data.frame(seq(2,8,.1))
L$value = 0

Here is my function. I do not get an error when I create or call it, but it does not work correctly: the count values are not stored.

#creating the number of loops
loop1 = dim(L)[1]
loop2 = dim(qdta)[1]

#creating my function

#1. I want the function to 
#A. Look at the z cell of qdta$mag (start with first number)
#then check if its bigger than the first cell in first column of x
# if it is, then add a +1 to the value, if not, leave as is. 

#Do this loop however many times I say in loop2 (the size of the qdta), 
#then move to the next i (the next bin value in the L dataframe)

countf = function(x){

  for(i in loop1){
    for(z in loop2){
    
    x[i,2] = ifelse(qdta$mag[z] >= x[i,1],
                    x[i,2] + 1,
                    x[i,2])
    }
  }
}

countf(L)
  • `loop2` value is NULL when I run your code. Try `length(Data)` instead of `Dim`. – M.Viking Feb 23 '21 at 16:59
  • Thanks, my loop2 value works. Note in the data examples that my data is called qdta. For reference to you guys, I just made a new data frame called Data. – Shehroz Malik Feb 23 '21 at 17:15

2 Answers

See my changes, especially around clarifying which variable is which.

The biggest issue is that you were treating your loop sizes (loop1, loop2) as if they were sequences, but each one is only a single number (e.g., 500), so the loop effectively ran once, "for i = 500". I changed these to 1:loop1 and 1:loop2. This was the main reason you did not get output.
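A tiny illustration of that point, using a made-up size of 3 rather than the real data: looping over a single number visits only that number, while seq_len(n) (or 1:n) visits every index. seq_len() is also safe when n is 0, where 1:0 would count backwards.

```r
n <- 3  # hypothetical loop size, standing in for loop1 or loop2

# Looping over the number itself: the body runs once, with i = 3
visited_single <- c()
for (i in n) visited_single <- c(visited_single, i)

# Looping over the index sequence: the body runs for i = 1, 2, 3
visited_all <- c()
for (i in seq_len(n)) visited_all <- c(visited_all, i)

print(visited_single)  # 3
print(visited_all)     # 1 2 3
```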

Data = data.frame(mag=sample(seq(0,8,.05),500, replace = T))

L = data.frame(magbin = seq(2,8,.1),
               value = 0)
                                   
loop1 = dim(L)[1]
loop2 = dim(Data)[1]

  for(i in 1:loop1){
    for(z in 1:loop2){
      L$value[i] <- ifelse(Data$mag[z] >= L$magbin[i], (L$value[i] + 1), L$value[i])
    }
  }

   magbin value
1     2.0   370
2     2.1   365
3     2.2   356
4     2.3   347
5     2.4   344
6     2.5   332
...
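
If you would rather keep this as a function, as in the question, note that an R function works on a copy of its argument, so it has to return the modified data frame and the caller has to assign the result. A minimal sketch, assuming the column names mag, magbin, and value used above; the helper name count_at_least and the set.seed call are my own additions, not part of the original code.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
Data <- data.frame(mag = sample(seq(0, 8, .05), 500, replace = TRUE))
L <- data.frame(magbin = seq(2, 8, .1), value = 0)

# Hypothetical helper: for each bin, count the magnitudes >= that bin
count_at_least <- function(bins, mags) {
  for (i in seq_len(nrow(bins))) {
    for (z in seq_along(mags)) {
      if (mags[z] >= bins$magbin[i]) {
        bins$value[i] <- bins$value[i] + 1
      }
    }
  }
  bins  # return the modified copy; the caller's data frame is untouched
}

L <- count_at_least(L, Data$mag)  # assign the result back to L
head(L)
```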
M.Viking

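I thought about this question further, wanting to forgo the nested loops.

I suspect there is also an elegant apply or purrr method.

Thanks to this answer - https://stackoverflow.com/a/59835451/10276092 - we can use base::outer to apply the function >= to every combination of magnitude and bin value. The result is a logical matrix with one column per bin, and colSums counts the TRUE values in each column.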

L$value2 <- colSums(outer(Data$mag, L$magbin, FUN = ">="))

   magbin value value2
1     2.0   374    374
2     2.1   365    365
3     2.2   358    358
4     2.3   356    356
5     2.4   351    351
...
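
To see what outer is doing, here is a small sketch with made-up magnitudes and bins (the vectors mags and bins are illustrative and not taken from the question). The sapply line is one possible apply-style equivalent; it gives the same counts.

```r
mags <- c(1.5, 2.0, 2.7, 3.1, 4.4)  # made-up magnitudes
bins <- c(2, 3, 4)                  # made-up bin thresholds

# 5 x 3 logical matrix: row = magnitude, column = bin
m <- outer(mags, bins, FUN = ">=")
counts_outer <- colSums(m)          # number of TRUE values per bin

# Apply-style alternative: one vectorised comparison per bin
counts_sapply <- sapply(bins, function(b) sum(mags >= b))

print(counts_outer)   # 4 2 1
print(counts_sapply)  # 4 2 1
```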

I increased the data from 500 to 5000 records and ran microbenchmark:

Unit: milliseconds
         expr         min          lq        mean      median          uq        max neval
  loop_method 3209.485572 3283.103802 3716.386542 3380.223922 3757.661975 5758.78067   100
 outer_method    1.994086    2.082549    2.372194    2.179313    2.280101    5.73027   100
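
The benchmark call itself is not shown above, so the following is only a sketch of how such a comparison could be set up. The function names loop_method and outer_method are chosen to match the labels in the output; their bodies, the seed, and times = 5 (rather than the 100 evaluations in the table) are assumptions. The loop here also writes to a plain vector rather than a data frame column, so its timings will not match the table. microbenchmark is called only if the package is installed.

```r
set.seed(1)  # assumption: fixed seed for reproducibility
Data <- data.frame(mag = sample(seq(0, 8, .05), 5000, replace = TRUE))
bins <- seq(2, 8, .1)

# Nested-loop version, one comparison per (bin, record) pair
loop_method <- function() {
  value <- numeric(length(bins))
  for (i in seq_along(bins)) {
    for (z in seq_len(nrow(Data))) {
      value[i] <- ifelse(Data$mag[z] >= bins[i], value[i] + 1, value[i])
    }
  }
  value
}

# Vectorised version using outer and colSums
outer_method <- function() {
  colSums(outer(Data$mag, bins, FUN = ">="))
}

# Sanity check: both approaches should produce the same counts
same_counts <- all(loop_method() == outer_method())
print(same_counts)

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(loop_method(), outer_method(), times = 5))
}
```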
M.Viking