5

How do I apply a function that can "see" the preceding result when operating by rows?

This comes up a lot, but my current problem requires a running total by student that resets if the total doesn't get to 5.

Example Data:

> df

row   Student Absent Consecutive.Absences                             
1        A       0                    0                              
2        A       1                    1                              
3        A       1                    2                              
4        A       0                    0 <- resets to zero if under 5
5        A       0                    0                              
6        A       1                    1                              
7        A       1                    2                              
8        A       1                    3                              
9        B       1                    1 <- starts over for new factor (Student)
10       B       1                    2                              
11       B       0                    0                              
12       B       1                    1                              
13       B       1                    2                              
14       B       1                    3                              
15       B       1                    4                              
16       B       0                    0                              
17       B       1                    1                              
18       B       1                    2                              
19       B       1                    3                              
20       B       1                    4                              
21       B       1                    5                              
22       B       0                    5 <- gets locked at 5
23       B       0                    5                              
24       B       1                    6                              
25       B       1                    7             

I've tried doing this with a huge matrix of shifted vectors.

I've tried doing this with the apply family of functions and half of them do nothing, the other half hit 16GB of RAM and crash my computer.

I've tried straight looping and it takes 4+ hours (it's a big data set)

What bothers me is how easy this is in Excel. Usually R runs circles around Excel both in speed and writability, which leads me to believe I'm missing something elementary here.

Forgetting even the more challenging ("lock at 5") feature of this, I can't even get a cumsum that resets. There is no combination of factors I can think of to group for ave like this:

Consecutive.Absences = ave(Absent, ..., cumsum)

Obviously, grouping on Student will just give the Total Cumulative Absences -- it "remembers" the kid's absence over the gaps because of the split and recombine in ave.

So as I said, the core of what I don't know how to do in R is this:
How do I apply a function that can "see" the preceding result when operating by rows?

In Excel it would be easy:

C3 = IF($A3=$A2,$B3+$C2,$B3)*$B3

This excel function is displayed without the 5-absence lock for easy readability.

Once I figure out how to apply a function that looks at previous results of the same function in R, I'll be able to figure out the rest.

Thank you in advance for your help--this will be very useful in a lot of my applications!

Genuinely, Sam


UPDATE:
Thank you everyone for the ideas on how to identify if a student has 5 consecutive absences!

However, that's easy enough to do in the database at the STUDENTS table. What I need to know is the number of consecutive absences by student in the attendance record itself for things like, "Do we count this particular attendance record when calculating other summary statistics?"

Sam
  • 125
  • 4
  • 7
  • I am a bit confused by "preceding results". Could you please elaborate how your dataframe is built. Is it a given set, on which we perform the counter operation (i.e. values are fixed a priori) or is it built dynamically (maybe inside a function)? In the first case `ddply` with a custom function for counter could be used I think. – Maxim.K Apr 15 '13 at 20:33
  • In addition, if your goal is to find students which have 5 consecutive '1's (e.g. for attendance), you could simply search for the vector rep(1,5) for each student. Would that work for you? – Maxim.K Apr 15 '13 at 20:37
  • By "preceding results" I mean that I want a function that can operate on the result it just got in the last row. The simplest example would be the excel snippet I gave. If this row's absence is 0, return 0. If this row's absence is 1, give me the last consecutive absence calculation plus 1. So the function may use it's "preceding result" in the calculation. – Sam Apr 15 '13 at 20:49
  • I need to know what day the student reached 5 consecutive absences. That may be possible searching for the vector rep(1,5)... Have to play with that. Still doesn't change the fact that I really need to learn how to apply a function that can see its own previous results (or even "previous results" from another column). Again--so easy to do in Excel. I have to be missing something elementary here... – Sam Apr 15 '13 at 20:53

3 Answers3

8

If you're looking to apply a function to every element in a vector while making use the previous element's value, you might want to check out "Reduce", with the accumulate parameter set to True

Here's an example:

##define your function that takes two parameters
##these are the 'previous' and the 'current' elements
runSum <- function(sum, x){
    res = 0
    if (x == 1){
        res = sum + 1
    }
    else if (x == 0 & sum < 5){
        res = 0
    }
    else{
        res = sum
    }
    res
}

#lets look at the absent values from subject B
x = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1)

Reduce(x=x, f=runSum, accumulate=T) 
# [1] 1 2 0 1 2 3 4 0 1 2 3 4 5 5 5 6 7
kith
  • 5,486
  • 1
  • 21
  • 21
  • Thank you! I can tell I will have to research and play with the Reduce function a little more before it becomes intuitive though. – Sam Apr 15 '13 at 22:06
1

It's fairly easy to identify the students with one or more runs of 5:

 tapply(dfrm$Absent, dfrm$Student, function(x) rle(x)$value[rle(x)$length >=5] )
$A
integer(0)

$B
[1] 1

Look for any values of "1" in the result:

tapply(dfrm$Absent, dfrm$Student, function(x) 1 %in% rle(x)$value[rle(x)$length >=5] )
    A     B 
FALSE  TRUE 

I also struggled through to a Reduce solution (but am second in priority to @kithpradhan):

ave(dfrm$Absent, dfrm$Student, 
           FUN= function(XX) 
              Reduce(function(x,y) if( x[1] >= 5){  y+x[1]
                                           } else{ x[1]*y+y } ,  #Resets to 0 if y=0
                 XX, accumulate=TRUE) 
    )
#[1] 0 1 2 0 0 1 2 3 1 2 0 1 2 3 4 0 1 2 3 4 5 5 5 6 7
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • You really should reassign the checkmark to @kithpradham. He got there first and his code is more readable. (Plus, I don't need the points.) – IRTFM Apr 15 '13 at 21:33
  • Ok I changed the check per your request, however your solution solved the whole problem in one step. His was more "instructive" though I can tell it's going to take me a couple hours of playing with the Reduce function before it becomes intuitive. – Sam Apr 15 '13 at 22:02
  • I will tell you that I find the x, and y roles in the Reduce function's "f" arguemtn to be be a perpetual source of confusion for me. I almost always get them switched the first time through. The second or "y" argument gets assigned to "incoming" value from the outside "x" argument, while the internal "x" argument is the one that is growing (or "accumulating") and gets passed to the next iteration. (At least that's what I imagining is happening. I still consider myself a functional programming novice.) – IRTFM Apr 15 '13 at 22:09
0

For the record, you can also create your own Reduce-derivative which receives f and x, and applies f(x) on its output until x == f(x) or maxiter is reached:

ireduce = function(f, x, maxiter = 50){
  i = 1
  while(!identical(f(x), x) & i <= maxiter) {x = f(x); i = i+1}; x
}
JDG
  • 1,342
  • 8
  • 18