How do I calculate the maximum or minimum seen so far in a sequence, and its associated id?

Question

From this Stata FAQ, I know the answer to the first part of my question. But here I'd like to go a step further. Suppose I have the following data (already sorted by a variable not shown):

To calculate the minimum in this sequence, I do

generate minsofar = v1 if _n==1
replace  minsofar = min(v1[_n-1], minsofar[_n-1]) if missing(minsofar)

To get

id v1 minsofar 
A  9  9        
B  8  9        
C  7  8        
B  7  7        
A  5  7        
C  4  5        
A  3  4        
A  2  3

Now I'd like to generate a variable, call it id_min that gives me the ID associated with minsofar, so something like

id v1 minsofar id_min
A  9  9         A
B  8  9         A
C  7  8         B
B  7  7         C
A  5  7         C
C  4  5         A
A  3  4         C 
A  2  3         A

Note that C is associated with 7, because 7 is first associated with C in the current sorting. And just to be clear, my ID variable here shows as a string variable just for the sake of readability -- it's actually numeric.

Ideas?

EDIT:

I suppose

gen id_min = id if _n<=2
replace id_min = id[_n-1] if v1[_n-1]<minsofar[_n-1] & missing(id_min)
replace id_min = id_min[_n-1] if missing(id_min)

does the job at least for the data in this example. Don't know if it would work for more complex cases.

Only you know what works for your real data since we only have access to the example you post. — Roberto Ferrer, Oct 16 '14 at 04:20

Roberto Ferrer · Answer 1 · 2014-10-16T04:07:36.570

This works for your example. It uses the user-written command vlookup, which you can install running findit vlookup and following through the link that appears.

clear
set more off

input ///
str1 id v1 
A  9  
B  8  
C  7  
B  7  
A  5  
C  4  
A  3  
A  2  
end

encode id, gen(id2)
order id2
drop id

list

*----- what you want -----

// your code
generate minsofar = v1 if _n==1
replace  minsofar = min(v1[_n-1], minsofar[_n-1]) if missing(minsofar)

// save original sort
gen osort = _n

// group values of v1 but respecting original sort so values of 
// id2 don't jump around
sort v1 osort

// set obs after first as missing so id2 is unique within v1
gen v2 = v1
by v1: replace v2 = . if _n > 1

// lookup
vlookup minsofar, gen(idmin) key(v2) value(id2)

// list
sort osort
drop osort v2
list, sep(0)

Your code has generate minsofar = v1 if _n==1 which is better coded as generate minsofar = v1 in 1, because it is more efficient.

Your minsofar variable is just a displaced copy of v1, so if this is always the case, there should be simpler ways of handling your problem. I suspect your problem is easier than you have acknowledged until now, and that has come through your post. Perhaps giving more context, expanded example data, etc. could get you better advice.

score 0 · Accepted Answer · answered Oct 16 '14 at 09:21

This is both easier and a little more challenging than implied so far. Given value (a little more evocative than the OP's v1) and a desire to keep track of minimum so far, that's for example

 generate min_so_far = value[1] 
 replace min_so_far = value if value < min_so_far[_n-1] in 2/L

where the second statement exploits the unsurprising fact that Stata replaces in the current order of observations. [_n-1] is the index of the previous observation and in 2/L implies a loop over all observations from the second to the last.

Note that the OP's version is buggy: by always looking at the previous observation, the code never looks at the very last value and will overlook that if it is a new minimum. It may be that the OP really wants "minimum before now" but that is not what I understand by "minimum so far".

If we have missing values in value they will not enter the comparison in any malign way: missing is always regarded as arbitrarily large by Stata, so missings will be recorded if and only if no non-missings are present so far, which is as it should be.

The identifier of that minimum at first sight yields to the same logic

 generate min_so_far = value[1] 
 gen id_min = id[1] 
 replace min_so_far = value if value < min_so_far[_n-1] in 2/L 
 replace id_min = id if value < min_so_far[_n-1] in 2/L

There are at least two twists that might bite. The OP mentions a possibility that the identifier might be missing so that we might have a new minimum but not know its identifier. The code just given will use a missing identifier, but if the desire is to keep separate track of the identifier of the minimum value with known identifiers, different code is needed.

A twist not mentioned to date is that observations with different identifier might all have the same minimum so far. The code above replaces the identifier only the first time a particular minimum is seen; if the desire is to record the identifier of the last occurrence the < in the last code line above should be replaced with <=. If the desire is to keep track of the all the identifiers of the minimum so far, then a string variable is needed to concatenate all the identifiers.

With a structure of panel or longitudinal data the whole thing is done under the aegis of by:.

I can't see a need to resort to user-written extensions here.

Thanks Nick; "minimum before now" is really what I meant, so the solution isn't buggy. I'll edit it so this is clear for future readers. — djas, Oct 18 '14 at 02:25
one thing though: stata complains if you use `by` and `in` -- as you rightly guessed, I do have longitudinal data; reminds me of why I was using the more complicated construction provided in the Stata FAQs. — djas, Oct 18 '14 at 02:32
Still a bug, as "minimum before now" should be declared missing in the first observation, as the previous is not known by definition. Under `by:` `in 1` must be `if _n == 1` and `in 2/L` could be `if _n > 1`. — Nick Cox, Oct 18 '14 at 08:11

How do I calculate the maximum or minimum seen so far in a sequence, and its associated id?

2 Answers2