After searching for a while I haven't found an elegant solution to this (mostly curt answers like "just vectorize it", which doesn't always apply), so I thought I'd ask.
The simple problem is this: I need to loop over 2 control variables at once. (This is what's usually asked, and answered curtly.)
The real (specific) problem I have, which may not apply to everyone looking for an answer to this type of question, is this: I have a data frame. Let's say it's payroll data.
ID,FIRST_NAME,LAST_NAME,PAYDATE,AMT
912367,Jim,Smith,1/1/2000,5000
1102467,LAURA,JAMES,1/1/2000,5000
812367,DAVID,johnson,1/1/2000,5000
555555,ian,Smith,1/1/2000,5000
912367,Jim,SMITH,1/8/2000,4000
...
And yes, the names are dirty like that. Say Unnamed Boss comes around and says, do some stuff with this and other data... and gives you a list of names. Of course, they're properly formatted:
Smith,Jim R
Fields,Samantha
Smith,Kelly
Lensdotter,Patricia
I chose to break them (easy in a csv) to read them in as something similar to
fnames <- c("Jim", "Samantha", "Kelly", "Patricia")
and the associated last names in lnames (i.e. 2 variables). Then I read in the data frame and did some nested loops and greps (to ignore case). I searched for easier ways and found how to "python zip" the lists, etc., but I was wondering if there is an easier way.
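For reference, the split described above might look something like this minimal sketch. The `boss_list` vector is a hypothetical stand-in for the boss's file, and dropping the middle initial is my own assumption:

```r
# Hypothetical stand-in for the boss's list: one "Last,First [Middle]" entry each.
boss_list <- c("Smith,Jim R", "Fields,Samantha", "Smith,Kelly", "Lensdotter,Patricia")

# Split each entry at the comma into c(last, first-with-optional-middle).
parts <- strsplit(boss_list, ",", fixed = TRUE)

# Last name is the piece before the comma.
lnames <- vapply(parts, function(p) trimws(p[1]), character(1))

# First name is the first whitespace-separated token after the comma
# (assumption: a trailing middle initial such as "R" can be ignored).
fnames <- vapply(parts, function(p) strsplit(trimws(p[2]), " +")[[1]][1], character(1))
```

This leaves two parallel character vectors, so element i of `fnames` belongs with element i of `lnames`.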
My code is very similar to:
EIDs <- vector(mode = "integer")
for (i in seq_along(lnames)) {
  l <- lnames[i]
  f <- fnames[i]
  # rows whose last name matches, ignoring case
  paycut1 <- payroll[grepl(l, payroll$LAST_NAME, ignore.case = TRUE), ]
  # of those, rows whose first name also matches
  paycut2 <- paycut1[grepl(f, paycut1$FIRST_NAME, ignore.case = TRUE), ]
  if (nrow(paycut2) > 0) {
    print(paste0(l, ", ", f, " Has EID: ", paycut2[1, 1]))
    EIDs <- c(EIDs, paycut2[1, 1])
  } else {
    print(paste0(l, ", ", f, " NOT in Payroll Data"))
  }
}
The goal is to grab the IDs associated with the names out of the file (so I don't have to deal with names!). Any suggestions? I'd rather not use the for (i in range): construct (kind of inelegant) and would prefer something closer to a C/Python-like for i,j: construct.
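To make the "for i,j" idea concrete, here is a rough sketch of what I imagine paired iteration could look like with base R's `mapply()`, which walks several vectors in lockstep. The sample data and the helper name `find_eid` are hypothetical, and I'm not claiming this is the right way to do it:

```r
# Hypothetical sample data mirroring the payroll rows shown earlier.
payroll <- data.frame(
  ID         = c(912367, 1102467, 812367, 555555, 912367),
  FIRST_NAME = c("Jim", "LAURA", "DAVID", "ian", "Jim"),
  LAST_NAME  = c("Smith", "JAMES", "johnson", "Smith", "SMITH"),
  stringsAsFactors = FALSE
)
fnames <- c("Jim", "Samantha", "Kelly", "Patricia")
lnames <- c("Smith", "Fields", "Smith", "Lensdotter")

# Hypothetical helper: ID of the first row matching both names, else NA.
find_eid <- function(f, l) {
  hit <- grepl(l, payroll$LAST_NAME, ignore.case = TRUE) &
         grepl(f, payroll$FIRST_NAME, ignore.case = TRUE)
  if (any(hit)) payroll$ID[which(hit)[1]] else NA_real_
}

# mapply() passes fnames[i] and lnames[i] together, much like Python's zip().
EIDs <- mapply(find_eid, fnames, lnames, USE.NAMES = FALSE)
```

With this sample data only Jim Smith is present, so `EIDs` should hold his ID followed by three `NA` values.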
(Sorry for the explanation at the beginning, but I think a question like this deserves an answer. Not everyone can frame a question well, and answers like "just vectorize it", which may not apply in their situation, dissuade them from continuing to ask.)
P.S. If I'm going about it the completely wrong way, I'm not averse to other points of view. I come from a C background, so I'm used to loops and non-vectorized code. I just couldn't see how to vectorize this. Criticism, though only helpful criticism, is welcomed.
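For what it's worth, the closest I can get to a vectorized version is the sketch below. It assumes exact, case-insensitive equality on both names is acceptable, which is stricter than the substring matching `grepl` does in my loop. The `wanted` data frame and the `name_key` helper are hypothetical names:

```r
# Hypothetical sample data mirroring the payroll rows shown earlier.
payroll <- data.frame(
  ID         = c(912367, 1102467, 812367, 555555, 912367),
  FIRST_NAME = c("Jim", "LAURA", "DAVID", "ian", "Jim"),
  LAST_NAME  = c("Smith", "JAMES", "johnson", "Smith", "SMITH"),
  stringsAsFactors = FALSE
)
wanted <- data.frame(
  FIRST_NAME = c("Jim", "Samantha", "Kelly", "Patricia"),
  LAST_NAME  = c("Smith", "Fields", "Smith", "Lensdotter"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one lower-cased "last,first" key per row.
name_key <- function(first, last) paste(tolower(last), tolower(first), sep = ",")

# match() returns the position of the first payroll row with the same key,
# or NA when the name does not appear at all.
idx <- match(name_key(wanted$FIRST_NAME, wanted$LAST_NAME),
             name_key(payroll$FIRST_NAME, payroll$LAST_NAME))
wanted$ID <- payroll$ID[idx]
```

Names missing from the payroll data end up with `NA` in `wanted$ID`, which replaces the "NOT in Payroll Data" branch of the loop.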