I'm an R user and I cannot figure out the pandas equivalent of match(). I need to use this function to iterate over a bunch of files, grab a key piece of information from each, and merge it back into the current data structure on 'url'. In R I'd do something like this:
logActions <- read.csv("data/logactions.csv")
logActions$class <- NA
files <- dir("data/textContentClassified/", full.names = TRUE)
for (i in seq_along(files)) {
tmp <- read.csv(files[i])
logActions$class[match(logActions$url, tmp$url)] <-
tmp$class[match(tmp$url, logActions$url)]
}
I don't think I can use merge() or join(), as each will overwrite logActions$class on every iteration. I can't use update() or combine_first() either, as neither has the necessary indexing capabilities. I also tried writing a match() function based on this SO post, but I cannot figure out how to get it to work with DataFrame objects. Apologies if I'm missing something obvious.
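For anyone less familiar with R: match(x, table) returns, for each element of x, the position of its first occurrence in table, and NA where there is none. Below is a minimal sketch of that behaviour in pandas. The helper name match is hypothetical (it is not a pandas function), and as assumptions of this sketch it returns 0-based positions and uses -1 in place of NA.

```python
import pandas as pd

def match(x, table):
    """Hypothetical analogue of R's match(): 0-based position of the
    first occurrence of each element of x in table, -1 where absent."""
    table = pd.Index(table)
    # Keep only the first occurrence of each value so the lookup is unique.
    first = ~table.duplicated(keep="first")
    positions = pd.Series(range(len(table)))[first]
    positions.index = table[first]
    # Values of x missing from table become NaN, then -1.
    return positions.reindex(x).fillna(-1).astype(int).tolist()

print(match(["foo.com", "foo.com", "bar.com", "baz.com"],
            ["bar.com", "foo.com", "foo.com"]))
# [1, 1, 0, -1]
```

Note that R's positions are 1-based, so the numbers differ by one from what R would print.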
Here's some Python code that summarizes my ineffectual attempts to do something like match() in pandas:
from numpy import nan as NaN
from pandas import DataFrame, merge
left = DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left["class"] = NaN
right1 = DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = DataFrame({'url': ['bar.com'], 'class': [1]})
# Doesn't work:
left.join(right1, on='url')
merge(left, right1, on='url')
# Also doesn't work the way I need it to:
left = left.combine_first(right1)
left = left.combine_first(right2)
left
# Also does something funky and doesn't really work the way match() does:
left = left.set_index('url', drop=False)
right1 = right1.set_index('url', drop=False)
right2 = right2.set_index('url', drop=False)
left = left.combine_first(right1)
left = left.combine_first(right2)
left
The desired output is:
       url  action  class
0  foo.com       0      0
1  foo.com       1      0
2  bar.com       0      1
But I need to be able to call this over and over again, so that I can iterate over each file.
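To make the requirement concrete, here is a rough sketch of the repeated update I have in mind, written with Series.map and fillna. The helper name fill_class is hypothetical, and the sketch assumes that each lookup table should contribute at most one class per url (duplicates are dropped, keeping the first) and that a newly matched value takes precedence over one filled in earlier. The two small frames stand in for the per-file tables, which in practice would each come from read_csv.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"url": ["foo.com", "foo.com", "bar.com"],
                     "action": [0, 1, 0]})
left["class"] = np.nan

# Stand-ins for the per-file tables.
right1 = pd.DataFrame({"url": ["foo.com"], "class": [0]})
right2 = pd.DataFrame({"url": ["bar.com"], "class": [1]})

def fill_class(target, lookup):
    """Hypothetical helper: set target['class'] from lookup wherever the
    urls match, leaving unmatched rows as they were."""
    mapping = lookup.drop_duplicates("url").set_index("url")["class"]
    matched = target["url"].map(mapping)          # NaN where url is absent
    target["class"] = matched.fillna(target["class"])
    return target

# Safe to call once per file, in a loop.
for right in (right1, right2):
    left = fill_class(left, right)

# Every row was matched in this toy example, so the column can be cast to int.
left["class"] = left["class"].astype(int)
print(left)
```

Unlike merge(), this never adds columns or rows to left, so it can be applied any number of times.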