In general, I want to extract the elements shared in the "word" column across several CSV files (2008.csv, 2009.csv, 2010.csv, ..., 2015.csv).
All files have the same format: 'word', 'count'.
The 'word' column contains all the frequent words in one document from a particular year.
Here is a snapshot of one of the files:
Whenever at least two of the 8 files have a word in common, I want to know which words are shared and which files they appear in. (This is quite like a tf-idf calculation, by the way.)
My goal is to see trends in how frequent words appear across those files. (To my knowledge, one word can be in at most five files.)
I also want to know when each word first appears, i.e. a word that is in file C but in neither file B nor file A.
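To make the goal concrete, here is a rough sketch of the result I am after, using made-up data in place of my real files. The sample words and counts, the assumption that each file has a `word,count` header row, and the helper names `read_words` and `years_by_word` are all just for illustration, and I have not tried this on the real files:

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for the real files (assumption: a 'word,count' header row).
SAMPLE_FILES = {
    2008: "word,count\napple,10\nbanana,4\n",
    2009: "word,count\nbanana,7\ncherry,3\n",
    2010: "word,count\napple,2\ncherry,5\ndate,1\n",
}


def read_words(handle):
    """Return the set of values in the 'word' column of one CSV file."""
    return {row["word"] for row in csv.DictReader(handle)}


def years_by_word(words_per_year):
    """Map each word to the sorted list of years whose file contains it."""
    index = defaultdict(list)
    for year in sorted(words_per_year):
        for word in words_per_year[year]:
            index[word].append(year)
    return dict(index)


# With real files this would be open(f"{year}.csv", newline="") instead of StringIO.
words_per_year = {
    year: read_words(io.StringIO(text)) for year, text in SAMPLE_FILES.items()
}
index = years_by_word(words_per_year)

# Words that occur in at least two files, with the files they occur in.
shared = {word: years for word, years in index.items() if len(years) >= 2}

# The first year in which each word appears.
first_seen = {word: years[0] for word, years in index.items()}
```

For this toy data, `shared` would hold apple, banana and cherry with their years, and `first_seen` would say that "date" first appears in 2010.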
I know for + if could solve the problem, but it is quite tedious: I would need to compare every combination of 2 out of 8, 3 out of 8, 4 out of 8... columns to find the shared elements.
This is the code I have worked out so far. It is far from what I need, since it only compares the elements of two of the 8 files: code
Can anyone help?