I have a DataFrame and I'm trying to create a vocabulary of terms from it. I have already tokenized and preprocessed the text into a list of all words, each with the Doc ID attached to it. For example, I have:
   Word    Doc ID
0  Big     XX
1  Big     XZ
2  Small   XD
3  Big     XC
4  Little  XY
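For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes the columns are literally named `Word` and `Doc ID` and that both hold plain strings.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: the column names are exactly "Word" and "Doc ID".
df = pd.DataFrame({
    "Word": ["Big", "Big", "Small", "Big", "Little"],
    "Doc ID": ["XX", "XZ", "XD", "XC", "XY"],
})

print(df)
```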
I want to group identical terms, add a frequency column, and keep the Doc ID column listing every document the word appears in, like so:
   Word    Doc ID      Freq
0  Big     XX, XZ, XC  3
1  Small   XD          1
2  Little  XY          1
I have tried grouping by the word and using .count() to return the counts, but this removes the Doc ID. I also can't concat the frequency data, because the resulting DataFrame wouldn't line up with the rows of the first one.
Any help with this would be appreciated!
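One way the grouping described above might be done is a single `groupby` with named aggregation, so the joined Doc IDs and the count come out of the same pass and stay aligned without any concat. This is a minimal sketch, assuming the columns are named `Word` and `Doc ID`, that the Doc IDs are strings with no missing values, and that `vocab` is just an illustrative name for the result.

```python
import pandas as pd

# Sample data from the question (column names are assumed).
df = pd.DataFrame({
    "Word": ["Big", "Big", "Small", "Big", "Little"],
    "Doc ID": ["XX", "XZ", "XD", "XC", "XY"],
})

# Group once and compute both output columns together.
# sort=False keeps the words in order of first appearance.
# The ** dict form is needed because "Doc ID" contains a space
# and so cannot be written as a keyword argument.
vocab = (
    df.groupby("Word", sort=False)
      .agg(**{
          "Doc ID": ("Doc ID", ", ".join),  # join all Doc IDs for the word
          "Freq": ("Doc ID", "size"),       # number of rows for the word
      })
      .reset_index()
)

print(vocab)
```

Here `Freq` counts rows, so a word that occurs twice in the same document is counted twice and that Doc ID is listed twice. If each document should appear only once per word, calling `df.drop_duplicates()` before grouping would be one option.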