Suppose I have a list of slogans (short, multi-word phrases) that people have voted on, and I want to assess which words, if any, make some slogans more popular than others. What would be the best way to do this? My first thought was to find all the unique words in the set of slogans and score each one as the average number of votes across the slogans that contain it. But I think frequency should also come into play in some fashion, so that the following hold:
- If Word A occurs only in the slogan that got the most votes and Word B occurs only in the slogan that got the second-most, Word A is more "popularity-generating."
- However, if Word A occurs only in the top-ranked slogan and Word B occurs in both the second- and third-ranked slogans, Word B should win, since it pushed more slogans to the top.
- Still, a single occurrence of Word A in the top slogan should trump three appearances of Word B in other slogans if those slogans are, say, in the middle or bottom half of the pack (that is, the score needs to balance vote-getting against frequency).
I also want to eliminate words that are common in general (e.g., "the" or "of"). This is somewhat related to past questions about identifying trending words, but it differs in that change over time isn't a factor. I'd be happy just to be pointed toward the relevant literature, but I'm not sure what to search for. Is this a class of problem that other people deal with?
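For concreteness, here is a minimal sketch of the naive baseline I described: score each word by the average votes of the slogans containing it, after dropping common words. The function name `average_votes_per_word`, the tiny `STOP_WORDS` set, and the example slogans and vote counts are all made up for illustration; a real stop-word list would be much longer.

```python
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in"}


def average_votes_per_word(slogans):
    """Map each non-stop word to the mean vote count of the slogans containing it.

    `slogans` is a list of (text, votes) pairs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for text, votes in slogans:
        # Build a set so a word repeated within one slogan is counted once.
        words = {w.strip(".,!?\"'").lower() for w in text.split()}
        for word in words - STOP_WORDS:
            if word:  # skip tokens that were pure punctuation
                totals[word] += votes
                counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


# Made-up example data, purely for illustration.
example = [
    ("Taste the future", 90),
    ("Bold taste of tomorrow", 70),
    ("Future of flavor", 20),
]
scores = average_votes_per_word(example)
```

A plain average like this satisfies my first criterion but not the second: a word that appears only in the top slogan always outscores a word that appears in both the second- and third-ranked slogans. That is why I think frequency has to enter the score somehow.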