12

I have around 4000 blog posts with me. I wanna rank all posts according to the following values

Upvote Count => P
Comments Recieved => C
Share Count => S
Created time in Epoch => E
Follower Count of Category which post belongs to => F (one post has one category)
User Weight => U (User with most number of post have biggest weight)

I am expecting answer in pseudo code.

Timothy Shields
  • 75,459
  • 18
  • 120
  • 173
shajin
  • 3,214
  • 5
  • 38
  • 53

2 Answers2

25

Your problem falls into the category of regression (link). In machine learning terms, you have a collection of features (link) (which you list in your question) and you have a score value that you want to predict given those features.

What Ted Hopp has suggested is basically a linear predictor function (link). That might be too simple a model for your scenario.

Consider using logistic regression (link) for your problem. Here's how you would go about using it.

1. create your model-learning dataset

Randomly select some m blog posts from your set of 4000. It should be a small enough set that you can comfortably look through these m blog posts by hand.

For each of the m blog posts, score how "good" it is with a number from 0 to 1. If it helps, you can think of this as using 0, 1, 2, 3, 4 "stars" for the values 0, 0.25, 0.5, 0.75, 1.

You now have m blog posts that each have a set of features and a score.

You can optionally expand your feature set to include derived features - for example, you could include the logarithm of the "Upvote Count," the "Comments Recieved", the "Share Count," and the "Follower Count," and you could include the logarithm of the number of hours between "now" and "Created Time."

2. learn your model

Use gradient descent to find a logistic regression model that fits your model-learning dataset. You should partition your dataset into training, validation, and test sets so that you can carry out those respective steps in the model-learning process.

I won't elaborate any more on this section because the internet is full of the details and it's a canned process.

Wikipedia links:

3. apply your model

Having learned your logistic regression model, you can now apply it to predict the score for how "good" a new blog post is! Simply compute the set of features (and derived features), then use your model to map those features to a score.

Again, the internet is full of the details for this section, which is a canned process.


If you have any questions, make sure to ask!

If you're interested in learning more about machine learning, you should consider taking the free online Stanford Machine Learning course on Coursera.org. (I'm not affiliated with Stanford or Coursera.)

Timothy Shields
  • 75,459
  • 18
  • 120
  • 173
  • I actually feel this model is an overkill. As Ted suggested you want to figure out how important each factor/feature is and compute a score. This is exactly what this answer is asking to do in step 1. How would you assign a rank to blog posts in training set rationally and consistently without assigning some weights to features of relevance. Now if you have explicitly assigned weights then the problem is already solved, why use a regression algorithm to "predict" the rank. – Gmu Jan 28 '14 at 23:40
  • 1
    @Gmu After eating at a restaurant, watching a movie, reading a book, etc., could you rate the overall experience on a scale from 0 to 5 stars? When you rate the restaurant, are you consciously basing your rating on very low level features like "Fat Content," "Carbohydrate Content," "Server Friendliness," etc. and then combining these with weights that you consciously decided on? Probably not. Yet you can rate restaurants "rationally and consistently" (to some degree). – Timothy Shields Jan 29 '14 at 00:39
  • thanks for clarifying. so it is bringing out the latent weights that your mind subconsciously is assigning. Quantifying the Qualitative assuming consistency in the qualitative responses/scores. – Gmu Feb 18 '14 at 19:48
13

I'd suggest a weighted average of the individual scores for each blog post. Assign a weight that reflects both the relative importance of each value and the differences in value scale (e.g., E is going to be a very large number compared to the other values). Then compute:

rank = wP * P + wC * C + wS * S + wE * E + wF * F + wU * U;

You don't provide any information about the relative importance of each value or even what the values mean in terms of rank. So it's impossible to be more specific about this. (Does an older creation time push a post up or down in rank? If down, then wE should be negative.)

Ted Hopp
  • 232,168
  • 48
  • 399
  • 521