1

I am currently working on a model where I will judge the relevance of an article. This follows Hacker News' algorithm. Here is my articles model in app/articles/models.py

from app.extensions import db

class Article(db.Model):
    """ database representation of an article """
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(128))
    subtitle = db.Column(db.String(512))
    body = db.Column(db.Text())
    votes = db.Column(db.Integer, default=1)
    views = db.Column(db.Integer, default=1)
    timestamp = db.Column(db.DateTime, default=datetime.utcnow)

    def popularity(self, gravity=1.8):
        """ uses hacker news popularity rating """
        submit_delta = (self.timestamp - datetime.utcnow()).total_seconds()
        time_decay = submit_delta / 60 / 60
        popularity = (self.views - 1) / (time_decay + 2) ** gravity
        return popularity

Currently, I am trying to sort by the result of popularity.

>>> from app.articles.models import Article
>>> Article.query.order_by(Article.popularity()).all()

this does not work. How would I sort my articles by their popularity?

corvid
  • 10,733
  • 11
  • 61
  • 130

2 Answers2

3

You can use hybrid methods to create a method that produces a SQL expression when called by the class (for querying) but behaves like a regular method when called by an instance.

Here's a working example. It prints the popularity as calculated by python and by the database. These will be slightly different due to timing and rounding.

from datetime import datetime
from sqlalchemy import create_engine, Integer, Column, DateTime, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_method
from sqlalchemy.orm import Session

engine = create_engine('postgresql:///example', echo=True)
Base = declarative_base(bind=engine)
session = Session(bind=engine)


class Article(Base):
    __tablename__ = 'article'

    id = Column(Integer, primary_key=True)
    views = Column(Integer, nullable=False, default=1)
    ts = Column(DateTime, nullable=False, default=datetime.utcnow)

    @hybrid_method
    def popularity(self, gravity=1.8):
        seconds = (self.ts - datetime.utcnow()).total_seconds()
        hours = seconds / 3600

        return (self.views - 1) / (hours + 2) ** gravity

    @popularity.expression
    def popularity(self, gravity=1.8):
        seconds = func.extract('epoch', self.ts - func.now())
        hours = seconds / 3600

        return (self.views - 1) / func.power((hours + 2), gravity)


Base.metadata.create_all()

a1 = Article(views=100)
a2 = Article(views=200)

session.add_all((a1, a2))
session.commit()

comparison = session.query(Article, Article.popularity()).all()

for a, pop in comparison:
    print 'py: {} db: {}'.format(a.popularity(), pop)

This works for PostgreSQL, but func.power and func.extract might work differently in other databases. SQLite especially does not have power and extract is implemented differently.

davidism
  • 121,510
  • 29
  • 395
  • 339
1

You're going to need to rewrite the popularity calculation as sql expressions if you want to use it as an "order by" in the database. Your other options are to get all the articles and sort in python (not performant for larger data sets) or to pre-calculate all the popularity values and cache them in a number field in the database, and sort on that.

For example (this is Postgres-specific and I didn't use the Flask idioms but you should get the idea):

order_exp = "(article.views - 1) / power(2 + extract(epoch from (now() at time zone 'UTC' - timestamp))/3600, :gravity)"
order = sqlalchemy.text(order_exp).bindparams(gravity=1.8)
print(session.query(Article).order_by(order).all())
Jason S
  • 13,538
  • 2
  • 37
  • 42
  • yes I've thought of precalculating it. However, since it uses `datetime.utcnow()`, would that even work? Would it be better to create some type of background process or signal to run ever x minutes to refresh the database's popularity ratings? – corvid Aug 05 '14 at 16:20
  • Yeah you would need to periodically refresh it if you did that. – Jason S Aug 05 '14 at 16:24