
I am essentially trying to build a website where members can post blog entries, and I want to record unique and overall page views for the different posts, both in absolute terms and over different time frames (e.g., the last 24 hours, the last week, etc.).

My initial approach was to use the date as the partition key and the blogPostId as the sort key, so I could add all the posts visited on a given day. If I then include the userIds as an attribute, I should be able to get a) unique page views and b) overall page views (which might include repeat visits by a specific user) for a given day. Finally, I would query the date partitions for, say, the last 7 days and extract the most popular post.
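The data model described above can be sketched in memory to make the access pattern concrete. This is not DynamoDB code: a plain Python dict stands in for the table, keyed by (date, blogPostId), and the helper names (`record_view`, `views`, `most_popular`) are hypothetical. Each distinct date corresponds to one partition you would query.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Optional

# Hypothetical in-memory stand-in for the table: key (date, blogPostId),
# value = list of userIds that viewed the post that day (repeats included).
table = defaultdict(list)


def record_view(day: date, post_id: str, user_id: str) -> None:
    table[(day, post_id)].append(user_id)


def views(day: date, post_id: str) -> tuple:
    """Return (unique, overall) page views for post_id on day."""
    user_ids = table.get((day, post_id), [])
    return len(set(user_ids)), len(user_ids)


def most_popular(today: date, days: int = 7) -> Optional[str]:
    """Post with the most overall views in the `days` days ending today."""
    cutoff = today - timedelta(days=days - 1)
    totals = defaultdict(int)
    # With DynamoDB you would instead issue one Query per day (partition key)
    # in the window; here we simply filter the whole dict.
    for (day, post_id), user_ids in table.items():
        if cutoff <= day <= today:
            totals[post_id] += len(user_ids)
    return max(totals, key=totals.get) if totals else None


today = date(2024, 1, 8)
record_view(today, "post-a", "u1")
record_view(today, "post-a", "u1")  # repeat visit by the same user
record_view(today, "post-a", "u2")
record_view(today - timedelta(days=1), "post-b", "u3")
print(views(today, "post-a"))  # (2, 3)
print(most_popular(today))     # post-a
```

Note that `most_popular` has to touch every day in the window, and an all-time ranking would have to visit every day ever recorded.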

As far as I can tell, this should work fine as long as there aren't too many entries; however, I'm sceptical whether it will scale. More specifically, if the number of blog posts grows a lot within a given interval, or if I want to find the all-time most viewed post, I'd essentially have to read the whole table.

Does anyone have an idea how I could implement this more efficiently?

Fesch

2 Answers

DynamoDB will almost certainly work for you, and if you need an excuse to use it, by all means give it a try. If you get a ton of traffic, though, it might end up being expensive.

Personally, I would consider using Redis for what you are asking to do; here is a pretty good, detailed question and answer on how you might implement it:

Scalable way of logging page request data from a PHP application?
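As a rough illustration of the kind of counter layout a Redis-based approach might use, here is a pure-Python stand-in. It does not talk to Redis: the key names are hypothetical, and the comments only indicate which Redis commands (`ZINCRBY`, `SADD`/`PFADD`, `ZUNIONSTORE`, `ZREVRANGE`) each part loosely mirrors. The idea is that per-day and all-time totals are maintained as running counters at write time, so ranking posts never requires scanning individual visits.

```python
from collections import defaultdict


class ViewCounters:
    """Pure-Python stand-in for a Redis counter layout (hypothetical key names).

    Roughly mirrors, per view:
      ZINCRBY views:<day> 1 <postId>         overall views per day
      SADD    uniq:<day>:<postId> <userId>   unique viewers per day
                                             (PFADD/PFCOUNT if an approximate count is enough)
      ZINCRBY views:all 1 <postId>           all-time overall views
    """

    def __init__(self):
        self.daily = defaultdict(lambda: defaultdict(int))  # day -> postId -> views
        self.viewers = defaultdict(set)                     # (day, postId) -> userIds
        self.all_time = defaultdict(int)                    # postId -> views

    def record_view(self, day, post_id, user_id):
        self.daily[day][post_id] += 1
        self.viewers[(day, post_id)].add(user_id)
        self.all_time[post_id] += 1

    def unique_views(self, day, post_id):
        return len(self.viewers.get((day, post_id), ()))

    def top(self, days, n=1):
        """Sum the daily counters (like ZUNIONSTORE) and take the top n."""
        totals = defaultdict(int)
        for day in days:
            for post_id, count in self.daily.get(day, {}).items():
                totals[post_id] += count
        return sorted(totals, key=lambda p: (-totals[p], p))[:n]

    def top_all_time(self, n=1):
        """Read the running all-time counter; no scan over individual days."""
        return sorted(self.all_time, key=lambda p: (-self.all_time[p], p))[:n]


counters = ViewCounters()
counters.record_view("2024-01-07", "post-b", "u3")
counters.record_view("2024-01-08", "post-a", "u1")
counters.record_view("2024-01-08", "post-a", "u1")
counters.record_view("2024-01-08", "post-a", "u2")
print(counters.unique_views("2024-01-08", "post-a"))  # 2
print(counters.top(["2024-01-07", "2024-01-08"]))     # ['post-a']
print(counters.top_all_time(n=2))                     # ['post-a', 'post-b']
```

In Redis itself the sorted set keeps members ordered by score, so the all-time top posts would come from a single range read rather than a Python sort.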

E.J. Brennan

DynamoDB can be used to iterate on and build this feature quickly.

Nonetheless, this is a good use case for Amazon Kinesis Data Streams, which lets you ingest the data and then process it to suit your needs.

Be aware that Kinesis can become expensive if you are trying to keep costs to a minimum.

But if you start receiving a lot of traffic, Kinesis will act as a queue and let you process the data before writing it to DynamoDB (or another data store), which will be cheaper than sending all those individual write requests.
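A minimal sketch of that idea, assuming a batch of view events has already been read from the stream: collapse the batch into one increment per (day, post), then build one `update_item` request per increment using DynamoDB's `ADD` update action. The `ViewEvent` type and the `viewDate`/`postId`/`viewCount` attribute names are hypothetical, and the requests are only constructed here, not sent.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class ViewEvent(NamedTuple):
    day: str      # e.g. "2024-01-08"
    post_id: str
    user_id: str


def aggregate_batch(events: Iterable[ViewEvent]) -> Dict[Tuple[str, str], int]:
    """Collapse a batch of raw view events into one increment per (day, post)."""
    return dict(Counter((e.day, e.post_id) for e in events))


def update_requests(table_name: str, increments: Dict[Tuple[str, str], int]) -> Iterator[dict]:
    """Build keyword arguments for boto3's DynamoDB client.update_item.

    ADD creates the counter if it does not exist yet and increments it otherwise.
    Nothing is sent here; attribute names are hypothetical.
    """
    for (day, post_id), n in increments.items():
        yield {
            "TableName": table_name,
            "Key": {"viewDate": {"S": day}, "postId": {"S": post_id}},
            "UpdateExpression": "ADD viewCount :n",
            "ExpressionAttributeValues": {":n": {"N": str(n)}},
        }


batch = [
    ViewEvent("2024-01-08", "post-a", "u1"),
    ViewEvent("2024-01-08", "post-a", "u1"),
    ViewEvent("2024-01-08", "post-a", "u2"),
    ViewEvent("2024-01-08", "post-b", "u3"),
]
requests = list(update_requests("PageViews", aggregate_batch(batch)))
print(len(batch), "events ->", len(requests), "writes")  # 4 events -> 2 writes
```

Unique viewers would need extra handling (for example, a string set updated with `ADD`), which this sketch leaves out.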

Another limitation you should check is that a single DynamoDB Query returns at most 1 MB of data; larger result sets have to be paginated.
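Paginating a Query means following `LastEvaluatedKey`: when a page is cut off, the response includes that key, and you pass it back as `ExclusiveStartKey` to fetch the next page. The sketch below takes any callable that behaves like boto3's `client.query` (an assumption, so it can be exercised without AWS); the table and attribute names in the usage comment are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator


def query_all(query_page: Callable[..., Dict[str, Any]], **params: Any) -> Iterator[Any]:
    """Yield every item from a DynamoDB Query, following LastEvaluatedKey.

    `query_page` should behave like boto3's client.query: it returns a dict with
    "Items" and, when more results remain, "LastEvaluatedKey".
    """
    while True:
        page = query_page(**params)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        # Resume the next request right after the last key of this page.
        params = {**params, "ExclusiveStartKey": last_key}


# Usage with boto3 (not executed here):
#   client = boto3.client("dynamodb")
#   items = list(query_all(client.query, TableName="PageViews",
#                          KeyConditionExpression="viewDate = :d",
#                          ExpressionAttributeValues={":d": {"S": "2024-01-08"}}))
```

boto3 also offers a built-in paginator (`client.get_paginator("query")`) that performs the same loop.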

Amazon recommends using Redshift for these operations, as it is better suited to performing aggregations and calculations across a data warehouse.
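To show the kind of query a warehouse handles well, here is the ranking expressed as SQL over a raw event log. Python's built-in `sqlite3` is used purely as a local stand-in for Redshift, and the `page_views` table and its columns are hypothetical; a Redshift query would have a similar GROUP BY shape, with `COUNT(DISTINCT ...)` giving unique views.

```python
import sqlite3

# sqlite3 stands in for a warehouse such as Redshift; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (post_id TEXT, user_id TEXT, viewed_at TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("a", "u1", "2024-01-08 10:00:00"),
        ("a", "u1", "2024-01-08 11:00:00"),
        ("a", "u2", "2024-01-07 09:00:00"),
        ("b", "u3", "2024-01-08 12:00:00"),
        ("b", "u4", "2023-12-01 08:00:00"),
    ],
)

# Overall and unique views per post since a cutoff, most viewed first.
TOP_POSTS_SQL = """
SELECT post_id,
       COUNT(*)                AS overall_views,
       COUNT(DISTINCT user_id) AS unique_views
FROM page_views
WHERE viewed_at >= ?
GROUP BY post_id
ORDER BY overall_views DESC, post_id
LIMIT ?
"""


def top_posts(since: str, limit: int = 10):
    return conn.execute(TOP_POSTS_SQL, (since, limit)).fetchall()


print(top_posts("2024-01-02 00:00:00"))  # [('a', 3, 2), ('b', 1, 1)]
```

ISO-formatted timestamps compare correctly as strings, which keeps the time-frame filter simple in this sketch.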

Jose A