I want to try out DynamoDB as a store for the access.log files generated by nginx. The data will later feed a reporting dashboard that includes IP, referral URL, referral domain, browser, etc.

The initial setup will be EC2 instances running nginx, with CloudWatch consuming the access.log from each nginx instance.

The idea is that each CloudWatch log entry will trigger a Lambda function, which will parse the entry and put it into DynamoDB.
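To make the parsing step concrete, here is a rough sketch of what I imagine the Lambda doing. It assumes the function is triggered by a CloudWatch Logs subscription (which delivers a base64-encoded, gzip-compressed JSON payload under `event["awslogs"]["data"]`) and that nginx writes its default "combined" log format. The names `parse_line`, `decode_cloudwatch_event` and `handler` are just placeholders, and the actual write to DynamoDB is left out.

```python
import base64
import gzip
import json
import re

# Assumption: nginx is using its default "combined" log format.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one access.log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def decode_cloudwatch_event(event):
    """Unpack the log messages from a CloudWatch Logs subscription event."""
    payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    return [entry["message"] for entry in json.loads(payload)["logEvents"]]


def handler(event, context):
    """Hypothetical Lambda entry point: parse every log line in the event.

    Writing the parsed rows to DynamoDB is omitted from this sketch.
    """
    rows = [parse_line(message) for message in decode_cloudwatch_event(event)]
    return [row for row in rows if row is not None]
```

Lines that do not match the expected format are simply dropped here; a real function would probably want to log or count them.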

I'm not too familiar with DynamoDB other than what I've read, but here's how I was thinking of doing the schema for this:

ID will be the URL served by nginx; this is what we would be reporting on.

ReferralDomain (table)

  • ID (key)
  • domain (S)
  • Created (range)

ReferralURL (table)

  • ID (key)
  • url (S)
  • Created (range)

ReferralBrowser (table)

  • ID (key)
  • browser (S)
  • Created (range)

And this would continue on for other items being reported on, such as IP or GEO info (ReferralCity, ReferralCountry, etc.).

Does this seem like a good schema design for this type of data in DynamoDB? Ultimately, the dashboard will show data for a specific ID over a date range: a list of totals (aggregates) by URL, browser, etc., as well as the underlying rows themselves. One of the reports may also list unique values with counts. For example, the ReferralDomain "Facebook" may have a count of 550 within a date range for a specific ID. Would this need to be done in EMR?
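As an illustration of the "unique items with counts" report, this is roughly the aggregation I have in mind, written as plain Python over rows that have already been fetched for one ID. The function name `count_by` and the attribute names are placeholders, and it assumes Created is stored as an ISO-8601 string so that string comparison matches date order.

```python
from collections import Counter


def count_by(items, attribute, start, end):
    """Count each value of `attribute` among items with start <= Created <= end.

    Assumes Created is an ISO-8601 string, which sorts chronologically.
    Items that lack the attribute are skipped.
    """
    return Counter(
        item[attribute]
        for item in items
        if attribute in item and start <= item["Created"] <= end
    )
```

Whether this is done in application code as above or in something like EMR would presumably depend on how many rows a single ID and date range can return.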

Is there a better schema to use or any other considerations that should be taken into account with Dynamo for this type of data? Thank you

dzm
  • how many URLs are we talking here? (for the hash primary key part) – Chen Harel Nov 01 '15 at 19:20
  • Many millions, would be for images that are served from nginx – dzm Nov 01 '15 at 19:48
  • I should add, I wasn't going to necessarily put the actual URL as the ID, but probably a hash of the url (md5) – dzm Nov 01 '15 at 19:54
  • Does each Created value correspond to one entry in the nginx log? Will each request be represented as a "row"? – Chen Harel Nov 01 '15 at 20:25
  • Created is a date, that would just represent when the image was served. Basically what this is doing is mapping the nginx access.log to dynamo that can be queried for reporting. Each variable within the access log (user agent, ip, referral domain, etc.), would be a value in a row here, based on this design example. – dzm Nov 01 '15 at 20:35

1 Answer

The primary key looks solid, and your architecture will work and scale nicely.

If I understand nginx and your use case correctly, I'm not sure why you want to split your tables by attribute.

You can have one table:

Links (table)

  • ID (primary hash key)
  • Created (primary range key)
  • referralUrl (S attribute)
  • referralDomain (S attribute)
  • referralBrowser (S attribute)
  • ...

And since DynamoDB is schemaless, you can leave some of the non-key attributes out of any given item.
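A minimal sketch of what that could look like, assuming the low-level DynamoDB JSON format (the shape accepted by boto3's `client.put_item` and `client.query`), string attributes throughout, and the md5-of-URL ID mentioned in the comments. The helper names `build_item` and `build_query` and the table name are hypothetical; the functions only build the request parameters and do not call AWS.

```python
import hashlib


def build_item(url, created, **attributes):
    """Build one Links item; optional attributes with no value are left out."""
    item = {
        # Assumption: ID is the md5 hex digest of the served URL.
        "ID": {"S": hashlib.md5(url.encode("utf-8")).hexdigest()},
        "Created": {"S": created},
    }
    for name, value in attributes.items():
        if value:  # schemaless: skip attributes missing from this log line
            item[name] = {"S": value}
    return item


def build_query(table_name, url_id, start, end):
    """Build query parameters for one ID restricted to a Created range."""
    return {
        "TableName": table_name,
        # Placeholders avoid any clash with DynamoDB reserved words.
        "KeyConditionExpression": "#id = :id AND #created BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#id": "ID", "#created": "Created"},
        "ExpressionAttributeValues": {
            ":id": {"S": url_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```

With boto3 this would be used along the lines of `client.put_item(TableName="Links", Item=build_item(...))` and `client.query(**build_query("Links", ...))`. One thing to keep in mind: two requests for the same ID with an identical Created value share a primary key, so the second put would replace the first unless the range key is made more specific.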

Chen Harel