5

We have a large extended network of users that we track using badges. Total traffic is in the neighborhood of 60 million impressions a month. We are currently considering switching from a fairly slow, database-backed logging solution (custom-built in PHP, and messy) to a simple log-based alternative that relies on Amazon S3 logs and Splunk.

After using Splunk for some other analysis tasks, I really like it. But it's not clear how to set up a source like S3 with the system. It seems that remote sources require the Universal Forwarder to be installed, which is not an option on S3.

Any ideas on this?

Wandering Digital
  • The only question I have is: how are your logs getting to S3? Are you rolling them there every X minutes/hours? If so, you'd be limited to a historical, non-real-time view. Regardless, if we could, would you be interested in testing it out? If so, ping me. –  May 07 '12 at 05:42

4 Answers

1

Very late answer, but I was looking for the same thing and found a Splunk app that does what you want: http://apps.splunk.com/app/1137/. I have not yet tried it, though.

cjg
  • Equally late addition: That app does not scale well. It has a bug that prevents it from reading more than 1,000 objects (it simply doesn't have code to handle truncated listings). It also has a few other flaws and doesn't seem to have a decent way to spread the load amongst indexers. – bstempi Jun 18 '14 at 14:56
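
For context on that bug: S3 list requests return at most 1,000 keys per response, and a correct client has to follow the continuation marker to fetch the rest. A minimal sketch of proper handling in Python with boto3 (the bucket and prefix names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    # The paginator follows ContinuationToken automatically, so listings
    # beyond 1,000 objects are not silently truncated.
    for page in paginator.paginate(Bucket="my-log-bucket", Prefix="logs/"):
        for obj in page.get("Contents", []):
            print(obj["Key"], obj["Size"])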
0

I would suggest logging preprocessed JSON data to a document database. For example, use Azure Queues or a similar service-bus messaging technology that fits your scenario, in combination with Azure DocumentDB. This keeps your database-based approach but changes it into a schemaless, easy-to-scale, document-based DB.
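
A minimal sketch of the ingest side, assuming the azure-storage-queue Python SDK (the connection string, queue name, and event fields are placeholders for illustration):

    import json
    from azure.storage.queue import QueueClient

    # Placeholder connection string and queue name.
    queue = QueueClient.from_connection_string(
        conn_str="<storage-connection-string>", queue_name="impressions")

    # One preprocessed impression event; a separate worker would drain
    # the queue and write batches into the document database.
    event = {"badge_id": "abc123", "ts": "2012-05-07T05:42:00Z"}
    queue.send_message(json.dumps(event))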

opv
0

I use http://www.insight4storage.com/ from the AWS Marketplace to track my AWS S3 storage usage totals by prefix, bucket, or storage class over time; it also shows me previous-version storage by prefix and per bucket. It has a setting to save the S3 data as Splunk-format logs, which might work for your use case, in addition to its UI and web-service API.

TJCloudmin
0

You can use the Splunk Add-on for AWS.

This is what I understand:

  1. Create a Splunk instance. Use Splunk Cloud or the on-premises Splunk AMI to launch an EC2 instance running Splunk.

  2. Install the Splunk Add-on for AWS application on the EC2 instance.

  3. Based on the input log type (e.g. CloudTrail logs, Config logs, generic logs, etc.), configure the Add-on and supply parameters such as the AWS account ID or IAM role.

  4. The Add-on will automatically poll the AWS S3 source and fetch the latest logs after a specified interval (which defaults to 30 seconds).

For a generic use case (like ours), you can try configuring a Generic S3 input for Splunk; a sketch of the resulting configuration is shown below.
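
Configuring a Generic S3 input through the UI ultimately writes a stanza into inputs.conf. A minimal sketch, assuming the Add-on's aws_s3 input type (the account, bucket, and prefix names are placeholders, and exact field names can vary between Add-on versions, so check the inputs.conf.spec bundled with your release):

    [aws_s3://badge-impressions]
    aws_account = my_aws_account
    bucket_name = my-log-bucket
    key_name = logs/
    sourcetype = aws:s3
    polling_interval = 30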

rahuljain1311