We're looking at moving away from Splunk as our datastore and over to an AWS Data Lake backed by S3.

What would be the process of migrating the data from Splunk to S3? I've read lots of documents about archiving data from Splunk to S3, but I'm not sure whether that leaves the data in a usable format or in some archive format that has to be restored into Splunk itself.

Garreth
  • I was hoping there was just some export tool, where for example data can be exported to local EBS and then an S3 Copy script run to upload to S3? – Garreth Nov 12 '19 at 15:50

2 Answers


Check out Splunk's SmartStore feature. It moves your non-hot buckets to S3 to save on storage costs. Running SmartStore on AWS only makes sense, however, if you also run Splunk on AWS; otherwise, the data transfer (egress) charges will bankrupt you. Those charges apply whenever Splunk needs to search a bucket stored in S3 and has to copy it back to an indexer. See https://docs.splunk.com/Documentation/Splunk/8.0.0/Indexer/AboutSmartStore for more information.
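For reference, enabling SmartStore is an indexes.conf change roughly like the one below. The bucket name, path, and endpoint are placeholders, and credentials can come from an IAM role instead of keys in the file; see the linked docs for the authoritative settings.

[volume:remote_store]
storageType = remote
path = s3://your-smartstore-bucket/splunk
remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

[main]
remotePath = volume:remote_store/$_index_name

Note this only changes where Splunk keeps its warm buckets; the data in that S3 bucket is still in Splunk's proprietary bucket format, not something a data lake can read directly.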

RichG
  • Thanks - so this would be a viable option to migrate away from Splunk by moving all data to S3 using SmartStore? – Garreth Nov 09 '19 at 12:07
  • SmartStore is not a migration path away from Splunk. Data stored using SmartStore (S2) is still in Splunk's proprietary format. S2 is a feature that separates storage and compute to help Splunk customers save on their storage costs. Data archived to S3 is still in Splunk format. LogRhythm has a program to help Splunk users migrate to their platform, but I am not aware of something similar for AWS Data Lake. – RichG Nov 09 '19 at 13:12

From what I've read, there are a few ways to do it:

  • Export using the Web UI
  • Export using the REST API export endpoint (see the curl sketch after this list)
  • Export using CLI
  • Copy certain files in the filesystem
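
For the REST route, Splunk's /services/search/jobs/export endpoint streams search results back as they are produced. A minimal curl sketch (host, credentials, and the time window below are placeholders for your own values):

curl -k -u admin:changeme https://localhost:8089/services/search/jobs/export \
  --data-urlencode search="search index=main earliest=11/11/2019:00:00:00 latest=11/12/2019:00:00:00" \
  -d output_mode=raw -o export_2019-11-11.raw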

So far I've tried the CLI and managed to export around 500,000 events at a time using

splunk search "index=main earliest=11/11/2019:00:00:01 latest=11/15/2019:23:59:59" -output rawdata -maxout 500000 > output2.dmp

However, I'm not sure how to repeat this step accurately so that I cover all 100 million+ events, i.e. search from DATE A to DATE B for 500,000 records, then from DATE B to DATE C for the next 500,000, without missing any events in between.
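One way to make the windows line up is to export a fixed time slice per file, e.g. one day at a time, so consecutive searches share a boundary. This is an untested sketch: it assumes Splunk treats earliest as inclusive and latest as exclusive, that -maxout 0 lifts the 500,000 cap, and that GNU date is available.

#!/bin/bash
# Export one day per file; consecutive windows share the midnight boundary,
# so nothing falls between them.
start="2019-11-11"
stop="2019-11-16"   # exclusive: the day after the last day wanted

day="$start"
while [ "$day" != "$stop" ]; do
  next=$(date -d "$day + 1 day" +%Y-%m-%d)
  splunk search "index=main earliest=$(date -d "$day" +%m/%d/%Y):00:00:00 latest=$(date -d "$next" +%m/%d/%Y):00:00:00" \
    -output rawdata -maxout 0 > "export_${day}.dmp"
  day="$next"
done

Under those assumptions an event stamped exactly at midnight lands in exactly one window, so there are no gaps or duplicates; as a sanity check, you can run a count-only search over the whole range and compare it to the total number of exported events.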

Garreth