
I'm trying to send a non-CloudWatch event to Splunk from Kinesis Firehose. I am processing the event with a Lambda and feeding it back into Firehose in the following format (required by Firehose):

{ 
    "records": [
        {
          "recordId": "2345678",
          "result": "Ok",
          "data": [base64-encoded custom JSON]
        }
    ]
}

However, it's throwing a vague parsing error once it gets to Splunk, with a help link that goes nowhere:

"errorCode":"Splunk.InvalidDataFormat","errorMessage":"The data is not formatted correctly. To see how to properly format data for Raw or Event HEC endpoints, see Splunk Event Data (http://dev.splunk.com/view/event-collector/SP-CAAAE6P#data)"

What am I missing here? It seems strange that the HEC endpoint wouldn't be able to parse the messages coming from Firehose in their standard format.

I am sending the message to an HEC Event endpoint, using the splunk_configuration block in an aws_kinesis_firehose_delivery_stream Terraform module.

1 Answer

Figured it out! For posterity, since this isn't well-documented:

Your data field in the Kinesis Firehose payload needs to be a base64-encoded object that follows the Splunk event collector spec.

As long as the Lambda returns the Firehose envelope shown in the question, and each `data` field decodes to a valid HEC event object, neither Firehose nor Splunk will throw an error.
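For reference, here is the shape of an object that satisfies the HEC Event endpoint spec, before base64 encoding. Per the Splunk docs, `event` is the only required field in the general spec; in this Firehose path, `sourcetype` also turned out to be necessary (see the comment in the Lambda code below). The field values here are made up for illustration:

```javascript
// Shape the Splunk HEC Event endpoint expects, pre-encoding.
const splunkEvent = {
    time: 1593000000,                // optional: epoch time of the event
    host: 'firehose-transformer',    // optional metadata
    source: 'aws:firehose',          // optional metadata
    sourcetype: 'aws:firehose:json', // optional per spec, but needed here
    event: { message: 'hello' },     // required: the event body itself
};

// Firehose wants this object base64-encoded in the record's `data` field:
const data = Buffer.from(JSON.stringify(splunkEvent), 'utf8').toString('base64');
console.log(typeof data); // "string"
```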

Here's the code for the Kinesis Firehose transformer Lambda (Node.js 12.x runtime):

/*
* Transformer for sending Kinesis Firehose events to Splunk
*
* Properly formats incoming messages for Splunk ingestion
* Returned object gets fed back into Kinesis Firehose and sent to Splunk
*/

'use strict';
console.log('Loading function');

exports.handler = (event, context, callback) => {
    let success = 0; // Number of valid entries found
    let failure = 0; // Number of invalid entries found

    /* Process the list of records and transform them to adhere to Splunk specs */
    const output = event.records.map((record) => {
        try {
            const entry = (Buffer.from(record.data, 'base64')).toString('utf8');

            /*
             * IMPORTANT: `data` object should follow Splunk event formatting specs prior to encoding.
             * Otherwise, it will throw a parsing error.
             * https://docs.splunk.com/Documentation/Splunk/8.0.3/Data/FormateventsforHTTPEventCollector
             */
            const obj = {
                sourcetype: "aws:firehose:json", // Required here; omitting it triggers the parsing error above
                event: JSON.parse(entry)
            };
            const payload = (Buffer.from(JSON.stringify(obj), 'utf8')).toString('base64');
            success++;
            return {
                recordId: record.recordId,
                result: 'Ok',
                data: payload,
            };
        } catch (e) {
            failure++;
            console.error(e.message); // `message` is a property, not a method
            return {
                recordId: record.recordId,
                result: 'ProcessingFailed'
            };
        }
    });
    console.log(`Processing completed. Successful records: ${success}. Failed records: ${failure}.`);
    callback(null, {records: output});
}
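
As a quick local sanity check, the per-record transform above can be exercised outside Lambda. This is a standalone copy of the same logic; the input record's contents are fabricated:

```javascript
// Minimal standalone version of the per-record transform in the Lambda above
function transformRecord(record) {
    const entry = Buffer.from(record.data, 'base64').toString('utf8');
    const obj = { sourcetype: 'aws:firehose:json', event: JSON.parse(entry) };
    return {
        recordId: record.recordId,
        result: 'Ok',
        data: Buffer.from(JSON.stringify(obj), 'utf8').toString('base64'),
    };
}

// Fabricated input record, base64-encoded as Firehose would deliver it
const input = {
    recordId: '2345678',
    data: Buffer.from(JSON.stringify({ message: 'hello' }), 'utf8').toString('base64'),
};

// Round-trip: transform, then decode the output the way Splunk would
const out = transformRecord(input);
const decoded = JSON.parse(Buffer.from(out.data, 'base64').toString('utf8'));
console.log(decoded.sourcetype);    // "aws:firehose:json"
console.log(decoded.event.message); // "hello"
```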