I am using StreamSets as an ingestion tool to pull records from an Oracle database into Kafka topics. Now I want to consume those topics through StreamSets itself and also count the number of records in them.

How can I do that? Any help would be appreciated.

Manoj Vadehra
Ankita

1 Answer

You can use StreamSets Data Collector's history REST API to retrieve data with record counts for each stage. For example, here are the counters for the last run of a given pipeline. I'm using the excellent jq tool to parse the JSON at the command line.

$ curl -s -u admin:admin -H 'X-Requested-By:sdc' http://localhost:18630/rest/v1/pipeline/RedshiftStreamingwithKinesisFirehose537add73-bb16-4358-a26a-a51576dea32b/history | jq -r '.[0].metrics' | jq .counters
{
  "pipeline.batchCount.counter": {
    "count": 1029
  },
  "pipeline.batchErrorMessages.counter": {
    "count": 0
  },
  "pipeline.batchErrorRecords.counter": {
    "count": 0
  },
  "pipeline.batchInputRecords.counter": {
    "count": 648226
  },
  "pipeline.batchOutputRecords.counter": {
    "count": 648226
  },
  "stage.ExpressionEvaluator_01.errorRecords.counter": {
    "count": 0
  },
  "stage.ExpressionEvaluator_01.inputRecords.counter": {
    "count": 648226
  },
  "stage.ExpressionEvaluator_01.outputRecords.counter": {
    "count": 648226
  },
  "stage.ExpressionEvaluator_01.stageErrors.counter": {
    "count": 0
  },
  "stage.ExpressionEvaluator_01:ExpressionEvaluator_01OutputLane15561338960790.outputRecords.counter": {
    "count": 648226
  },
  "stage.FieldOrder_01.errorRecords.counter": {
    "count": 0
  },
  "stage.FieldOrder_01.inputRecords.counter": {
    "count": 648226
  },
  "stage.FieldOrder_01.outputRecords.counter": {
    "count": 648226
  },
  "stage.FieldOrder_01.stageErrors.counter": {
    "count": 0
  },
  "stage.FieldOrder_01:FieldOrder_01OutputLane15561351879260.outputRecords.counter": {
    "count": 648226
  },
  "stage.FieldTypeConverter_01.errorRecords.counter": {
    "count": 0
  },
  "stage.FieldTypeConverter_01.inputRecords.counter": {
    "count": 648226
  },
  "stage.FieldTypeConverter_01.outputRecords.counter": {
    "count": 648226
  },
  "stage.FieldTypeConverter_01.stageErrors.counter": {
    "count": 0
  },
  "stage.FieldTypeConverter_01:FieldTypeConverter_01OutputLane15560499048280.outputRecords.counter": {
    "count": 648226
  },
  "stage.KinesisFirehose_01.errorRecords.counter": {
    "count": 0
  },
  "stage.KinesisFirehose_01.inputRecords.counter": {
    "count": 648226
  },
  "stage.KinesisFirehose_01.outputRecords.counter": {
    "count": 648226
  },
  "stage.KinesisFirehose_01.stageErrors.counter": {
    "count": 0
  },
  "stage.MySQLBinaryLog_01.errorRecords.counter": {
    "count": 0
  },
  "stage.MySQLBinaryLog_01.inputRecords.counter": {
    "count": 0
  },
  "stage.MySQLBinaryLog_01.outputRecords.counter": {
    "count": 648226
  },
  "stage.MySQLBinaryLog_01.stageErrors.counter": {
    "count": 0
  },
  "stage.MySQLBinaryLog_01:MySQLBinaryLog_01OutputLane15561313696850.outputRecords.counter": {
    "count": 648226
  },
  "stage.StreamSelector_01.errorRecords.counter": {
    "count": 0
  },
  "stage.StreamSelector_01.inputRecords.counter": {
    "count": 648226
  },
  "stage.StreamSelector_01.outputRecords.counter": {
    "count": 648226
  },
  "stage.StreamSelector_01.stageErrors.counter": {
    "count": 0
  },
  "stage.StreamSelector_01:StreamSelector_01OutputLane1556133811620.outputRecords.counter": {
    "count": 0
  },
  "stage.StreamSelector_01:StreamSelector_01OutputLane1556133816638.outputRecords.counter": {
    "count": 648226
  },
  "stage.Trash_01.errorRecords.counter": {
    "count": 0
  },
  "stage.Trash_01.inputRecords.counter": {
    "count": 0
  },
  "stage.Trash_01.outputRecords.counter": {
    "count": 0
  },
  "stage.Trash_01.stageErrors.counter": {
    "count": 0
  }
}
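
If you would rather do this from a script than with curl and jq, here is a minimal Python sketch of the same idea. It assumes the response has the shape the jq pipeline above implies: a JSON array with the most recent run first, where each entry's `metrics` field is itself a JSON-encoded string. The helper names `extract_counters` and `fetch_history` are made up for illustration and are not part of any StreamSets client library.

```python
import base64
import json
import urllib.request


def extract_counters(history_json: str) -> dict:
    """Return {counter_name: count} for the first entry of a history response.

    Assumption: the response is a JSON array, newest run first, and each
    entry carries a "metrics" field holding a JSON-encoded string (this is
    why the jq pipeline parses the output twice).
    """
    history = json.loads(history_json)
    if not history:
        return {}
    metrics = history[0].get("metrics")
    if isinstance(metrics, str):
        # Second decoding step, equivalent to `jq -r '.[0].metrics' | jq`.
        metrics = json.loads(metrics)
    if not isinstance(metrics, dict):
        return {}
    counters = metrics.get("counters", {})
    return {name: value["count"] for name, value in counters.items()}


def fetch_history(base_url: str, pipeline_id: str, user: str, password: str) -> str:
    """Fetch the raw history JSON, sending the same headers as the curl call."""
    url = f"{base_url}/rest/v1/pipeline/{pipeline_id}/history"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "X-Requested-By": "sdc"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

To get a single number, look up one key in the returned dictionary, for example `extract_counters(raw)["pipeline.batchOutputRecords.counter"]`, or the `stage.<StageName>.outputRecords.counter` entry for whichever stage reads from or writes to Kafka in your own pipeline.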
metadaddy
  • Thanks for the reply. Correct me if I'm wrong: is this the same information that StreamSets shows in preview? – Ankita May 14 '19 at 07:06
  • It's the information from the history panel - https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Pipeline_Monitoring/PipelineMonitoring_title.html#task_p12_gbw_rr – metadaddy May 14 '19 at 14:56
  • Can you help me with some other ways? – Ankita May 16 '19 at 11:30
  • What are you looking for? How does it need to be different from the above? Have you studied the available REST calls under Help / RESTful API? – metadaddy May 20 '19 at 22:43
  • No, I haven't studied them completely. I was looking for something we could add as a processor in the pipeline. Is there any way to do that? – Ankita May 21 '19 at 04:06
  • A processor isn't the way to go here. The REST API gives you all the statistics that are visible in the UI. – metadaddy May 23 '19 at 17:48
  • OK. If there is no way to do it with processors, I'll go ahead and accept your answer. Thanks :) – Ankita May 23 '19 at 17:49