Questions tagged [streamsets]

Use the streamsets tag for questions regarding the StreamSets DataOps Platform, which includes Data Collector, Transformer, and Control Hub.

StreamSets DataOps Platform empowers your whole team, from highly skilled data engineers to visual ETL developers, to do powerful data engineering work. Only StreamSets makes it both simple to get started building pipelines quickly with intent-driven design and easy to extend to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
1 vote, 1 answer

StreamSets: Is there any way to count records in a Kafka topic using StreamSets?

I am using StreamSets as an ingestion tool to pull records from an Oracle database into Kafka topics. Now I want to consume them through StreamSets itself and also count the number of records in the Kafka topics. How can I do that? Kindly help.
Ankita
1 vote, 1 answer

StreamSets: How to unzip a folder using StreamSets

I have a .zip file and I want to extract it using StreamSets and put the data (.zip) into Kafka. How can I do that?
Ankita
1 vote, 1 answer

StreamSets: can I read a runtime value inside a script processor such as the JavaScript processor?

I am trying to use JavaScript processor steps in StreamSets. I defined some environment values and can invoke them from an expression: ${type}='month'. In JavaScript, how do I use those environment values? Can you write a JS example to get the value of ${type} in…
user504909
1 vote, 1 answer

Retrieving data from StreamSets Data Collector (SDC) protected by Kerberos

I am trying to retrieve data from the SDC API protected by Kerberos. Initially I am posting the credentials to the SCH login page and then using the generated cookies to access the SDC REST API. However, I am not able to post the credentials.…
SaAk
1 vote, 1 answer

How to send the request body of a POST request in the HTTP Client destination?

I am using the StreamSets ETL tool to transform data. After transformation, I send the data to a REST service as a POST request with the HTTP Client destination, but I didn't find any place to set the request body. So how can I trigger REST with POST…
kumar
1 vote, 1 answer

Apache NiFi and StreamSets

Is Apache NiFi slower than StreamSets? I have created a pipeline in both Apache NiFi and StreamSets that receives data from a Kafka topic and writes it to another Kafka topic, but StreamSets is way faster than NiFi. I am using…
1 vote, 1 answer

Issue with installing external libraries in StreamSets Data Collector

I have a ridiculous issue with installing external libraries. I've followed all the steps in the StreamSets documentation, but after restarting StreamSets I got this error: Expected exactly 1 stage lib jar but found 2 with name streamsets-datacollector-jdbc-lib.…
1 vote, 1 answer

Special characters (accent, apostrophe, trema) work in custom origin tests, but no longer when deployed in dockerized StreamSets

I've written a custom StreamSets origin. Some of the records contain characters like é or ë. When running my automated tests, I can validate that the data is emitted as a list of SDC Records as intended. When I use my custom origin in a pipeline on a…
nielsn
1 vote, 1 answer

Error when trying to get Azure Kubernetes Service to use Cluster Load balancer from Service

I'm working to get StreamSets Data Collector running in Azure Kubernetes Service (AKS), and when I run kubectl .... the service appears to be up; however, it's giving this error. This is an RBAC AKS cluster, so I think I need to give the service…
Rob
1 vote, 0 answers

Convert timestamp to UTC in StreamSets

I am ingesting logs from different time zones into Hadoop through StreamSets. I want to convert the different timestamps to a single UTC timestamp. How can I do that in StreamSets?
spandanad
1 vote, 0 answers

Read data through StreamSets from a continuously updating file

I want to read data from an active file through StreamSets. When I tried running the pipeline using the File Tail origin after updating the existing file, it displayed the error below: message:Pipeline Status: RUNNING_ERROR: …
spandanad
1 vote, 1 answer

Getting an error while sending email using StreamSets

I am trying to send an email using StreamSets. For this, I am using Directory as the source (a list of receipts in a text file), Jython Evaluator for processing, and Trash as the destination (for testing only). When I run the pipeline, it runs without any…
ROOT
1 vote, 1 answer

StreamSets pipeline to ingest files to HDFS throwing misleading "File not Found" exception

We have a StreamSets job set up which, although it runs successfully, throws the following error: "UNKNOWN com.streamsets.pipeline.api.StageException: SPOOLDIR_35 - Spool Directory Runner Failed. Reason java.nio.file.NoSuchFileException: " The…
Carol
1 vote, 1 answer

StreamSets Build Failure on project streamsets-datacollector-dist: Unable to find artifact

I'm trying to build StreamSets Data Collector from source by following the steps in the public StreamSets Git repository (SDC public GitHub link). The following picture is a screenshot of the error. Error message I've got when I tried to build…
star7357
1 vote, 2 answers

Can StreamSets be used to fetch data onto a local system?

Our team is exploring options for fetching data from HDFS to a local system. StreamSets was suggested to us, but no one on the team is familiar with it. Could anyone help me understand whether it will fit our requirement, which is to fetch the data from HDFS onto…
Prakhar Jhudele