Questions tagged [streamsets]

Use the streamsets tag for questions about the StreamSets DataOps Platform, which includes Data Collector, Transformer, and Control Hub.

StreamSets DataOps Platform empowers your whole team, from highly skilled data engineers to visual ETL developers, to do powerful data engineering work. Only StreamSets makes it both simple to get started building pipelines quickly with intent-driven design and easy to extend to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
1 vote · 1 answer

Posting data to an SDC HTTP Server URL from JavaScript shows a CORS issue

I am using a StreamSets pipeline to stream data from browsers. For that, I created a pipeline with an HTTP Server origin to post data from browser JavaScript, and I tried writing to that URL using a REST client, which succeeds. Response…
Nishu Tayal • 20,106 • 8 • 49 • 101
1 vote · 2 answers

StreamSets Stream Selector

I have a queue in JSON format in RabbitMQ, and I would like to pick out records that match certain conditions in StreamSets (using a Stream Selector) and then save a certain value to a new database (JDBC Producer). How do I write the specific value after the…
Areizaga • 11 • 3
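Stream Selector conditions are written in the Data Collector expression language. A minimal sketch of the kind of condition the question describes, assuming a hypothetical `/status` field in the RabbitMQ JSON records:

```
${record:value('/status') == 'active'}
```

Records matching the condition go to that output stream; everything else continues down the default stream, where a JDBC Producer (or a Trash stage) can consume it.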
1 vote · 2 answers

How to manually download and install StreamSets Data Collector destination packages

Does anyone know of a way to download and install destination packages for StreamSets Data Collector? My SDC does not have access to the internet, which is why I cannot do it the standard way through the Package Manager panel. I specifically want to download the Kafka package and…
bytebiscuit • 3,446 • 10 • 33 • 53
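One manual route is to fetch the stage-library tarball on a machine that does have internet access and unpack it next to the offline SDC installation. This is only a sketch: the archive URL pattern, the version number, and the exact tarball name below are assumptions and must be matched to your SDC release.

```
# On a machine with internet access (version and library name are examples):
curl -O https://archives.streamsets.com/datacollector/3.22.3/tarball/streamsets-datacollector-apache-kafka_2_6-lib-3.22.3.tgz

# Copy the tarball to the offline SDC host, then extract it into the SDC
# base directory; the tarball unpacks into streamsets-libs/.
tar -xzf streamsets-datacollector-apache-kafka_2_6-lib-3.22.3.tgz -C /opt/streamsets-datacollector

# Restart SDC so it picks up the new stage library.
```

After the restart, the Kafka stages should appear in the stage palette.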
1 vote · 1 answer

Logstash Origin connector for StreamSets

Is it possible to build a pipeline directly using Logstash as the origin and Cassandra as the destination? If not, what would be the best way to do it?
Maximilien Belinga • 3,076 • 2 • 25 • 39
0 votes · 0 answers

Multiple filters - file extension

I have a use case where I need to search for files in a SharePoint library and filter for files that start with ABC and have the CSV file type. I have the below GET command, but I am not sure how to add another filter that picks only the CSV…
Anand2706 • 57 • 6
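Graph's `$filter` supports `startswith`, but its support for suffix matching is limited on many resources, so the second condition is often easiest to apply client-side (for example in an SDC scripting stage). A plain-Python sketch of that filter; the function name and the file list are illustrative:

```python
def pick_abc_csv(filenames):
    """Keep only files that start with 'ABC' and have a .csv extension."""
    return [name for name in filenames
            if name.startswith("ABC") and name.lower().endswith(".csv")]

files = ["ABC_sales.csv", "ABC_notes.txt", "XYZ_data.csv", "ABC_orders.CSV"]
print(pick_abc_csv(files))  # → ['ABC_sales.csv', 'ABC_orders.CSV']
```

Lower-casing before the extension check keeps files like `ABC_orders.CSV` from slipping through the filter.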
0 votes · 1 answer

StreamSets - Microsoft Graph API - Resource URL to pick the latest file from the SharePoint document library

I have a SharePoint site with CSV files in the document library pages. On each run, I need to extract the most recently modified file based on the last run datetime. For example, if my last run datetime is '2023-07-20T22:50:10Z', then I need to get the…
Anand2706 • 57 • 6
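The "newest file since the last run" selection can be sketched independently of the Graph call itself. A minimal sketch, assuming the listing has already been fetched as (name, lastModifiedDateTime) pairs; the file names and helper are illustrative:

```python
from datetime import datetime

def latest_since(files, last_run_iso):
    """Return the most recently modified file newer than the last run time.

    `files` is a list of (name, lastModifiedDateTime) pairs with ISO-8601
    'Z' timestamps; returns None if nothing is newer than last_run_iso.
    """
    def parse(ts):
        # fromisoformat does not accept a trailing 'Z' before Python 3.11.
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    last_run = parse(last_run_iso)
    newer = [f for f in files if parse(f[1]) > last_run]
    return max(newer, key=lambda f: parse(f[1])) if newer else None

files = [
    ("a.csv", "2023-07-20T10:00:00Z"),
    ("b.csv", "2023-07-21T09:30:00Z"),
    ("c.csv", "2023-07-21T08:15:00Z"),
]
print(latest_since(files, "2023-07-20T22:50:10Z"))  # → ('b.csv', '2023-07-21T09:30:00Z')
```

The last-run timestamp would then be advanced to the returned file's modified time, ready for the next pipeline run.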
0 votes · 0 answers

How to get the pipeline names from StreamSets using the Python SDK

How can I list the pipeline names available in StreamSets using the Python SDK? We generally use an ID and token to connect to StreamSets, not a username and password. I found the below reference, but I am not sure how to get those parameters to be…
Hari • 89 • 4
0 votes · 0 answers

StreamSets HTTP pagination Link field does not exist in record

I'm using StreamSets Data Collector to download from a Microsoft Dataverse API, which uses pagination, supplying a next-page link in the record. I'm using an HTTP Processor stage with Pagination = Link in Response Field. It works fine when there are…
Keith H • 15 • 5
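The Dataverse Web API signals the last page by simply omitting the `@odata.nextLink` property, which is what trips up a stage that expects the link field on every record. One workaround is to drive the pagination from a script, where a missing link just ends the loop. A plain-Python sketch; the canned `responses` dict stands in for real HTTP GETs:

```python
def fetch_all(responses, first_url):
    """Walk paginated responses, following '@odata.nextLink' until a
    response omits it (the last page), instead of erroring out."""
    records, url = [], first_url
    while url is not None:
        body = responses[url]              # stand-in for an HTTP GET
        records.extend(body["value"])
        url = body.get("@odata.nextLink")  # None on the last page
    return records

# Two canned pages; the second has no next link.
responses = {
    "page1": {"value": [1, 2], "@odata.nextLink": "page2"},
    "page2": {"value": [3]},
}
print(fetch_all(responses, "page1"))  # → [1, 2, 3]
```

Using `dict.get` rather than indexing is the whole trick: the absent field becomes a clean termination condition rather than an error.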
0 votes · 0 answers

StreamSets: getting all field names / column names into a single string, all values into another string, separated by commas

I have a file that I am consuming into StreamSets, with the following sample: Source_id: {String} "1234", Partition_id: {String} "ABC", Key: {String} "W3E" (the field names are dynamic and sometimes change, so we can't hardcode those…
MichelleNZ • 45 • 3
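Since the field names are dynamic, the transformation boils down to iterating the record's root map rather than naming fields. Inside an SDC Jython or Groovy evaluator the record value can be walked the same way; this plain-Python sketch (with the sample fields from the question) shows the core of it:

```python
record_value = {"Source_id": "1234", "Partition_id": "ABC", "Key": "W3E"}

# Build one comma-separated string of field names and one of values,
# without hardcoding any field name.
names = ",".join(record_value.keys())
values = ",".join(str(v) for v in record_value.values())

print(names)   # → Source_id,Partition_id,Key
print(values)  # → 1234,ABC,W3E
```

`str(v)` guards against non-string field values; insertion order of the map is preserved, so names and values stay aligned column-for-column.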
0 votes · 0 answers

How to create a stage in StreamSets for Redshift using the Python SDK?

I have been looking for an example or solution for creating a stage in StreamSets using the Python SDK. I saw many examples for services such as SQL, S3, etc., but not specifically for Redshift. #Expected: I need to create a pipeline to load data from S3 to…
Hari • 89 • 4
0 votes · 1 answer

Send event message to AWS MSK in Streamsets

I have a StreamSets pipeline that currently sends events to AWS SNS (using an HTTP Client). I now have a requirement to send these events to AWS MSK instead of AWS SNS. I am not finding relevant documentation to start with; I don't know where and…
Mike • 721 • 1 • 17 • 44
0 votes · 0 answers

Snowflake PUT command silently failing from StreamSets when in code block

I have a StreamSets pipeline putting some files into an internal Snowflake stage. I am using the Snowflake Execute component instead of the Snowflake File Uploader, as I need to conditionally execute the PUT. The PUT command on its own works, but if PUT is…
0 votes · 2 answers

Streamsets job logs to S3

I am looking for a solution to store all logs (Info, Debug, etc.) of a StreamSets pipeline (job) in S3 buckets. Currently, logs are only available in the log console of the StreamSets UI.
Yogesh • 11 • 1
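Data Collector writes its log to a file on the host as well as to the UI console, so one low-tech option is to ship that file to S3 on a schedule. A crontab sketch; the log path and bucket name below are assumptions and must be adjusted to your installation:

```
# Ship the Data Collector log to S3 every 15 minutes (paths and bucket
# name are assumptions -- adjust to your installation).
*/15 * * * * aws s3 cp /opt/streamsets-datacollector/log/sdc.log \
    s3://my-log-bucket/sdc/$(hostname)/sdc.log
```

For anything beyond an occasional copy, a log shipper that tails the file and handles rotation would be more robust than cron.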
0 votes · 1 answer

Write to StreamSets sdc in Groovy scripting

I am pretty new to StreamSets and Groovy, and I am changing my input from a "Dev Raw Data Source" to "Groovy Scripting". The issue I face is that the below code in my "Groovy Scripting" stage throws an error. Groovy scripting: import…
Mike • 721 • 1 • 17 • 44
0 votes · 1 answer

I'm trying to create and pass a new record containing a map from the Jython processor in StreamSets, but I'm getting this error

I want the newRecord to contain a map of column names and column values. I am getting the following error, which I am not able to resolve: Record1-Error Record1 SCRIPTING_04 - Script sent record to error: write(): 1st arg can't be coerced to…
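In the SDC Jython Evaluator, `output.write()` accepts only Record objects, so passing a plain dict (or map) raises exactly this coercion error. A sketch of the usual fix; this fragment runs only inside the Jython Evaluator stage, where `records`, `output`, and `sdcFunctions` are provided by SDC, and the field names are illustrative:

```
for record in records:
    # Create a real Record, not a bare dict; the argument is an
    # arbitrary source-id string.
    newRecord = sdcFunctions.createRecord('generated-record-id')

    # Set its value to a map of column names -> column values.
    newRecord.value = {'col_a': record.value['col_a'],
                       'col_b': record.value['col_b']}

    # Write the Record object itself -- writing the dict directly is
    # what triggers SCRIPTING_04 / "1st arg can't be coerced".
    output.write(newRecord)
```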