Questions tagged [streamsets]

Use the streamsets tag for questions regarding StreamSets DataOps Platform which includes Data Collector, Transformer and Control Hub.

StreamSets DataOps Platform is aimed at a whole team, from highly skilled data engineers to visual ETL developers. According to StreamSets, it makes it simple to start building pipelines quickly with intent-driven design and easy to extend them to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
2 votes, 1 answer

Directory origin for StreamSets: need to pass only the filename

I am trying to build a StreamSets pipeline that, when a file arrives in a directory, invokes a REST API with just the file name; I don't want StreamSets to read the file or do any processing on it. But whatever I try, it's trying to send…
2 votes, 0 answers

How to use the Encrypt stage and provide KMS keys for encryption?

I want to encrypt the dob field, so I am trying to use the Encrypt stage in StreamSets. I pass the secret key ID, secret key, and KMS ARN. When I try to validate the pipeline, I get the following error: 2019-05-14 16:52:36,492…
Mani
2 votes, 1 answer

Groovy script for StreamSets to parse a string of about 1500 characters

This is for StreamSets; I am trying to write a Groovy script. I have a string of 1500 characters with no delimiter. The pattern is: the first 4 characters are some code, the next 4 characters are the length of a word, followed by the word. Again it has 4 chars of some…
Mani
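The question asks for Groovy, but the parsing loop itself is language-neutral. Below is a minimal Python sketch of it, assuming the 4-character length field holds zero-padded decimal digits (for example 0005) and counts the characters of the word that follows; parse_records is a hypothetical name, not a StreamSets API.

```python
def parse_records(s):
    """Split a delimiter-free string made of repeated groups:
    4-character code, 4-character length, then a word of that length.
    Returns a list of (code, word) tuples.
    """
    records = []
    pos = 0
    while pos < len(s):
        if pos + 8 > len(s):
            raise ValueError("truncated code/length header at position %d" % pos)
        code = s[pos:pos + 4]
        # Assumption: the length field is zero-padded decimal, e.g. "0005".
        length = int(s[pos + 4:pos + 8])
        start = pos + 8
        end = start + length
        if end > len(s):
            raise ValueError("word starting at position %d runs past the end" % start)
        records.append((code, s[start:end]))
        pos = end  # the next group starts right after the word
    return records


print(parse_records("A0010005helloB0020003abc"))
```

Walking a position index instead of splitting keeps the loop linear in the string length and makes it fail loudly on truncated input rather than silently dropping the tail.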
2 votes, 1 answer

Getting a writable-operation error while using Oracle CDC

I am getting the following error while connecting through the Oracle CDC Client. My origin database is a read-only database, but the error says the database is required for a writable operation. Please help. Caused by: java.sql.SQLException: ORA-01300: writable…
Kum
2 votes, 1 answer

Regex in StreamSets

Hi, I want to break up a log file using StreamSets. The log looks like: Deny tcp src dmz:77.77.77.7/61112 dst dmz:55.55.56.57/139 by access-group "outside_access_in" [0x8b3ecfdc, 0x0]. There may also be more than 2 IPs in the log, and I'm trying to capture…
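The excerpt is cut off before it says exactly what should be captured. Assuming the goal is every label:IP/port token in a line (so lines with more than two IPs are handled too), a single regular expression applied repeatedly is one way to do it. This is a plain Python sketch; ENDPOINT and extract_endpoints are hypothetical names, and how the pattern would be wired into a particular StreamSets stage is not shown.

```python
import re

# Hypothetical pattern: a label (such as "dmz"), a colon, a dotted-quad IPv4
# address, a slash, and a port. findall returns every match, so a line with
# more than two addresses yields more than two tuples.
# Note: \d{1,3} does not check that each octet is <= 255.
ENDPOINT = re.compile(r"(\w+):(\d{1,3}(?:\.\d{1,3}){3})/(\d+)")


def extract_endpoints(line):
    """Return a list of (label, ip, port) tuples found in one log line."""
    return ENDPOINT.findall(line)


log = ('Deny tcp src dmz:77.77.77.7/61112 dst dmz:55.55.56.57/139 '
       'by access-group "outside_access_in" [0x8b3ecfdc, 0x0]')

for label, ip, port in extract_endpoints(log):
    print(label, ip, port)
```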
2 votes, 1 answer

StreamSets version from the CLI

I'm currently writing code that locally installs StreamSets extensions via a CLI. One of the checks I want to write is to ensure that the extension works with the StreamSets version that's installed locally. When I try to query the version from the…
AlexLordThorsen
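How to read the installed version is the open part of the question and is not answered here. Once some version string is in hand, the compatibility check itself can be sketched as below, assuming dotted numeric versions such as 3.8.2, optionally followed by a suffix like -SNAPSHOT. The names parse_version and is_supported, and the half-open supported range, are hypothetical choices for illustration.

```python
def parse_version(text):
    """Turn "3.8.2" into (3, 8, 2).

    Anything after the first "-" (e.g. "-SNAPSHOT") is ignored, and missing
    components are padded with zeros so "3.8" compares equal to "3.8.0".
    """
    core = text.strip().split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_supported(installed, min_version, max_version):
    """True if min_version <= installed < max_version (hypothetical half-open range)."""
    return parse_version(min_version) <= parse_version(installed) < parse_version(max_version)


print(is_supported("3.10.0", "3.8.0", "4.0.0"))
```

Comparing integer tuples instead of raw strings matters here: as plain strings, "3.10.0" sorts before "3.8.2".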
2 votes, 1 answer

Strange Behavior With StreamSets SQL Server Change Tracking Origin

I am trying to use the SQL Change Tracking origin to create a data ingestion pipeline. I have connected the origin and specified all the necessary JDBC parameters, and the pipeline validates successfully. However, on running the pipeline, I get the…
Paul Plato
2 votes, 2 answers

StreamSets HTTP Client

I'm working with StreamSets on a Cloudera distribution, trying to ingest some data from this website (http://files.data.gouv.fr/sirene/). I've encountered some issues choosing the parameters of both the HTTP Client and the Hadoop FS…
2 votes, 3 answers

How to connect to an Oracle origin with StreamSets

I want to create an origin source from Oracle, so I chose Oracle CDC as the origin. Then I configured each parameter: Schema Name, Table, Username, Password, JDBC Connection String. But when I run the process, I find this in my log: 2017-08-22…
a.moussa
2 votes, 2 answers

I can't execute sudo streamsets dc to start StreamSets

When I try to run sudo streamsets dc, I get the following error: WARN: could not determine Java environment version; expected 1.8, which are the supported versions WARN: Security is enabled and was unable to verify policy file…
a.moussa
2 votes, 1 answer

StreamSets code behind

I am interested in working with StreamSets. However, I would like to integrate it into my own code rather than working in the UI. How have they been written? Can I access the code behind Directory and File Tail? Are they using Spark Streaming behind the scenes, or other…
Mehdi
1 vote, 1 answer

How to parse a string from the input file name in StreamSets

I need to extract a string from the input file name and add it as a field in the record. For example, if my file has a date in the filename, only the date needs to be extracted and added as an additional column in the record. If the file name is like…
Rainbow
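The example file name is cut off in the excerpt, so the sketch below assumes a hypothetical convention where the date appears as YYYY-MM-DD or YYYYMMDD somewhere in the name (e.g. sales_2019-05-14.csv). It only shows the extraction logic in Python. In a pipeline the file name would come from the origin (the Directory origin, for example, records it in a record header attribute; check the origin's documentation for the exact attribute name), and the field name file_date is made up.

```python
import re
from datetime import date

# Hypothetical naming convention: YYYY-MM-DD or YYYYMMDD somewhere in the name.
DATE_IN_NAME = re.compile(r"(\d{4})-?(\d{2})-?(\d{2})")


def date_from_filename(filename):
    """Return the first date embedded in the file name, or None if there is none.

    Raises ValueError if the digits found do not form a valid calendar date.
    """
    m = DATE_IN_NAME.search(filename)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)


def add_file_date(record, filename):
    """Return a copy of the record with the extracted date added as an ISO string."""
    extracted = date_from_filename(filename)
    enriched = dict(record)
    enriched["file_date"] = extracted.isoformat() if extracted else None
    return enriched


print(add_file_date({"id": 1}, "sales_2019-05-14.csv"))
```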
1 vote, 1 answer

I am using a regex in the Stream Selector in StreamSets, but it is not working as expected

I am trying to partition data on whether it matches a regex or not. The column contains the value 00000000081.48 and the expression is str:matches(record:value('/OUT_HD_CURR_BAL'), '[0-9]+\.[0-9]+'), but it is behaving as if the output is false. Is…
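One way to narrow this down is to test the pattern outside the pipeline. Assuming str:matches applies whole-string semantics like Java's String.matches (Python's re.fullmatch behaves the same way), the pattern does match the quoted value, so the problem may lie in what actually reaches the expression. The sketch below checks one possibility the post does not confirm: padding whitespace around the value, which a whole-string match rejects. The padded variants and the name matches_whole are hypothetical.

```python
import re

PATTERN = r"[0-9]+\.[0-9]+"  # the regex from the question


def matches_whole(value):
    """Whole-string match, like Java's String.matches (assumed to match str:matches)."""
    return re.fullmatch(PATTERN, value) is not None


# The bare value, then hypothetical padded variants such as fixed-width data might produce.
for candidate in ["00000000081.48", "00000000081.48 ", " 00000000081.48"]:
    print(repr(candidate), matches_whole(candidate), matches_whole(candidate.strip()))
```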
1 vote, 1 answer

Add text annotation to SDC pipeline?

Is it possible to add text boxes (annotations) to an SDC pipeline (v3.8.2)? I want to be able to write a note describing what a few different parts of the pipeline are doing.
eze
1 vote, 0 answers

How to set the initial offset of MongoDB in NiFi

With NiFi GetMongo, I use MongoDB's _id as an offset to synchronize MongoDB data, but there is no place to set it on NiFi's GetMongo or GetMongoRecord processor. By comparison, in StreamSets the offset field and initial offset can be set.
marcus.liu
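For comparison, the offset-based incremental read described here can be sketched in a few lines of Python. This is not how StreamSets or NiFi implement it; it is a self-contained illustration in which a list of dicts stands in for a collection, read_batch is a hypothetical name, and it assumes _id values increase over time (default ObjectIds do so only approximately, since their leading bytes are a timestamp).

```python
def read_batch(documents, last_offset, batch_size):
    """Return (batch, new_offset) for documents whose _id is greater than last_offset.

    documents stands in for a MongoDB collection. last_offset=None means
    "start from the beginning"; any other value acts as the initial offset.
    Assumes _id values increase over time.
    """
    pending = sorted(
        (d for d in documents if last_offset is None or d["_id"] > last_offset),
        key=lambda d: d["_id"],
    )
    batch = pending[:batch_size]
    # Keep the old offset when nothing new arrived, so the next call resumes correctly.
    new_offset = batch[-1]["_id"] if batch else last_offset
    return batch, new_offset


docs = [{"_id": i, "value": i * 10} for i in range(1, 6)]
offset = 2  # hypothetical initial offset
for _ in range(3):
    batch, offset = read_batch(docs, offset, batch_size=2)
    print([d["_id"] for d in batch], offset)
```

Starting from an initial offset of 2 over documents with _id 1 to 5 and a batch size of 2, successive calls return _id 3 and 4, then 5, then nothing, with the stored offset ending at 5.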