Questions tagged [streamsets]

Use the streamsets tag for questions about the StreamSets DataOps Platform, which includes Data Collector, Transformer, and Control Hub.

StreamSets DataOps Platform empowers your whole team, from highly skilled data engineers to visual ETL developers, to do powerful data engineering work. Only StreamSets makes it both simple to get started building pipelines quickly with intent-driven design and easy to extend to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
1
vote
1 answer

How to access the Facebook API without creating a Facebook app

I'm currently researching how to use the Facebook API to collect its data through StreamSets and store it in S3. But Facebook requires developers to create an app and verify it, which is not applicable to what I'm doing right now. Is there another way…
1
vote
1 answer

Implement SSL connection in JDBC producer

I was wondering if there's any way to implement SSL connection for JDBC producer in StreamSets, I've been looking on the net and docs, but couldn't find any info about it and the task doesn't have a TLS tab to configure it. If it helps I'm using…
Cutu
  • 11
  • 3
1
vote
1 answer

How to transition a StreamSets pipeline to Finished state if the Origin does not 'Produce Events'?

I have created a StreamSets pipeline where the Origin is 'Kafka Consumer' and the destination is 'JDBC Producer'. To run this pipeline, I have created a StreamSets Job. After I click on 'Start Job' to run the pipeline, the Job status turns to…
Shreya
  • 13
  • 3
1
vote
1 answer

Unable to access StreamSets through URL on K8s

I'm using an Ansible script to deploy StreamSets on a k8s master node. There is a play where I'm checking if the StreamSets dashboard is accessible via http://127.0.0.1:{{streamsets_nodePort}} where streamsets_nodePort: 30029. The default port is 30024,…
1
vote
2 answers

geo_point mapping python and StreamSets fails with Elasticsearch

I have this mapping in elasticsearch "mappings": { "properties": { "fromCoordinates": {"type": "geo_point"}, "toCoordinates": {"type": "geo_point"}, "seenCoordinates": {"type": "geo_point"}, …
drules
  • 27
  • 1
  • 4
1
vote
1 answer

StreamSets processor can't find the Redis libs

I have two containers, one with Redis and one with StreamSets. I want to write a custom processor in Java and add it to a pipeline. But when I add the code from the tutorial to the processor, copy the jar to lib, and try to start it, I get exceptions. Could you help me please?…
Vadim
  • 753
  • 8
  • 22
1
vote
1 answer

Distributed execution in StreamSets

I want to understand how StreamSets Data Collector works. What happens when a StreamSets pipeline is executed? Does it have distributed execution with master and worker processes? Which components are responsible for the master and worker processes? And…
Vadim
  • 753
  • 8
  • 22
1
vote
1 answer

Not able to insert data into redshift table if any column has any NULL values from s3

I have source data in S3 in the format below. WM_ID,SOURCE_SYSTEM,DB_ID,JOB_NUM,NOTE_TYPE,NOTE_TEXT,NOTE_DATE_TIME WOR25,CORE,NI,NI1LBE14,GEN,"",2020-02-01 17:23:32 WOR25,FSI,NI,NI1LBR39,CPN,"",2020-02-04…
anidev711
  • 232
  • 4
  • 15
1
vote
1 answer

StreamSets data not landing into table created on postgres db

I am using StreamSets to build a pipeline to land data from a table in a SQL Server db into a table in a Postgres db. JDBC Query Consumer --> Timestamp --> JDBC Producer The pipeline passes validation checks and runs successfully on preview…
Shoaib Maroof
  • 369
  • 1
  • 3
  • 13
1
vote
2 answers

Avoiding key duplicates when merging data sets into single table

I am trying to land our asset data from various countries (e.g. Spain, Sweden for now) into 1 table using StreamSets. Considering that they both will have the same identity key, i.e. Spain will have a panel_ID = 1 and so will Sweden. To make my…
Shoaib Maroof
  • 369
  • 1
  • 3
  • 13
1
vote
1 answer

Not able to dump JSON response to an FTP server in a CSV file using StreamSets

I created a pipeline with HTTP Client > Field Pivoter > Field Flattener > SFTP/FTP/FTPS Client. I am simply trying to fetch data from an HTTP API which returns JSON and dump its response to an FTP server in a CSV file. When I am trying to preview it…
arjun
  • 1,594
  • 16
  • 33
1
vote
1 answer

Streamsets Transformer - JDBC Origin without offset column

I'm testing platforms that can allow any user to easily create data processing pipelines. This platform has to meet certain requirements and one of them is to be capable of moving data from Oracle/SQL Server to HDFS. Streamsets Transformer (v3.11)…
André Machado
  • 726
  • 6
  • 21
1
vote
1 answer

Target file name using file name expression in streamsets

I am trying to load the data from Google cloud-bucket to Local file system using below: My Origin (Google Cloud Storage) Properties: Common Prefix = /Target_Files/2019_12_02_Part1/ Prefix Pattern = SDC_11643212-5147-49ba-92e8-ba0308679000 My…
Moushmi
  • 51
  • 9
1
vote
1 answer

Not able to read data from Google Cloud Platform in StreamSets Data Collector

I am trying to create a pipeline in StreamSets Data Collector to read data from a Google Cloud Platform bucket and load the data into the same bucket with a different file name. The data file in the bucket is in JSON form. I used the Google Cloud…
Moushmi
  • 51
  • 9
1
vote
1 answer

StreamSets - How to bind variable for oracle jdbc producer

We have a requirement to use Oracle bind variables when running any queries against an Oracle database, so that it does soft parsing rather than hard parsing to improve performance. I have checked but didn't find anything. Please help if it is…