Questions tagged [streamsets]

Use the streamsets tag for questions regarding the StreamSets DataOps Platform, which includes Data Collector, Transformer, and Control Hub.

StreamSets DataOps Platform empowers your whole team, from highly skilled data engineers to visual ETL developers, to do powerful data engineering work. Only StreamSets makes it both simple to get started building pipelines quickly with intent-driven design and easy to extend to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
1
vote
1 answer

Streamsets Solr destination module error is not letting me add data to Solr collection directly from pipeline

I have built and deployed the following docker-compose.yml file: services: solr1: container_name: solr1 image: solr:5-slim ports: - "9981:9983" - "8981:8983" volumes: - data:/var/solr -…
statsguy
  • 123
  • 1
  • 12
1
vote
0 answers

Having trouble deploying StreamSets Data Collector on Azure

I am attempting to deploy the StreamSets Data Collector for Azure on Azure (obviously!) and it consistently fails with a rather mysterious OSProvisioningInternalError after the VM has been provisioned. Everything is deployed but it looks like the…
1
vote
2 answers

How to join multiple Kafka topics in StreamSets Data Collector?

I have a use case where I have to "join" multiple Kafka topics based on some criteria in StreamSets Data Collector. I wonder if there is some commonly adopted idiom that could solve such a problem?
Gill Bates
  • 14,330
  • 23
  • 70
  • 138
1
vote
1 answer

Writing data to AWS Aurora using StreamSets

I have a requirement to write real-time data to AWS Aurora (PostgreSQL) using StreamSets Data Collector. I have never worked with StreamSets, but I have learned that it's a data connector. I tried to search to get something on this…
AWS_Lernar
  • 627
  • 2
  • 9
  • 26
1
vote
1 answer

Using JDBC Meta Data Processor of StreamSets 3.8 in StreamSets Version 2.5

My team needs to use an old version of StreamSets, version 2.5. But there are some important processors in version 3.8 that we want to include in the old environment, namely the JDBC Meta Data Processor. What has been done is below: Create a…
Felix
  • 269
  • 1
  • 11
1
vote
1 answer

Sample Spark Evaluator code for Streamsets

I am trying to write a Spark Evaluator in StreamSets. I have to deal with complex SQL queries and hence want to use data frames or datasets here. But the sample code that StreamSets provides deals with JavaRDD only. Can I have an insight on…
earl
  • 738
  • 1
  • 17
  • 38
1
vote
0 answers

Streamsets displays an error writing to HDFS

I am running a Data Collector (3.10.0), connected to Control Hub (3.8?). All on-premises. While trying to run a pipeline, I get the following error. The pipeline takes a local file and uploads it to HDFS. "Pipeline status: RUNNING_ERROR:…
Chompers
  • 11
  • 2
1
vote
1 answer

Invoke SOAP API using StreamSets on Cloudera

What is the way to invoke the SOAP API on StreamSets, and how do I pass the WSDL to it? What boxes are needed to do that?
earl
  • 738
  • 1
  • 17
  • 38
1
vote
2 answers

How to get StreamSets Record Fields Type inside Jython Evaluator

I have a StreamSets pipeline where I read from a remote SQL Server database using a JDBC component as an origin and put the data into a Hive and a Kudu Data Lake. I'm facing some issues with Binary-type columns, as there is no Binary type support…
Matrix
  • 1,810
  • 1
  • 19
  • 20
1
vote
1 answer

Get parameters from REST HTTP URL for GET method using StreamSets microservice pipeline

I have created a microservice pipeline in StreamSets. Upon receiving a GET callout, I have to retrieve data from MySQL depending on the parameters sent in the HTTP GET URL, using an Expression Evaluator. My URL is supposed to be like this:…
Luiguixx09
  • 11
  • 1
1
vote
1 answer

Is it possible to use Record Fields as URL params in HTTP Client Processor in StreamSets?

I'm new to StreamSets, thanks in advance for any help. In my pipeline records (JSON) I have a field with geo-coordinates (lat, lon) and I'm trying to add more metadata to them. I'm wondering if it's possible to use an HTTP Client Processor to…
1
vote
1 answer

HADOOPFS - Could not verify the base directory in streamsets

I am having issues running a pipeline in StreamSets. I see the following error: HADOOPFS_44 - Could not verify the base directory: 'java.net.ConnectException: Call From SDC/...... to ......failed on connection exception:…
1
vote
1 answer

Streamsets gives this error trying to parse a valid JSON

I am setting up StreamSets for a project. It has a Kafka Consumer as its origin. It was working fine for smaller messages, but when the message size is larger it throws this error: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected…
1
vote
1 answer

AttributeError: 'module' object has no attribute '_Condition' in the script

I am trying to access an AWS S3 object using boto3 from Python. I have given the AWS credentials. However, where I call the boto3 API boto3.client('S3') to access the S3 resource, it throws an AttributeError. Below is the code snippet: import…
rahul raj
  • 11
  • 3
1
vote
1 answer

Should Airflow integrate with NiFi/StreamSets?

I know Airflow is called a workflow manager and NiFi a dataflow manager, but what does this mean exactly? The best explanation so far was that NiFi cares about data while Airflow cares about tasks, but I don't quite get this definition, and I couldn't find any…
set92
  • 322
  • 4
  • 13