Questions tagged [streamsets]

Use the streamsets tag for questions about the StreamSets DataOps Platform, which includes Data Collector, Transformer, and Control Hub.

StreamSets DataOps Platform empowers your whole team, from highly skilled data engineers to visual ETL developers, to do powerful data engineering work. Only StreamSets makes it both simple to get started building pipelines quickly with intent-driven design and easy to extend to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
0
votes
1 answer

How to stop CDH's STREAMSETS parcel installation in the activating stage

cm api: cluster:7180/ap1/v14/clusters/cluster/parcels/products/STREAMSETS_DATACOLLECTOR/versions/3.2.0.0 reword: { "product" : "STREAMSETS_DATACOLLECTOR", "versions" : "3.2.0.0", "stage" : "ACTIVATING", "state" : { "progress" : 0, …
0
votes
0 answers

Streamsets installation: DataCollector UI port 18630 not opening

I am trying to install Streamsets on a single-node Hadoop box (Hortonworks Sandbox). The install process described on the Streamsets website is quite straightforward: download the core tar file, untar it and then…
Adeel Hashmi
  • 767
  • 1
  • 8
  • 20
0
votes
0 answers

StreamSets Data Collector : Address already in use

First of all, I am new to Streamsets. I installed the Full Tarball, following the instructions for Systemd systems, since I'm working on an Ubuntu 16.04 VM (host is Windows 10). It worked well for a time, but when I restarted my VM, SDC stopped…
0
votes
1 answer

StreamSets JDBC Producer CDC - Change Log Format Error

The idea behind my pipeline is to reflect changes from a MySQL DB to a PostgreSQL DB. In the future I'll also have an Oracle to PostgreSQL replication. From this forum and the SDC documentation, I saw that the right way to do it is to use a CDC origin.…
Eilliar
  • 11
  • 4
0
votes
1 answer

Streamsets Mapr FS origin/dest. KerberosPrincipal exception (using hadoop impersonation (in mapr 6.0))

I am trying to do a simple data move from a mapr fs origin to a mapr fs destination (this is not my use case, just doing this simple movement for testing purposes). When trying to validate this pipeline, the error message I see in the staging area…
lampShadesDrifter
  • 3,925
  • 8
  • 40
  • 102
0
votes
1 answer

Can't access non-public directories on local FS in streamsets pipeline creator

New to streamsets. Following the documentation tutorial, was getting FileNotFound: ... HADOOPFS_14 ... (permission denied) error when trying to set the destination location as a local FS directory and preview the pipeline (basically saying either…
lampShadesDrifter
  • 3,925
  • 8
  • 40
  • 102
0
votes
1 answer

Accessing streamsets web UI from a different cluster node than the one where it is installed: which file system does it 'look in'?

I have a cluster of machines hosting hadoop (MapR) and have installed streamsets on one of the nodes (say node002) following the RPM documentation. However, I am accessing the web UI for the data collector from another node, node001. My question is,…
lampShadesDrifter
  • 3,925
  • 8
  • 40
  • 102
0
votes
2 answers

SQL query optimization

Can anyone help me optimize this query? I am using it in an ETL tool called StreamSets, and it yields 70 records in 6 minutes when I run a StreamSets pipeline, which is very slow. We are taking this query from an SSIS package and joining…
0
votes
1 answer

Count of records in a Streamsets stage

I use Streamsets to ingest records from Oracle to ElasticSearch. I want to record, in a MapR-DB destination, the number of records processed at each step of my Oracle query. How can I get the number of records at a certain Streamsets stage?
a.moussa
  • 2,977
  • 7
  • 34
  • 56
0
votes
2 answers

I only have row key inside my mapr-db JSON table

I don't know if this is a common problem with mapr-db JSON. I use a Streamsets destination, a Mapr-DB JSON table, to push records containing 10 columns. I specify the first column as the row key. When I go to mapr dbshell find…
a.moussa
  • 2,977
  • 7
  • 34
  • 56
0
votes
0 answers

Error while installing Streamsets Data Collector on EMR

We are facing the issue below while installing Streamsets on EMR. After starting SDC, we run the following command: ssh -N -p 22 hadoop@ec2-34-207-150-21.compute-1.amazonaws.com -i /home/hadoop/sterling-emr-keypair.pem -L 18630:localhost:18630 the error…
0
votes
1 answer

NiFi or Streamsets to read from HBase, join with content from flat file and write to Hive

I was trying to figure out whether joins can be achieved with Apache NiFi or Streamsets, so that I can read from HBase periodically, join with other tables and write a few fields into a Hive table. Or is there any other workflow manager tool that supports…
Srihari Karanth
  • 2,067
  • 2
  • 24
  • 34
0
votes
0 answers

Streamsets throws exception (MANUAL_FLUSH buffer) while using Kudu client

I'm a newbie in Streamsets and Kudu technologies and I'm trying several solutions to reach my goal: I've got a folder containing some Avro files and these files need to be processed and afterward sent to a Kudu…
0
votes
2 answers

Streamsets Error - Bad File Descriptor

I was attempting to use Streamsets to query an Oracle database and publish the data into Kafka. I downloaded Streamsets' tarball on my Mac and unzipped it into my home directory. Running $HOME/streamsets-datacollector-2.1.0.2/bin/streamsets dc…
Nathan Loyer
  • 344
  • 3
  • 12
0
votes
0 answers

Issue fetching the output from local FS via the StreamSet tool

I am exploring the StreamSet tool. I have a log file that I need to parse with the tool. I passed the log file from the Directory to the log parser; the log parser format is Common Log Format, and the destination is the local…
TEJASHWINI s
  • 25
  • 2
  • 7