Questions tagged [streamsets]

Use the streamsets tag for questions about the StreamSets DataOps Platform, which includes Data Collector, Transformer, and Control Hub.

StreamSets DataOps Platform empowers your whole team, from highly skilled data engineers to visual ETL developers, to do powerful data engineering work. Only StreamSets makes it both simple to get started building pipelines quickly with intent-driven design and easy to extend to meet complex enterprise needs.

Useful Resources:

Initial Release: June 27th, 2014 - StreamSets Data Collector – the First Four Years

183 questions
0 votes, 3 answers

Reset Origin of a StreamSets Pipeline using another pipeline

I want to reset the origin of a StreamSets pipeline using another pipeline. I made a pipeline that sends one dummy record to an HTTP Client component. The HTTP Client contains the RESTful URL to reset the origin of a pipeline. It's something like…
Matrix
0 votes, 1 answer

java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to com.streamsets.pipeline.api.Record

I am trying to run a sample application locally using Scala (2.11) and Spark (2.3.0) with StreamSets API version 3.8.0. (I am trying to run a Spark transformation as described in this tutorial:…
blueberret
0 votes, 1 answer

Run curl command to load data to HDFS via Python/Jython

I am unable to download a file to HDFS when the URL contains spaces while executing Jython/Python. For example, the URL contains spaces in the file name and directory path: http://www.example.com/a bc/def/c h.csv. Command I tried with the URL by…
user1485267
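
One common cause of this kind of failure is that spaces are not valid in a URL path and need to be percent-encoded before the URL is handed to curl or an HTTP library. The following is a minimal sketch, not a confirmed fix for the question above: the helper name `encode_url` is hypothetical, and the fallback import assumes a Python 2.7 compatible runtime such as Jython.

```python
try:  # Python 3
    from urllib.parse import quote, urlsplit, urlunsplit
except ImportError:  # Python 2.7 / Jython 2.7
    from urllib import quote
    from urlparse import urlsplit, urlunsplit


def encode_url(url):
    """Percent-encode the path of a URL (hypothetical helper).

    '/' is kept so the directory structure survives, and '%' is kept so
    segments that are already encoded are not encoded a second time.
    """
    parts = urlsplit(url)
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


# Example URL taken from the question excerpt.
print(encode_url("http://www.example.com/a bc/def/c h.csv"))
# prints http://www.example.com/a%20bc/def/c%20h.csv
```

The encoded string can then be passed to whatever download command is in use; quoting the URL in the shell is still needed if it contains other special characters.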
0 votes, 0 answers

Delete records from a file using Jython

How can I delete n lines from the top (2) and from the bottom (1) of sample.txt using Jython? The file size might be in the MB/GB range. sample.txt contains the lines: 1,a 2,b 3,c 4,d 5,e 6,f. Expected output: 3,c 4,d 5,e
user1485267
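
Because the file may be several GB, one possible approach is to stream it line by line instead of loading it into memory: skip the first lines, then hold back a small buffer so the last lines are never emitted. This is a sketch under the assumption that plain Python 2.7 compatible code is acceptable inside Jython; the function name `trim_lines` and its parameters are illustrative, not part of any StreamSets API.

```python
from collections import deque
from itertools import islice


def trim_lines(lines, skip_top=2, skip_bottom=1):
    """Yield lines, dropping the first skip_top and the last skip_bottom.

    Only skip_bottom lines are buffered at any time, so memory use stays
    constant regardless of file size.
    """
    remaining = islice(lines, skip_top, None)  # drop the leading lines
    if skip_bottom <= 0:
        for line in remaining:
            yield line
        return
    held_back = deque()
    for line in remaining:
        held_back.append(line)
        # Once the buffer exceeds skip_bottom, the oldest line is safe to emit.
        if len(held_back) > skip_bottom:
            yield held_back.popleft()


# Sample data from the question excerpt.
sample = ["1,a", "2,b", "3,c", "4,d", "5,e", "6,f"]
print(list(trim_lines(sample)))
# prints ['3,c', '4,d', '5,e']
```

An open file object can be passed in place of the list, since iterating over a file yields one line at a time.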
0 votes, 1 answer

ERROR: com.streamsets.pipeline.api.StageException: JDBC_52 - Error starting LogMiner

I have been getting the following error while running Oracle CDC since this morning. It was running fine before, but now I get continuous errors. What is the exact reason for this error? The pipeline, cdc_test stopped at 2019-06-15 13:37:46 due to the…
0 votes, 1 answer

StreamSets Data Collector container not able to read file from Windows directory (d:/file)

I have created a container using Docker and have that container running on localhost. All I want to do is pick up an Excel file from the D:/file/ directory, but when I enter that directory in Files Directory I get an error saying no such directory exists.…
0 votes, 0 answers

StreamSets Data Collector and HDP

We are trying to build a pipeline that reads data from JDBC (source) and writes to Hive Metastore (destination). In the settings, under General Tab ==> Stage Library, we chose Hive 2.1-HDP 2.6.2 1-1 (as it doesn't have a version matching ours). We have below…
0 votes, 2 answers

StreamSets get MongoDB fields

I would like to ask if anyone knows whether StreamSets can also get a field that does not exist in every MongoDB record. Thanks in advance.
0 votes, 1 answer

StreamSets: Neo4j query very slow

I am working on a StreamSets pipeline that reads data from an active file directory, where .csv files are uploaded remotely, and puts that data in a Neo4j database. The steps I have used are: creating an observation node for each row in the .csv; creating a csv…
0 votes, 1 answer

File is not loaded into HDFS from local using StreamSets (validated successfully!)

I have just started using StreamSets, and I'm trying to load a text file from local to HDFS. Please note: I'm using Cloudera Manager. Here is a view of "core-site.xml": hadoop.ssl.server.conf ssl-server.xml
El Mehdi OUAFIQ
0 votes, 2 answers

JDBC Producer in StreamSets could not write data into MySQL

I configured the JDBC connection in the pipeline, and when the application executes I get the following error in the logs: "java.sql.SQLSyntaxErrorException: Table 'databaseName.aim_table' doesn't exist" The databaseName is not…
Harley
0 votes, 1 answer

Unable to install StreamSets on Mac

I am trying to install StreamSets on my Mac. When I try to start StreamSets with this command: streamsets-datacollector-3.4.3/bin/streamsets dc, I get the following exception: Abnormal exit: java.lang.RuntimeException: The permissions of the realm…
user6325753
0 votes, 1 answer

Unable to write data from StreamSets Jython Evaluator

I am trying to read data from a directory, parse that data, and finally write it to another directory. For this I am using the Jython Evaluator. Here is my code: import sys sys.path.append('/usr/lib/python2.7/site-packages') import…
ROOT
0 votes, 2 answers

Multiple Data Collectors for a job without duplicating records in StreamSets

I have a directory consisting of multiple files, and it is shared across multiple Data Collectors. I have a job to process those files and put them in the destination. Because the records are huge, I want to run the job on multiple Data Collectors, but…
Tamizharasan
0 votes, 0 answers

StreamSets service does not start any more

I just upgraded my MapR cluster and I am trying to start StreamSets. However, I get the following error: Exception in thread "main" java.lang.ExceptionInInitializerError: Expected exactly 1 stage lib jar but found 0 with name…
Paul Plato