I'm evaluating platforms that let any user easily create data processing pipelines. The platform has to meet certain requirements, one of which is being able to move data from Oracle/SQL Server to HDFS.

StreamSets Transformer (v3.11) meets all the requirements, including the one mentioned above. I just can't get it to work in one very specific case: when ingesting a table that contains no numeric columns.

In these cases I want the pipeline to process all the data, so in the JDBC origin I enabled the "Skip Offset Tracking" property. I thought that skipping offset tracking would remove the need to set the "Offset Column" property (guess I was wrong).

JDBC_05 - Table doesn't have compatible primary key configuration - supporting exactly one column but table have 0

If a numeric column exists, a possible workaround is to set it as the offset column, but I can't find a way to do this when the table has none.

Am I missing something?

Thanks

André Machado
  • Spark itself can't handle this situation efficiently. If there is not a numeric column, it cannot partition the data, and hence the best it could do is a single giant partition with all the rows. Is that what you are expecting to happen with Transformer? – Jeff Evans Dec 17 '19 at 17:50
  • @JeffEvans Yes, it is. This is a very particular case in the requirements to deal with some small tables. – André Machado Dec 17 '19 at 17:52
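
To illustrate the partitioning point raised in the comments: below is a minimal sketch, in plain Python, of how the value range of a numeric column can be split into one WHERE predicate per partition, and why only a single unfiltered partition is possible when no such column exists. The helper `partition_predicates` and its parameters are hypothetical names made up for this illustration; it is a simplification of the general idea, not Spark's or Transformer's actual code.

```python
from typing import List, Optional


def partition_predicates(
    column: Optional[str],
    lower: int,
    upper: int,
    num_partitions: int,
) -> List[str]:
    """Return one WHERE predicate per partition (hypothetical helper)."""
    # Without a numeric column (or with nothing to split) there is no way
    # to carve up the table, so a single partition reads every row.
    if column is None or num_partitions <= 1 or upper <= lower:
        return ["1=1"]

    # Width of each partition's value range; at least 1 to keep ranges distinct.
    stride = max((upper - lower) // num_partitions, 1)
    bound = lower + stride

    # The first partition is open-ended below and also picks up NULLs.
    predicates = [f"{column} < {bound} OR {column} IS NULL"]

    # Middle partitions cover half-open ranges [bound, bound + stride).
    for _ in range(num_partitions - 2):
        predicates.append(f"{column} >= {bound} AND {column} < {bound + stride}")
        bound += stride

    # The last partition is open-ended above so no row is missed.
    predicates.append(f"{column} >= {bound}")
    return predicates


if __name__ == "__main__":
    # Table with a numeric "id" column: four range predicates.
    for predicate in partition_predicates("id", 0, 100, 4):
        print(predicate)
    # Table with no numeric column: one catch-all predicate.
    print(partition_predicates(None, 0, 0, 4))
```

With a numeric column, each predicate can be run as its own query in parallel; with `column=None` the only option is the single catch-all predicate, which corresponds to the "single giant partition" case described above.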

1 Answer

We are looking at providing this functionality in Transformer in a future release. I'll come back and update this answer with any news.

In the meantime, you might want to look at using StreamSets Data Collector for these tables. It does not have the 'numeric offset column' requirement.

metadaddy