Currently the jdbchdfs job does not have a partitionPath option for the output directory, as the hdfs sink does. What is the recommended way to do this? I also don't see any JIRA for it; is there any plan to support this in the future?

I gave it some thought and concluded that the job could create multiple files, one per executing partition. For large data loads, however, we would like to split the output into multiple directories based on values in the data.

If I wanted to create such a job, how would I reuse the out-of-the-box partition strategy that the hdfs sink uses? Any pointers would be appreciated.

Ali

1 Answer

The current jdbchdfs job uses a very simple ItemWriter implementation. It should be changed to use a Spring Hadoop DataWriter implementation, and the improvement looks straightforward to make. I created the JIRA https://jira.spring.io/browse/XD-2822 to keep track of this improved functionality, with a tip on the implementation approach. If you can try it out and issue a PR, that would be much appreciated.
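To make the idea concrete, here is a rough, plain-Java sketch of a writer that routes each record to a subdirectory derived from a value in the data. It is only an illustration of the partitioning idea: it writes to the local filesystem with java.nio rather than HDFS, and it is not the Spring XD or Spring Hadoop API. The class name `PartitionedWriterSketch`, the `region=` directory layout, the `part-0.txt` file name, and the comma-separated row format are all assumptions made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: groups rows by a partition key and appends each group
// to a file under <baseDir>/<partitionKey>/. Not the actual jdbchdfs writer.
class PartitionedWriterSketch {

    private final Path baseDir;
    // Maps a row to the relative directory it belongs in (assumed strategy).
    private final Function<String, String> partitionKey;

    PartitionedWriterSketch(Path baseDir, Function<String, String> partitionKey) {
        this.baseDir = baseDir;
        this.partitionKey = partitionKey;
    }

    // Same shape as a chunk-oriented ItemWriter: receives one chunk of items.
    void write(List<String> items) throws IOException {
        // Group the chunk by partition so each directory is touched once.
        Map<String, List<String>> byPartition = new LinkedHashMap<>();
        for (String item : items) {
            byPartition
                .computeIfAbsent(partitionKey.apply(item), k -> new ArrayList<>())
                .add(item);
        }
        for (Map.Entry<String, List<String>> entry : byPartition.entrySet()) {
            Path dir = baseDir.resolve(entry.getKey());
            Files.createDirectories(dir);
            // Append so that later chunks extend the same partition file.
            Files.write(dir.resolve("part-0.txt"), entry.getValue(),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("jdbchdfs-sketch");
        // Assumed row layout: the first comma-separated column is the region.
        PartitionedWriterSketch writer = new PartitionedWriterSketch(
                base, row -> "region=" + row.split(",")[0]);
        writer.write(List.of("emea,1,alice", "apac,2,bob", "emea,3,carol"));
        System.out.println(Files.readAllLines(
                base.resolve("region=emea").resolve("part-0.txt")));
    }
}
```

In a real change, the grouping and file handling would presumably be delegated to the Spring Hadoop writer and its partition strategy instead of being hand-rolled like this; the JIRA has the tip on that approach.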

Cheers, Mark

Mark Pollack