
Do any of the Spring projects provide a template or utility for inserting into or working with Greenplum?

I understand that one approach is, using Spring Batch, to have a tasklet call the Greenplum gpload utility which will then insert a specified file into the database.
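Something along these lines is what I mean by that approach: a tasklet that shells out to gpload with a YAML control file. This is only a sketch of the idea, not code from any existing project; the control file path and error handling are placeholders.

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

/**
 * Sketch of a Tasklet that invokes the gpload command-line utility.
 * gpload reads connection details and the source file from its YAML control file.
 */
public class GploadTasklet implements Tasklet {

    private final String controlFile; // e.g. /data/load_orders.yml (placeholder)

    public GploadTasklet(String controlFile) {
        this.controlFile = controlFile;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Process process = new ProcessBuilder("gpload", "-f", controlFile)
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("gpload failed with exit code " + exitCode);
        }
        return RepeatStatus.FINISHED;
    }
}
```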

However, given that both the Spring Data and Spring XD projects aim to abstract data access and handle big data requirements, it would seem there should be something purpose-built for this requirement. This is especially the case given how closely Pivotal is now involved with both Greenplum and Spring.

If anyone has experience working with Spring and Greenplum and can offer any pointers or best practices, it would be very much appreciated.

user1052610

2 Answers


I worked with Spring Batch and Greenplum. Just use the PostgreSQL JDBC driver; it works transparently, since Greenplum Database is based on PostgreSQL.
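A minimal sketch of that approach, pointing the standard PostgreSQL JDBC driver at the Greenplum master; the host, database, credentials and table are placeholders.

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class GreenplumJdbcExample {

    public static void main(String[] args) {
        // Greenplum speaks the PostgreSQL wire protocol, so the stock
        // PostgreSQL driver connects to the Greenplum master host directly.
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.postgresql.Driver");
        dataSource.setUrl("jdbc:postgresql://gp-master-host:5432/analytics");
        dataSource.setUsername("gpadmin");
        dataSource.setPassword("secret");

        // Plain Spring JDBC works as with any PostgreSQL database
        JdbcTemplate jdbc = new JdbcTemplate(dataSource);
        jdbc.update("INSERT INTO orders (id, amount) VALUES (?, ?)", 1L, 99.95);
    }
}
```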

Haim Raman
  • So are you saying that, from a Spring app point of view, you just need to treat Greenplum as a "normal" PostgreSQL DB? – Marco Jan 26 '21 at 15:01

This is something that has been coming up several times. There was work done in this area quite a while ago, but we haven't moved that code into a public repository; now would be a good time to get this code onto GitHub.

Here is a document describing what is available.

https://drive.google.com/file/d/0B2yhsfF9zZ71VTV2bzN5TnpzMGM/edit?usp=sharing

What might not be obvious in there is that (as I recall) we are able to programmatically use gpfdist in Java vs. using the command line. I'll have the author of this take a look at this thread to comment.
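To be clear, the sketch below is not the API from the document above; it just illustrates the general gpfdist pattern of pointing a readable external table at files served by gpfdist and loading from it over plain JDBC. Hosts, ports, file patterns and table names are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GpfdistLoadExample {

    public static void main(String[] args) throws Exception {
        // Assumes gpfdist is already serving /data/staging on the ETL host:
        //   gpfdist -d /data/staging -p 8081
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-master-host:5432/analytics", "gpadmin", "secret");
             Statement stmt = conn.createStatement()) {

            // Readable external table pointing at the files served by gpfdist
            stmt.execute("CREATE EXTERNAL TABLE orders_ext (id bigint, amount numeric) " +
                         "LOCATION ('gpfdist://etl-host:8081/orders*.csv') " +
                         "FORMAT 'CSV'");

            // Parallel load from the external table into the target table
            stmt.execute("INSERT INTO orders SELECT * FROM orders_ext");
        }
    }
}
```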

This work does not have a tasklet call the Greenplum gpload utility, but that is certainly a good idea. We have just finished a first pass at an FTP tasklet that will write a file into HDFS; that should be a good basis to start from.
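Not the actual tasklet mentioned above, but a rough sketch of what copying a locally staged file into HDFS from a tasklet could look like; the namenode URI and file paths are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

/**
 * Sketch of a tasklet that copies an already-fetched local file into HDFS.
 */
public class HdfsCopyTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy the staged local file into the HDFS landing directory
            fs.copyFromLocalFile(new Path("/local/staging/orders.csv"),
                                 new Path("/data/in/orders.csv"));
        }
        return RepeatStatus.FINISHED;
    }
}
```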

I'm quite interested to hear more of your ideas in this area. One idea is to use a partitioned batch job so that files located on the local file system of an xd-container node can be loaded in parallel; a rough sketch of that follows.
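As a sketch of that idea using Spring Batch's MultiResourcePartitioner, each local file becomes its own partition whose execution context a worker step can bind to; the file pattern and grid size are placeholders.

```java
import java.util.Map;

import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

public class FilePartitionExample {

    public static void main(String[] args) throws Exception {
        // One partition per file on the local file system; each partition's
        // ExecutionContext carries a "fileName" entry a worker step can read.
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        partitioner.setResources(new PathMatchingResourcePatternResolver()
                .getResources("file:/data/staging/*.csv"));

        Map<String, ExecutionContext> partitions = partitioner.partition(4);
        partitions.forEach((name, ctx) ->
                System.out.println(name + " -> " + ctx.getString("fileName")));
    }
}
```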

Looking forward to your response.

Cheers, Mark

Mark Pollack