
An SSIS package performs the ETL on a remote server (a Greenplum environment). It runs fine but takes 8+ hours to complete. The data in the remote server's interaction tables is massive (~1 billion rows each). Is there any way, or any SSIS option, specifically for this amount of data?

Remote server: Data Lake (Greenplum). PS: I cannot schedule my query on the Data Lake itself due to company policy, but if I run the same script on the Data Lake manually, it takes approximately 1 hour 20 minutes to complete the job.

Thank you!

CleanBold

1 Answer


How does SSIS perform the ETL? Does it run `insert into ... values ...`? If so, poor performance is expected, because the per-row insert overhead is high. There are several parameters that could help (reference: https://greenplum.org/oltp-workload-performance-improvement-in-greenplum-6/):

  • gp_enable_global_deadlock_detector
  • checkpoint_segments
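For reference, these GUCs can be inspected and changed with Greenplum's `gpconfig` utility on the master host. The sketch below assumes you have cluster admin access; the `checkpoint_segments` value is purely illustrative, not a recommendation:

```shell
# Show current values (run as gpadmin on the Greenplum master)
gpconfig -s gp_enable_global_deadlock_detector
gpconfig -s checkpoint_segments

# Enable the global deadlock detector and raise checkpoint_segments
gpconfig -c gp_enable_global_deadlock_detector -v on
gpconfig -c checkpoint_segments -v 32   # illustrative value; tune for your cluster

# Both GUCs require a cluster restart to take effect
gpstop -ar
```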

However, the suggested way to do ETL is through gpload/gpfdist (or gpss).
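As a rough sketch of the gpfdist route (hostnames, ports, paths, and table names below are all hypothetical): export the source data to flat files, serve them with `gpfdist`, and load through a readable external table so all segments pull the data in parallel instead of routing single-row inserts through the master:

```shell
# Serve the staging directory over HTTP with gpfdist (hypothetical path/port)
gpfdist -d /data/staging -p 8081 &

# Define a readable external table over the served files and load in parallel.
# ext_interactions / target_interactions are placeholder table names.
psql -d dw -c "
CREATE EXTERNAL TABLE ext_interactions (LIKE target_interactions)
LOCATION ('gpfdist://etl-host:8081/interactions_*.csv')
FORMAT 'CSV' (HEADER);
INSERT INTO target_interactions SELECT * FROM ext_interactions;"
```

gpload wraps this same external-table mechanism behind a YAML control file, which is usually more convenient for scheduled jobs.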

Li Yang
  • Thank you! I have been through the given link, and it mostly talks about settings on the server itself. I am using SSIS because I cannot have a service agent role due to company policy, so for me it is all remote operation. It is not using an insert into command: on the source I am running a query on Table A, and using a Data Flow I am storing the result into Table B. – kashif ashraf Jun 27 '22 at 11:11