If I specify multiple URIs for LOCATION in CREATE EXTERNAL TABLE in Greenplum Database, will it load the files in parallel? Or does it make no difference whether I load one whole file or split it into multiple files and load those instead?
Official Doc

Deepak K M
2 Answers
Files are loaded in sequence, per command. If you specify multiple files with a wildcard, e.g. gpfdist://data/file_*, then all of those files are loaded one after another, with all the segments reading concurrently.
You can achieve faster concurrent loading by splitting the files over multiple gpfdist instances, e.g. gpfdist://data/file_part_1* gpfdist://data/file_part_2*
For a video example, see: https://youtu.be/QqzUhTgWPZg?t=4m48s
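
As a minimal sketch of the multi-instance setup, assuming two gpfdist instances are already running: the host name `etlhost`, the ports 8081 and 8082, the directories, the table names, and the columns are all hypothetical placeholders, not values from the question. Each URI in LOCATION points at a different gpfdist instance, so the segments can pull from both at the same time.

```sql
-- Assumed setup (hypothetical host, ports, and directories): two gpfdist
-- instances started on the ETL host, each serving its own directory, e.g.
--   gpfdist -d /data/part_1 -p 8081 &
--   gpfdist -d /data/part_2 -p 8082 &

-- Readable external table with one URI per gpfdist instance.
CREATE EXTERNAL TABLE ext_sales (
    id     int,
    amount numeric,
    sold   date
)
LOCATION (
    'gpfdist://etlhost:8081/file_part_1*',
    'gpfdist://etlhost:8082/file_part_2*'
)
FORMAT 'TEXT' (DELIMITER '|');

-- Loading is then an ordinary INSERT ... SELECT into an existing table
-- (here a hypothetical table named sales with matching columns).
INSERT INTO sales SELECT * FROM ext_sales;
```

Whether this is actually faster than a single instance depends on where the bottleneck is (disk, network, or the ETL host's CPU), so it is worth measuring with your own files.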

Brendan Stephens
- Does it mean that if I run a gpfdist service on 4 different folders and distribute my files across those 4 folders, the files will load in parallel? – Deepak K M Jun 06 '18 at 06:28
- Yes, that's what it means. – A. Scherbaum Jun 07 '18 at 11:54
Multiple gpfdist instances load and unload data at the defined locations in parallel. That parallelism is the real point of using Greenplum Database.

Gurupreet Singh Bhatia