0

If I specify multiple URIs for LOCATION in CREATE EXTERNAL TABLE in Greenplum Database, will it load the files in parallel? Or does it make no difference whether I load one whole file or split it into multiple files and load those instead?
Official Doc

Deepak K M

2 Answers

1

Within a single command, files are loaded in sequence. If you specify multiple files with a wildcard, e.g. gpfdist://data/file_*, then all of the matching files are loaded one after another, with all the segments reading from them concurrently.

You can achieve faster concurrent loading by splitting the files over multiple gpfdist instances.

e.g.: gpfdist://data/file_part_1* gpfdist://data/file_part_2*
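As a rough sketch of what that could look like in DDL: the table name, columns, host names (etl1, etl2), port, and file names below are all made up for illustration, and it assumes one gpfdist process is already running on each host. Adjust everything to your own setup.

```sql
-- Hypothetical setup: a gpfdist instance was started on each ETL host, e.g.
--   gpfdist -d /var/load_files -p 8081 &
-- (directory, port, and host names are assumptions, not from the question).

CREATE EXTERNAL TABLE ext_sales (
    id     int,
    amount numeric,
    sold   date
)
LOCATION (
    -- one URI per gpfdist instance; each serves its own share of the files
    'gpfdist://etl1:8081/file_part_1*',
    'gpfdist://etl2:8081/file_part_2*'
)
FORMAT 'TEXT' (DELIMITER '|');

-- Loading happens when the external table is read, for example:
-- INSERT INTO sales SELECT * FROM ext_sales;
```

The point of listing two URIs is that each one points at a different gpfdist instance, so the segments can pull from both at once instead of queuing behind a single file server.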

For a video example, see: https://youtu.be/QqzUhTgWPZg?t=4m48s

0

Multiple gpfdist instances load or unload data at the defined locations in parallel. That is the real use of Greenplum DB.