
My case:

I have some binary data which has been broken into many parts. A special table, FileParts, contains fileId, partNo, and data.

I need to assemble all the parts together into another table, MyFilesStorage.

Now I'm choosing between two implementations:

DO $CODE$
declare
  r record;
begin
  -- start with an empty value in the target row
  UPDATE public.MyFilesStorage SET mainFileData = E''::bytea WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;

  -- append each part to the stored value, one UPDATE per part
  for r in ( select data 
             from public.FileParts 
             where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
             order by partNo ) loop
    UPDATE public.MyFilesStorage SET mainFileData = mainFileData || r.data WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
  end loop;

end;
$CODE$;

I set the data to empty, then read the parts one by one and append each one to the main table's row.

Another variant:

DO $CODE$
declare
  r record;
  p_result bytea;
begin
  p_result := E''::bytea;

  -- accumulate all parts in a local variable first
  for r in ( select data 
             from public.FileParts 
             where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
             order by partNo ) loop
    p_result := p_result || r.data;
  end loop;

  -- write the assembled value in a single UPDATE
  UPDATE public.MyFilesStorage SET mainFileData = p_result WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
end;
$CODE$;

Here I use a temporary variable. The second variant is much quicker, but I don't know which one takes more memory. In the second I need enough memory to hold the whole file in RAM; what about the first? Will PostgreSQL load the entire content here: mainFileData = mainFileData || r.data?

Maybe there is another way to do this, because both variants are very slow? In Oracle I use DBMS_LOB.APPEND for this operation.

  • You could use the PostgreSQL large objects feature, in which case you could do a fast append by seeking to the end and writing (a sketch follows below these comments). http://www.postgresql.org/docs/9.1/interactive/lo-interfaces.html#LO-SEEK That has some limitations compared to bytea, so look it over carefully. On the other hand, what you are doing with the file parts is remarkably similar to what PostgreSQL does internally through its TOAST system, except that it also attempts compression by default. http://www.postgresql.org/docs/9.1/interactive/storage-toast.html This is totally automatic and transparent. – kgrittn Apr 17 '12 at 17:26
  • Thanks a lot. That's very useful advice! I'll try the LO interface. – Yavanosta Apr 18 '12 at 11:31
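
Following up on kgrittn's comment, here is a minimal sketch of what the large-object route could look like using only the server-side lo_* functions. The UUIDs are the ones from the question; mainFileOid is an assumed oid column on MyFilesStorage for storing the large object's locator, not something from the original schema:

DO $CODE$
declare
  r record;
  v_loid oid;
  v_fd integer;
begin
  -- create a new, empty large object and remember its OID
  v_loid := lo_create(0);

  -- open it for writing; the descriptor is only valid inside this transaction
  v_fd := lo_open(v_loid, 131072);  -- 131072 = INV_WRITE

  for r in ( select data
               from public.FileParts
              where fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
              order by partNo ) loop
    -- each lowrite continues at the current position, so this appends part by part
    perform lowrite(v_fd, r.data);
  end loop;

  perform lo_close(v_fd);

  -- store the locator; mainFileOid is a hypothetical column, not from the question
  UPDATE public.MyFilesStorage
     SET mainFileOid = v_loid
   WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;
end;
$CODE$;

Reading the content back then goes through lo_open/loread (or lo_export) rather than selecting a bytea column.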

2 Answers

3

Your approach looks correct; check the PostgreSQL manual here.

You can also define your own aggregate to do the job for you:

CREATE AGGREGATE bytea_add(bytea)
(
    sfunc = byteacat,
    stype = bytea,
    initcond = E''
);

And then use plain SQL, like this:

UPDATE public.MyFilesStorage SET mainFileData = (
    SELECT bytea_add(data) FROM public.FileParts
     WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
     -- ORDER BY partNo  -- will not work here
 )
 WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;

EDIT:

UPDATE public.MyFilesStorage mfs SET mainFileData = fp.all_data
  FROM (
    SELECT bytea_add(data) OVER (ORDER BY partNo) AS all_data,
           rank() OVER (ORDER BY partNo DESC) AS pos
      FROM public.FileParts
     WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
 ) AS fp
 WHERE fp.pos = 1
   AND mfs.id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;

You can check the output of the inner SQL separately.
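
One way to do that check, shown here as a sketch that looks at the running length per part instead of the raw bytes (octet_length is only used to keep the output readable):

SELECT partNo,
       octet_length(bytea_add(data) OVER (ORDER BY partNo)) AS bytes_so_far,
       rank() OVER (ORDER BY partNo DESC) AS pos
  FROM public.FileParts
 WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid
 ORDER BY partNo;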

vyegorov
  • Thanks a lot for your answer. How can I guarantee that `bytea_add` will aggregate the parts in the right order if ORDER BY will not work? – Yavanosta Apr 18 '12 at 10:39
  • @Yavanosta, from PostgreSQL 8.4 it is possible to use `bytea_add(data ORDER BY partNo)` to specify the order; a sketch of that call follows below. – vyegorov Jul 09 '12 at 18:05
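
For reference, a sketch of what that ordered-aggregate call could look like, reusing the UUIDs from the question:

UPDATE public.MyFilesStorage
   SET mainFileData = ( SELECT bytea_add(data ORDER BY partNo)
                          FROM public.FileParts
                         WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid )
 WHERE id = 'e14a26c0-db4b-47e1-8b66-e091fb3ba199'::uuid;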
3

The first version is slower because PostgreSQL doesn't do in-place updates at the storage level: it creates a new version of the row for each UPDATE. So for a row that grows from 0 MB to 100 MB in 10 MB increments, what really gets written to disk is not 10×10 MB but rather 10 MB + 20 MB + 30 MB + ... + 90 MB + 100 MB = 550 MB. On the other hand, memory consumption stays low, with no more than 10 MB allocated at a time.

The second version is faster, with only 100 MB to write, but it needs to allocate 100 MB in memory.

The structure of the FileParts table, with its ordered chunks, is generally easier to manage for large contents anyway; why bother converting it into the other, monolithic structure?
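
For instance, with the chunked layout, things like the total size stay cheap to compute without ever assembling the whole value; a sketch using the fileId from the question:

SELECT count(*)                AS parts,
       sum(octet_length(data)) AS total_bytes
  FROM public.FileParts
 WHERE fileId = '89cb8598-436b-49b3-bb1c-34534c6d068e'::uuid;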

Daniel Vérité
  • It's part of a big system, and we are migrating to PostgreSQL, so I can't change this behavior. I must say that all the other databases (Oracle, SQL Server) have special instruments to manipulate large binary objects and store them outside of the main table's data file. PostgreSQL doesn't, which is a pity. – Yavanosta Apr 18 '12 at 10:43
  • I disagree. Pg's BYTEA type should be compared to Oracle's LONG RAW. Oracle's BLOB type and its DBMS_LOB package should be compared to/replaced with Pg's large objects (an Oid column as a "locator") with its lo_* functions. – Daniel Vérité Apr 18 '12 at 14:09
  • Yes, you are right, but there is no way to communicate with large objects through SQL. For example, I can get lo_size only via a call from the driver; I can't just do `SELECT lo_size(111)`. If I'm wrong, that would be very nice. – Yavanosta Apr 18 '12 at 14:12
  • See this recent question: http://stackoverflow.com/questions/9248738/obtaining-the-size-of-lobject-in-postgressql – Daniel Vérité Apr 18 '12 at 14:15