20

I would like to obtain the byte size of a blob.

I am using Postgresql and would like to obtain the size using an SQL query. Something like this:

SELECT sizeof(field) FROM table;

Is this possible in Postgresql?

Update: I have read the postgresql manual and could not find an appropriate function to calculate the file size. Also, the blob is stored as a large object.

Bob
  • 457
  • 1
  • 5
  • 15
  • Please read the manual before posting such a question: http://www.postgresql.org/docs/current/static/functions.html –  Apr 16 '12 at 06:51
  • 1
    @DanielVérité: it does look to be a dupe, but in fairness, I couldn't find that question when I searched before posting my answer. Who calls them "lobjects", honestly ? ;-) My function is uncannily like yours from that question, though in my defence, if I'd copied it I would have copied the error handling too! – Edmund Apr 17 '12 at 03:29

6 Answers6

22

Not that I've used large objects, but looking at the docs: http://www.postgresql.org/docs/current/interactive/lo-interfaces.html#LO-TELL

I think you have to use the same technique as some file system APIs require: seek to the end, then tell the position. PostgreSQL has SQL functions that appear to wrap the internal C functions. I couldn't find much documentation, but this worked:

CREATE OR REPLACE FUNCTION get_lo_size(oid) RETURNS bigint
VOLATILE STRICT
LANGUAGE 'plpgsql'
AS $$
DECLARE
    fd integer;
    sz bigint;
BEGIN
    -- Open the LO; N.B. it needs to be in a transaction otherwise it will close immediately.
    -- Luckily a function invocation makes its own transaction if necessary.
    -- The mode x'40000'::int corresponds to the PostgreSQL LO mode INV_READ = 0x40000.
    fd := lo_open($1, x'40000'::int);
    -- Seek to the end.  2 = SEEK_END.
    PERFORM lo_lseek(fd, 0, 2);
    -- Fetch the current file position; since we're at the end, this is the size.
    sz := lo_tell(fd);
    -- Remember to close it, since the function may be called as part of a larger transaction.
    PERFORM lo_close(fd);
    -- Return the size.
    RETURN sz;
END;
$$; 

Testing it:

-- Make a new LO, returns an OID e.g. 1234567
SELECT lo_create(0);

-- Populate it with data somehow
...

-- Get the length.
SELECT get_lo_size(1234567);

It seems the LO functionality is designed to be used mostly through the client or through low-level server programming, but at least they've provided some SQL visible functions for it, which makes the above possible. I did a query for SELECT relname FROM pg_proc where relname LIKE 'lo%' to get myself started. Vague memories of C programming and a bit of research for the mode x'40000'::int and SEEK_END = 2 value were needed for the rest!

Edmund
  • 10,533
  • 3
  • 39
  • 57
22

You could change your application to store the size when you create the large object. Otherwise you can use a query such as:

select sum(length(lo.data)) from pg_largeobject lo
where lo.loid=XXXXXX

You can use also the large object API functions, as suggested in a previous post, they work ok, but are an order of magnitude slower than the select method suggested above.

Blazemonger
  • 90,923
  • 26
  • 142
  • 180
vstan
  • 221
  • 2
  • 2
  • This is the cleanest solution. Thanks – MarekM May 14 '16 at 12:03
  • 3
    Unfortunately, the `pg_largeobject` catalog is no longer publicly accessible since PostgreSQL 9.0: https://www.postgresql.org/docs/current/static/catalog-pg-largeobject.html. – ochedru Aug 11 '17 at 16:52
9
select pg_column_size(lo_get(lo_oid)) from table;

Gives you the size in bytes.

If you want pretty printing:

select pg_size_pretty(pg_column_size(lo_get(lo_oid))::numeric) from table;
Chris Travers
  • 25,424
  • 6
  • 65
  • 182
  • 2
    somehow it is always 4 bytes more than it actually is, why? – wutzebaer Apr 30 '20 at 07:24
  • 1
    @wutzebaer the first four bytes of a `bytea` column are the size of the rest of it – OrangeDog Mar 17 '21 at 10:34
  • 1
    i.e. you should use `octet_length` (as per other answers) not `pg_column_size` – OrangeDog Mar 18 '21 at 11:35
  • The works, but looks overkill - why should the DB read the complete BLOB (likely multiple hundreds of MB) to throw them away after counting the length - especially if lo_seek (end) just returns its position directly from the internally stored length? – Simon Sobisch Sep 02 '22 at 13:35
6

Try length() or octet_length()

i-g
  • 69
  • 4
  • 22
5

This is my solution:

select
lo.loid,
pg_size_pretty(sum(octet_length(lo.data)))
from pg_largeobject lo
where lo.loid in (select pg_largeobject.loid from pg_largeobject)
group by lo.loid;
cfnerd
  • 3,658
  • 12
  • 32
  • 44
rafahead
  • 51
  • 1
  • 2
3

If the type of the blob column is oid

SELECT length(lo_get(blob_column)) FROM table;

lo_get ( loid oid [, offset bigint, length integer ] ) → bytea

Extracts the large object's contents, or a substring thereof.

https://www.postgresql.org/docs/current/lo-funcs.html

length ( bytea ) → integer

Returns the number of bytes in the binary string.

https://www.postgresql.org/docs/current/functions-binarystring.html

user2418306
  • 2,352
  • 1
  • 22
  • 33