0

I am trying to get all the photo urls into an array and set that array to be a new column in a different table. The tables have a one to many relationship. I have about 1 million rows on my list_reviews table and around 3 million photos.

Is there a way to do this in batches? When I tried to do it all in one shot I just got empty arrays.

https://www.postgresql.org/message-id/20051219121211.002f7e87.gry@ll.mit.edu and Postgresql select rows(a result) as array

Those work if I am only doing one at a time. I have been thinking about trying to use the STREAMING found here, https://github.com/vitaly-t/pg-promise/wiki/Learn-by-Example#into-database but not sure I totally understand what is going on here.

CREATE TABLE list_reviews (
  id SERIAL PRIMARY KEY,
  product_id INT,
  photos TEXT[]);

CREATE TABLE review_photos (
  id SERIAL,
  review_id INT REFERENCES list_reviews(id) ON DELETE CASCADE,
  url TEXT);

UPDATE list_reviews SET photos = array(
   SELECT url
   FROM review_photos
   WHERE review_photos.id = list_reviews.id 
   AND list_reviews.id = 5);

list_reviews looks like:

+----+------------+--------+--+
| id | product_id | photos |  |
+----+------------+--------+--+
|  5 |          1 | []     |  |
+----+------------+--------+--+

review_photos looks like:

+----+-----------+------------+--+
| id | review_id |   photos   |  |
+----+-----------+------------+--+
|  1 |         5 | something1 |  |
|  2 |         5 | something2 |  |
|  3 |         5 | something3 |  |
+----+-----------+------------+--+

and would expect to see list_reviews:

+----+------------+--------------------------------------+--+
| id | product_id |                photos                |  |
+----+------------+--------------------------------------+--+
|  5 |          1 | [something1, something2, something3] |  |
+----+------------+--------------------------------------+--+
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
TomC
  • 5
  • 4

1 Answers1

0

Your code basically looks okay. I prefer using array_agg() (because the operation is more explicit), but Postgres allows a result set for array.

One issue is the filtering. I think you intend:

UPDATE list_reviews lr
    SET photos = array(SELECT rp.url
                       FROM review_photos rp
                       WHERE rp.id = lr.id 
                      )
    WHERE lr.id = 5;

Your query will update all rows in list_reviews with the urls from the photos for id = 5.

You can do this in batches by putting ranges on lr.id as you do the processing. For instance:

UPDATE list_reviews lr
    SET photos = array(SELECT rp.url
                       FROM review_photos rp
                       WHERE rp.id = lr.id 
                      )
    WHERE lr.id > 0 and lr.id < 10000;

However, it might be simpler to replace the existing table:

create temporary table temp_list_reviews as
    select id, product_id,  -- all columns but photos
           array(SELECT rp.url
                           FROM review_photos rp
                           WHERE rp.id = lr.id 
                          ) as photos
    from list_reviews;

truncate table list_reviews;

insert into list_reviews (id, product_id, photos)
    select id, product_id, photo
    from temp_list_reviews;

Bulk inserts are usually faster than updates, due to logging considerations.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786