
I am trying to build a transformation where I need to concatenate the values of every 10 rows.

1st step: Table input (query from Postgres DB: select id from tablename)

sample result from the above query:

                  id
 00000191-555c-11e4-922d-29fb57a42e4c
 00000192-555c-11e4-922d-29fb57a42e4c
 00000193-555c-11e4-922d-29fb57a42e4c
 00000194-555c-11e4-922d-29fb57a42e4c
 00000195-555c-11e4-922d-29fb57a42e4c
 00000196-555c-11e4-922d-29fb57a42e4c
 00000197-555c-11e4-922d-29fb57a42e4c
 00000198-555c-11e4-922d-29fb57a42e4c
 00000199-555c-11e4-922d-29fb57a42e4c
 0000019a-555c-11e4-922d-29fb57a42e4c
 000001a3-3cf2-11e4-b398-e52ee0ec6a4c
 000002ad-3768-4242-88cf-96f27d0263af
 000003ea-26e3-11e4-ace7-15c7d609fa6e
 00000684-73fb-4d65-a502-87c4eb6607c1
 0000087a-f587-44fa-8e88-7bcae5bcb22c
 00000889-39c5-11e4-bd0e-c3f9d65ac856
 0000094c-be98-4456-8b49-6357a36581aa
 00000987-2f19-4574-ab85-6744a65ee4e3
 00000cd0-4097-11e4-a4e6-af71a3d902c0
 00000e1e-3b55-11e4-9897-d958d55e6784

Here I have to concatenate the ids of every 10 rows into a single row, e.g. the ids of rows 1-10 in one row, the ids of rows 11-20 in another row, and so on.

Expected Output:

ids

00000191-555c-11e4-922d-29fb57a42e4c,00000192-555c-11e4-922d-29fb57a42e4c,00000193-555c-11e4-922d-29fb57a42e4c,00000194-555c-11e4-922d-29fb57a42e4c,00000195-555c-11e4-922d-29fb57a42e4c,00000196-555c-11e4-922d-29fb57a42e4c,00000197-555c-11e4-922d-29fb57a42e4c,00000198-555c-11e4-922d-29fb57a42e4c,00000199-555c-11e4-922d-29fb57a42e4c,0000019a-555c-11e4-922d-29fb57a42e4c
000001a3-3cf2-11e4-b398-e52ee0ec6a4c,000002ad-3768-4242-88cf-96f27d0263af,000003ea-26e3-11e4-ace7-15c7d609fa6e,00000684-73fb-4d65-a502-87c4eb6607c1,0000087a-f587-44fa-8e88-7bcae5bcb22c,00000889-39c5-11e4-bd0e-c3f9d65ac856,0000094c-be98-4456-8b49-6357a36581aa,00000987-2f19-4574-ab85-6744a65ee4e3,00000cd0-4097-11e4-a4e6-af71a3d902c0,00000e1e-3b55-11e4-9897-d958d55e6784

I know the Group By or Memory Group By step will concatenate rows, but can I use it in this scenario, and if so, how?

Please help me with this. Thanks in advance!

Arunraj

3 Answers


If you don't have a suitable field to group your ids by, create one yourself.

In this case, I'd add row numbers in the query and divide them by 10 to get a decent and easily configurable group.

select (row_number() OVER (ORDER BY id) - 1) / 10 + 1 as rnum, id from tablename ORDER BY rnum

This should give you 10 rows with rnum 1, 10 rows with rnum 2, etc. Configure this field as the Group By field and you're done.
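
As a quick sanity check of the bucketing arithmetic, here is a sketch (using generate_series instead of your table) showing that row numbers 1-10 land in bucket 1, 11-20 in bucket 2, and so on:

-- integer division groups consecutive row numbers into buckets of 10
select n, (n - 1) / 10 + 1 as rnum
from generate_series(1, 25) as s(n);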

Cyrus
  • Alternatively, you can create the group field in PDI by combining an `Add sequence` step with dividing the counter by 10 in a `Calculator`, UDJC, JavaScript step or Java expression. The PDI way is clumsier, but could be useful if you need to use other data sources. – Andrei Luksha Mar 06 '17 at 13:37
  • BTW, you will need to ensure that the input to the Group By step is sorted on the group fields in order to get correct aggregations, so it's better to include ORDER BY id or ORDER BY rnum in the query above. I don't think PostgreSQL provides any sorting guarantees for row_number values in the general case. – Andrei Luksha Mar 06 '17 at 15:49
  • @user4637357 I'll add it to the answer, but the ordering happens during the windowing function, so it should be unnecessary. The optimizer would not shuffle the records again without other operations specified. At least, I have never seen that happen in other RDBMSes. – Cyrus Mar 06 '17 at 15:53

Something like this?

t=# \x
Expanded display is on.
t=# with a as 
(
  select ntile(2) over (order by id),id from tablename
) 
select 
  string_agg(id,',') 
from a 
group by ntile;
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string_agg | 00000191-555c-11e4-922d-29fb57a42e4c, 00000192-555c-11e4-922d-29fb57a42e4c, 00000193-555c-11e4-922d-29fb57a42e4c, 00000194-555c-11e4-922d-29fb57a42e4c, 00000195-555c-11e4-922d-29fb57a42e4c, 00000196-555c-11e4-922d-29fb57a42e4c, 00000197-555c-11e4-922d-29fb57a42e4c, 00000198-555c-11e4-922d-29fb57a42e4c, 00000199-555c-11e4-922d-29fb57a42e4c, 0000019a-555c-11e4-922d-29fb57a42e4c
-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string_agg |  000001a3-3cf2-11e4-b398-e52ee0ec6a4c, 000002ad-3768-4242-88cf-96f27d0263af, 000003ea-26e3-11e4-ace7-15c7d609fa6e, 00000684-73fb-4d65-a502-87c4eb6607c1, 0000087a-f587-44fa-8e88-7bcae5bcb22c, 00000889-39c5-11e4-bd0e-c3f9d65ac856, 0000094c-be98-4456-8b49-6357a36581aa, 00000987-2f19-4574-ab85-6744a65ee4e3, 00000cd0-4097-11e4-a4e6-af71a3d902c0, 00000e1e-3b55-11e4-9897-d958d55e6784
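
Note that ntile(2) yields exactly 10 ids per group here only because the sample contains exactly 20 rows. For strict groups of 10 regardless of the total row count, a sketch using the same pattern with a row_number-based bucket (assuming the same tablename and id column) would be:

with a as
(
  select (row_number() over (order by id) - 1) / 10 as bucket, id from tablename
)
select
  string_agg(id, ',' order by id)
from a
group by bucket;
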
Vao Tsun

I think the solution is:

select string_agg(id, ',' order by seqnum)
from (select t.*, row_number() over (order by id) - 1 as seqnum
      from tablename t
     ) t
group by floor(seqnum / 10);

Although this uses string_agg(), I would probably use arrays for the result.
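
For the array version, a sketch with the same assumed table and column names (array_agg in place of string_agg) would be:

-- each group becomes one array of 10 ids (the last group may be smaller)
select array_agg(id order by seqnum) as ids
from (select id,
             row_number() over (order by id) - 1 as seqnum
      from tablename
     ) t
group by floor(seqnum / 10);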

Gordon Linoff