
I am trying to build a transformation where I need to concatenate the values of every 10 rows.

1st step: Table input (query from Postgres DB: select id from tablename)

sample result from the above query:

                  id
 00000191-555c-11e4-922d-29fb57a42e4c
 00000192-555c-11e4-922d-29fb57a42e4c
 00000193-555c-11e4-922d-29fb57a42e4c
 00000194-555c-11e4-922d-29fb57a42e4c
 00000195-555c-11e4-922d-29fb57a42e4c
 00000196-555c-11e4-922d-29fb57a42e4c
 00000197-555c-11e4-922d-29fb57a42e4c
 00000198-555c-11e4-922d-29fb57a42e4c
 00000199-555c-11e4-922d-29fb57a42e4c
 0000019a-555c-11e4-922d-29fb57a42e4c
 000001a3-3cf2-11e4-b398-e52ee0ec6a4c
 000002ad-3768-4242-88cf-96f27d0263af
 000003ea-26e3-11e4-ace7-15c7d609fa6e
 00000684-73fb-4d65-a502-87c4eb6607c1
 0000087a-f587-44fa-8e88-7bcae5bcb22c
 00000889-39c5-11e4-bd0e-c3f9d65ac856
 0000094c-be98-4456-8b49-6357a36581aa
 00000987-2f19-4574-ab85-6744a65ee4e3
 00000cd0-4097-11e4-a4e6-af71a3d902c0
 00000e1e-3b55-11e4-9897-d958d55e6784

Here I have to concatenate the ids of every 10 rows into a single row, e.g. the ids of rows 1-10 in one row, the ids of rows 11-20 in another row, and so on.

Expected Output:

ids

00000191-555c-11e4-922d-29fb57a42e4c,00000192-555c-11e4-922d-29fb57a42e4c,00000193-555c-11e4-922d-29fb57a42e4c,00000194-555c-11e4-922d-29fb57a42e4c,00000195-555c-11e4-922d-29fb57a42e4c,00000196-555c-11e4-922d-29fb57a42e4c,00000197-555c-11e4-922d-29fb57a42e4c,00000198-555c-11e4-922d-29fb57a42e4c,00000199-555c-11e4-922d-29fb57a42e4c,0000019a-555c-11e4-922d-29fb57a42e4c
000001a3-3cf2-11e4-b398-e52ee0ec6a4c,000002ad-3768-4242-88cf-96f27d0263af,000003ea-26e3-11e4-ace7-15c7d609fa6e,00000684-73fb-4d65-a502-87c4eb6607c1,0000087a-f587-44fa-8e88-7bcae5bcb22c,00000889-39c5-11e4-bd0e-c3f9d65ac856,0000094c-be98-4456-8b49-6357a36581aa,00000987-2f19-4574-ab85-6744a65ee4e3,00000cd0-4097-11e4-a4e6-af71a3d902c0,00000e1e-3b55-11e4-9897-d958d55e6784

I know the Group By or Memory Group By step will concatenate rows, but can I use it in this scenario, and if so, how?

Please help me with this. Thanks in advance!

Arunraj

3 Answers


If you don't have a suitable field to group your ids by, create one yourself.

In this case, I'd add row numbers in the query and divide them by 10 to get a decent and easily configurable group.

select (row_number() OVER (ORDER BY id) - 1) / 10 + 1 as rnum, id from tablename ORDER BY rnum

This should give you 10 rows with rnum 1, 10 rows with rnum 2, etc. Configure this field as the Group By field and you're done.
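
As a quick sanity check of the bucketing arithmetic, here is a sketch (using generate_series instead of your table) showing that row numbers 1-10 land in bucket 1, 11-20 in bucket 2, and so on:

-- integer division groups consecutive row numbers into buckets of 10
select n, (n - 1) / 10 + 1 as rnum
from generate_series(1, 25) as s(n);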

Cyrus
  • Alternatively, you can create the group field in PDI by combining an `Add sequence` step with dividing the counter by 10 in a `Calculator`, UDJC, JavaScript step or Java expression. The PDI way is clumsier, but could be useful if you need to use other data sources. – Andrei Luksha Mar 06 '17 at 13:37
  • BTW, you will need to ensure that the input to the Group By step is sorted on the group fields in order to get correct aggregations, so it's better to include ORDER BY id or ORDER BY rnum in the query above. I don't think PostgreSQL provides any sorting guarantees for row_number values in the general case. – Andrei Luksha Mar 06 '17 at 15:49
  • @user4637357 I'll add it to the answer, but the ordering happens during the windowing function, so it should be unnecessary. The optimizer would not shuffle the records again without other operations specified. At least, I have never seen that happen in other RDBMSes. – Cyrus Mar 06 '17 at 15:53

Something like this?

t=# \x
Expanded display is on.
t=# with a as 
(
  select ntile(2) over (order by id),id from tablename
) 
select 
  string_agg(id,',') 
from a 
group by ntile;
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string_agg | 00000191-555c-11e4-922d-29fb57a42e4c, 00000192-555c-11e4-922d-29fb57a42e4c, 00000193-555c-11e4-922d-29fb57a42e4c, 00000194-555c-11e4-922d-29fb57a42e4c, 00000195-555c-11e4-922d-29fb57a42e4c, 00000196-555c-11e4-922d-29fb57a42e4c, 00000197-555c-11e4-922d-29fb57a42e4c, 00000198-555c-11e4-922d-29fb57a42e4c, 00000199-555c-11e4-922d-29fb57a42e4c, 0000019a-555c-11e4-922d-29fb57a42e4c
-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
string_agg |  000001a3-3cf2-11e4-b398-e52ee0ec6a4c, 000002ad-3768-4242-88cf-96f27d0263af, 000003ea-26e3-11e4-ace7-15c7d609fa6e, 00000684-73fb-4d65-a502-87c4eb6607c1, 0000087a-f587-44fa-8e88-7bcae5bcb22c, 00000889-39c5-11e4-bd0e-c3f9d65ac856, 0000094c-be98-4456-8b49-6357a36581aa, 00000987-2f19-4574-ab85-6744a65ee4e3, 00000cd0-4097-11e4-a4e6-af71a3d902c0, 00000e1e-3b55-11e4-9897-d958d55e6784
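
Note that ntile(2) yields exactly 10 ids per group here only because the sample contains exactly 20 rows. For strict groups of 10 regardless of the total row count, a sketch using the same pattern with a row_number-based bucket (assuming the same tablename and id column) would be:

with a as
(
  select (row_number() over (order by id) - 1) / 10 as bucket, id from tablename
)
select
  string_agg(id, ',' order by id)
from a
group by bucket;
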
Vao Tsun

I think the solution is:

select string_agg(id, ',' order by seqnum)
from (select t.*, row_number() over (order by id) - 1 as seqnum
      from tablename t
     ) t
group by floor(seqnum / 10);

Although this uses string_agg(), I would probably use arrays for the result.
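
For the array version, a sketch with the same assumed table and column names (array_agg in place of string_agg) would be:

-- each group becomes one array of 10 ids (the last group may be smaller)
select array_agg(id order by seqnum) as ids
from (select id,
             row_number() over (order by id) - 1 as seqnum
      from tablename
     ) t
group by floor(seqnum / 10);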

Gordon Linoff