Which is the best way to form the string value using column from a Table with rows having same ID?

Question

I am using SQL Server 2008.

This are my tables.

Table A

 Aid int pk
 col2
 col3
 XYZID int

Table B

 Bid int pk
 XYZID int
 Col2
 Col3 
 Col4
 seq -- (Sequence )

Table B will be like this

seq   col2    |XYZID|Bid |col3| col4 
===============================================
  1   foo     | 1   |  1 | 12 | wqw
  3   bar     | 1   | 10 | 77 | kikk
  2   foobar  | 1   |  2 |  w | ed
  1   barfoo  | 2   |  4 |  e | dwe
  2   asdsad  | 2   |  5 |  e | e

Table A is the main table and based on the XYZID in A

I need to generate a string value using col from TableB with XYZID and Seq .

I need to generate a string value using col from TableB with XYZID and Seq .

For eg: xyzid= 1

I expected : foo-12-wqw#foobar-w-ed#bar-77-kikk

based on Sequence foo-1,foobar-2,bar-3

Mikael Eriksson · Accepted Answer · 2012-06-21T12:41:27.627

4

For XYZID = 1.

select stuff((select '#'+col2+'-'+col3+'-'+col4
              from TableB
              where XYZID = 1
              order by seq
              for xml path(''), type).value('.', 'nvarchar(max)'), 1, 1, '')

For all rows in TableA:

select stuff((select '#'+col2+'-'+col3+'-'+col4
              from TableB as B
              where A.XYZID = B.XYZID
              order by seq
              for xml path(''), type).value('.', 'nvarchar(max)'), 1, 1, '')
from TableA as A

edited Jun 21 '12 at 12:41

answered Jun 21 '12 at 12:34

Mikael Eriksson

136,425
22
210
281

I tried to profile the execution plan between this answer and CTE approach. This answer's `query cost relative to batch` is 98%. I haven't explore the Sql Server's `for xml path` feature yet, but maybe the bottleneck resides there – Michael Buen Jun 21 '12 at 13:22
2

@MichaelBuen The query cost is not to be trusted when it comes to XML in SQL Server. You should do some comparisons between the solutions with a fair amount of data and use `set statistics time on`. It would be interesting to see what you find. My guess is that the `for xml` version is faster. The reason for the query cost to be so high could be that the cost also includes memory consumption and it has to load some XML stuff get the job done. Another thing about your solutions is that it requires seq to be unique for one `XYZID` and it has to be consecutive. – Mikael Eriksson Jun 21 '12 at 15:08
@MichaelBuen If you do the tests you probably should add an index on XYZID in TableB or even better, have (XYZID, seq) as a clustered unique key. I think both of our solutions should benefit from that. – Mikael Eriksson Jun 21 '12 at 15:09
I tried adding index on both main table and child table, `create index ux_tblA on tblA(xyzid);`, `create index ux_tblx on tblx(xyzid);`. Index has a good benefit on CTE approach, the query cost become lesser, 1%, consequently the XML approach become 99%. Regarding seq is non-unique and consecutive requirement, I'll think about how to tackle it – Michael Buen Jun 21 '12 at 15:26
I update my answer to deal with non-consecutive number, it can work now even the seq field is non-consecutive. And CTE's query cost is still fast, 5% against XML's 95%. Both benchmarked query uses the same tables, with same index – Michael Buen Jun 21 '12 at 15:36
@MichaelBuen Have you timed the queries? – Mikael Eriksson Jun 21 '12 at 15:37
I just checked the execution plan. I haven't tried with big data – Michael Buen Jun 21 '12 at 15:38
1

Hmm.. interesting, the displayed execution plan is not definitive when there's no ample data. I bumped up the rows to 5,000, the XML approach is faster, it took 0 second, the CTE is 2 seconds. I definitely need to explore this XML approach. +1 to your answer – Michael Buen Jun 21 '12 at 15:49
Thanks all for helping me.. :) – kbvishnu Jun 26 '12 at 09:37

Michael Buen · Answer 2 · 2012-06-21T15:34:37.860

DDL:

drop table tblx;
create table tblx
(
seq int,
col2 varchar(50),
xyzid int,
bid int,
col3 varchar(50),
col4 varchar(50)
);

Data:

insert into tblx(seq,col2,xyzid,bid,col3,col4) values
(1,   'foo'     , 1,   1,  '12', 'wqw'),
(3,   'bar'     , 1,   10, '77', 'kikk'),
(2,   'foobar'  , 1,   2,  'w',  'ed'),
(1,   'barfoo'  , 2,   4,  'e',  'dwe'),
(2,   'asdsad'  , 2,   5,  'e',  'e');

Using CTE approach:

with a(xyzid, seq, x) as
(
select xyzid, seq, cast(col2 + '-' + col3 + '-' + col4 as varchar(max)) as x
from tblx
where seq = 1
union all
select t.xyzid, t.seq, a.x + '#' + (t.col2 + '-' + t.col3 + '-' + t.col4)
from tblx t
join a on a.xyzid = t.xyzid and t.seq = a.seq + 1
)
select xyzid, rtrim(x) as x 
from a w
where seq = (select MAX(seq) from a where xyzid = w.xyzid)
order by xyzid;

Output:

xyzid       x
----------- -----------------------------------
1           foo-12-wqw#foobar-w-ed#bar-77-kikk
2           barfoo-e-dwe#asdsad-e-e

(2 row(s) affected)

Using main table (e.g. table A) just need a simple modification on query:

with a(xyzid, seq, x) as
(
select xyzid, seq, cast(col2 + '-' + col3 + '-' + col4 as varchar(max)) as x
from tblx
where seq = 1
union all
select t.xyzid, t.seq, a.x + '#' + (col2 + '-' + col3 + '-' + col4)
from tblx t
join a on a.xyzid = t.xyzid and t.seq = a.seq + 1
)
select w.xyzid, rtrim(x) as x 
from tblA w -- just add this main table
left join a on a.xyzid = w.xyzid 
               and seq = (select MAX(seq) from a where xyzid = w.xyzid)
order by xyzid;

Data:

create table tblA
(
aid int identity(1,1) primary key,
col2 varchar(50),
col3 varchar(50),
xyzid int
);


insert into tblA(col2,col3,xyzid) values
('','',1),
('','',2),
('','',3);

Output:

xyzid       x
----------- ------------------------------------
1           foo-12-wqw#foobar-w-ed#bar-77-kikk
2           barfoo-e-dwe#asdsad-e-e
3           NULL

(3 row(s) affected)

If the seq field is non-consecutive and/or non-unique, put a sequencer:

with sequencer as
(
select 
    xyzid, ROW_NUMBER() over(partition by xyzid order by seq) as seq
    , col2, col3, col4 
from tblx 
)
,a(xyzid, seq, x) as
(
select xyzid, seq, cast(col2 + '-' + col3 + '-' + col4 as varchar(max)) as x
from sequencer
where seq = 1
union all
select t.xyzid, t.seq, a.x + '#' + (col2 + '-' + col3 + '-' + col4)
from sequencer t
join a on a.xyzid = t.xyzid and t.seq = a.seq + 1
)
select w.xyzid, rtrim(x) as x 
from tblA w
left join a on a.xyzid = w.xyzid 
            and seq = (select MAX(seq) from a where xyzid = w.xyzid)
order by xyzid;

Sample non-consecutive seq:

insert into tblx(seq,col2,xyzid,bid,col3,col4) values
(1,   'foo'     , 1,   1,  '12', 'wqw'),
(5,   'bar'     , 1,   10, '77', 'kikk'),
(3,   'foobar'  , 1,   2,  'w',  'ed'),
(1,   'barfoo'  , 2,   4,  'e',  'dwe'),
(3,   'asdsad'  , 2,   5,  'e',  'e');

Output (still the same):

xyzid       x
----------- --------------------------------------
1           foo-12-wqw#foobar-w-ed#bar-77-kikk
2           barfoo-e-dwe#asdsad-e-e
3           NULL

(3 row(s) affected)

Regarding speed, it's still fast. CTE query cost is 5% against XML approach, which is 95%

Which is the best way to form the string value using column from a Table with rows having same ID?

2 Answers2

Linked

Related