I am on SQL Server 2012/SQL Server 2016.
The Xquery function 'modify' does only change one row at once. So I loop through my XML-data, element-by-element. This is quite slow. When there is only one distinct value to change I would cast the XML and replace the value. But there are multiple distinct values in most of the cases (rows).
I hope the structure of my_xml becomes clear from the (stronly simplified) data: There are multiple labels within a group of element (a and b).
--My data
if object_id ('tempdb..#t1') is not null drop table #t1
create table #t1 (id int not null identity (1,1) unique clustered, my_xml xml)
insert #t1 (my_xml) values
('<root><a label=''name A'' /> <b /> </root>'), --A
('<root><a label=''name A'' /> <b label=''name B'' /> </root>'), --AB
('<root><a label=''name A'' /> <b label=''name B'' /> <a label=''name A'' /> <b label=''name B'' /><a label=''name A'' /> <b label=''name B'' /></root>'), --ABABAB
('<root><a label=''name B'' /> <b label=''name A'' /> <a label=''name B'' /> <a label=''name B'' /> <a label=''name B'' /> </root>') --BABBB
select * from #t1
I would like to change the value of the label to the SHA-Hash of the original value.
I query some information about the structure of my_xml to minimize the rows to modify:
--My helper-table
if object_id ('tempdb..#t2') is not null drop table #t2
select
id,
max(my_xml.value('count(/root/*[self::a or self::b]/@label)', 'int')) max_label,
max(my_xml.value('count(distinct-values(/root/*[self::a or self::b]/@label))', 'int')) max_label_distinct
into #t2
from #t1
group by id
select * from #t2
Then I modify XML (take only those rows with more than one distinct label as only with them I need to loop through the XML)
--My query:
set nocount on
declare
@label char(64),
@id int,
@n_max_label smallint
declare test cursor for
select
id,
max_label
from #t2
where max_label_distinct > 1
open test
fetch next from test into @id, @n_max_label
while (@@FETCH_STATUS <> -1)
begin
while @n_max_label > 0
begin
select @label = isnull(convert(char(64), HASHBYTES('SHA2_256', my_xml.value('((/root/*[self::a or self::b]/@label)[sql:variable("@n_max_label")])[1]', 'char(64)')), 1), 'x')
from #t1
where id = @id
update #t1
set my_xml.modify('replace value of ((/root/*[self::a or self::b]/@label)[sql:variable("@n_max_label")])[1] with sql:variable("@label")')
where id = @id
set @n_max_label = @n_max_label - 1
end
fetch next from test into @id, @n_max_label
end
close test
deallocate test
final result:
select * from #t1
I would like to improve the solution by the factor of 10 to 100 (in terms of speed) as by now I would need about 2 month to calculate the result - what is to long to manipulate about 600 GB of data. (unloading and using other tools is no option)