
I have a list of around 300k objects that I'm sending in batches to a Web API. Since the objects are multi-level, I decided to send the items in batches of 1,000, each batch converted to a JSON string. At the moment I'm seeing about 50 records inserted per second, which by my estimate means the full load will take over an hour to complete.

In my SQL Server database, I have a stored procedure that accepts the JSON string and uses OPENJSON. The data is inserted into a temp table (staging sketch below), which I then iterate over with a WHILE loop as shown in the second block (not sure if there's a better way).
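The staging step looks roughly like this (simplified; the procedure name, column names, and JSON paths are illustrative rather than my real schema):

create or alter procedure ImportOrganisations   -- illustrative name
    @json nvarchar(max)
as
begin
    -- stage the incoming batch; process_ind drives the loop below
    create table #temporganisations
    (
        orgid       int identity(1,1),
        org_name    nvarchar(255),
        status      nvarchar(50),
        process_ind bit not null default 0
    );

    insert into #temporganisations (org_name, status)
    select j.org_name, j.status
    from openjson(@json)
    with
    (
        org_name nvarchar(255) '$.Name',
        status   nvarchar(50)  '$.Status'
    ) as j;

    -- ...then the WHILE loop below works through #temporganisations row by row
end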

While (Select Count(*) From #temporganisations Where process_ind = 0) > 0
Begin
    select top 1 @orgid = torg.orgid
    from #temporganisations torg
    where torg.process_ind = 0

    exec AddOrganisation... retvalid out
    exec AddLocation... retvalid
    exec AddContact... retvalid
    etc
    .
    .
    .
    delete from #temporganisations
    where orgid = @orgid
end

As shown, there are multiple stored procedures called within this stored procedure, and each one does a duplicate check before it performs its INSERT statements. I'm trying to speed up the process because it's really taking very long to complete. So I discovered Activity Monitor, and noticed that 'Recent Expensive Queries' (screenshot below) shows the INSERT statements that get called.

[Screenshot: Activity Monitor, Recent Expensive Queries]
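Each of those procedures does roughly the same thing: check whether the record already exists, and insert it if it doesn't (again simplified, with illustrative table and column names):

create or alter procedure AddOrganisation   -- simplified, illustrative
    @org_name nvarchar(255),
    @status   nvarchar(50),
    @retvalid int output
as
begin
    set @retvalid = null;

    -- duplicate check before the insert
    select @retvalid = OrganisationId
    from dbo.Organisation
    where Name = @org_name;

    if @retvalid is null
    begin
        insert into dbo.Organisation (Name, Status)
        values (@org_name, @status);

        set @retvalid = scope_identity();
    end
end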

I'm just wondering if there's something I can do to increase the performance of these INSERTs. Here's the execution plan for one of them:

[Screenshot: execution plan for one of the INSERT statements]

I read somewhere that I could do the INSERT directly, but then the stored procedure would be HUGE since I need to add the duplicate checks. I also read about using a table variable instead of a temp table, but I also read an article saying table variables are meant for small datasets. Thanks in advance.

Below is the XML structure. As you can see, I need to link each of the child nodes to its parent node, which is the organisation.

<Organisations>
<Organisation orgRecordClass="R43">
  <Name>ORGANIZATION X</Name>
  <Date>
    <Type value="Operational" />
    <Start value="1991-04-01" />
    <End value="1994-03-31" />
  </Date>
  <OrgId root="5161f" assigningAuthorityName="H51" extension="R34" />
  <Status value="Inactive" />
  <LastChangeDate value="2013-05-08" />
  <GeoLoc>
    <Location>
      <AddrLn1>458 HOMER ROAD</AddrLn1>
      <Town>LONDON</Town>
      <PostCode>E1 8PL</PostCode>
      <Country>ENGLAND</Country>
      <UPRN>21521</UPRN>
    </Location>
  </GeoLoc>
  <Contacts>
    <Contact type="tel" value=" 233344" />
    <Contact type="fax" value=" 233355" />
  </Contacts>
  <Roles>
    <Role id="R12" uniqueRoleId="1" primaryRole="true">
      <Date>
        <Type value="Operational" />
        <Start value="1991-04-01" />
        <End value="1994-03-31" />
      </Date>

    </Role>
  </Roles>
  <Rels>
    <Rel id="RE6" uniqueRelId="58005">
      <Date>
        <Type value="Operational" />
        <Start value="1991-04-01" />
        <End value="1994-03-31" />
      </Date>
      <Status value="Inactive" />

    </Rel>
  </Rels>

  <Succs>
    <Succ uniqueSuccId="12">
      <Date>
        <Type value="Legal" />
        <Start value="1993-04-01" />
      </Date>
      <Type>Successor</Type>
      <Target>
        <OrgId root="5161f" assigningAuthorityName="H51" extension="R561" />
        <PrimaryRoleId id="R12" uniqueRoleId="37607" />
      </Target>
    </Succ>       
    <Succ uniqueSuccId="12">
      <Date>
        <Type value="Legal" />
        <Start value="1993-04-01" />
      </Date>
      <Type>Successor</Type>
      <Target>
        <OrgId root="5161f" assigningAuthorityName="H51" extension="R561" />
        <PrimaryRoleId id="R12" uniqueRoleId="37607" />
      </Target>
    </Succ>
  </Succs>
</Organisation>
</Organisations>
  • There will undoubtedly be ways to speed your process up, but that all comes down to what's going on in each of the procedures you invoke. As it stands, your `while` loop should be checking whether any rows `exist` rather than asking SQL Server to needlessly count them all on every iteration... – Stu May 08 '21 at 15:41
  • Thanks @Stu. There's nothing special in each stored procedure; it just uses a SELECT to check whether an id exists, and if it doesn't, calls INSERT. May I ask for a better way of checking if any rows exist? – Reydan Gatchalian May 08 '21 at 15:52
  • Is this a better way: While (EXISTS(SELECT 1 FROM #temporganisations Where process_ind = 0)) – Reydan Gatchalian May 08 '21 at 15:57
  • Basically yes - if there are a lot of rows in the table, i.e. many 1000s, then an index on process_ind would help. Using `exists`, SQL Server will stop as soon as it encounters a qualifying row, rather than counting all the rows each time, which matters when the rowcount is large. – Stu May 08 '21 at 16:08
  • Thanks @Stu, I'm new to indexing temp tables. Is this command right: CREATE NONCLUSTERED INDEX ix_temporganisations ON #temporganisations (process_ind); and do I run this right after CREATE TABLE, or after populating the temp table? – Reydan Gatchalian May 08 '21 at 16:36
  • This whole process is just about the slowest and most expensive way to do this. You are doing a bulk load, you need to treat it like that instead of treating it like 300,000 inserts from the UI. Either use SSIS, or explicit SQL bulk/dataset-oriented procedures, not this painful RBAR (row-by-agonizing-row) loops approach. – RBarryYoung May 08 '21 at 16:37
  • @RBarryYoung, the requirement from the client is a web app that users will use to upload XML files, which then calls a Web API to insert into the SQL Server database. I did try using sqlbulk but I'm not sure how to link the records as shown in the XML structure. – Reydan Gatchalian May 08 '21 at 16:40
  • You should rewrite this to do it in bulk: inserting from a Table-Valued Parameter or a temp table is probably going to be fastest – Charlieface May 08 '21 at 23:23

1 Answer


The first thing is to follow basic best practices for running a row-by-row process in SQL Server. Here that means using a real cursor instead of that weird cursor-like loop, and wrapping the whole loop in a transaction.

begin transaction

declare c cursor local for
   select orgid, name, ... from #temporganisations   -- column list must match the FETCH below
open c

fetch next from c into @orgid, @name, ...
while @@fetch_status = 0
begin
  
  exec AddLocation...
  exec AddContact...
  etc
  .
  .
  .
  fetch next from c into @orgid, @name, ...
end

commit transaction

Without a transaction, the transaction log must be physically flushed after every INSERT, because each statement commits on its own.

The second thing is to transform it into a batch-oriented process. You can rewrite each of the stored procedures to accept multiple input rows (TVP, JSON, or XML), or just code each one to read from #temporganisations. Stored procedures can use any temp table declared in the session, so you can pass bulk data into a stored procedure by inserting it into a temp table before invoking the proc.

E.g.:

create or alter procedure AddLocation
as
begin
   insert into Location (Name, Address)
   select distinct Name, Address
   from #temporganisations
end
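If each procedure needs to keep its duplicate check, and the child rows need to be linked back to their parent organisation, the set-based version can still handle that: insert only the rows that don't already exist, then join back on the natural key (the OrgId extension from the XML) to pick up the generated ids. A rough sketch, assuming the child rows are staged in their own temp table (#tempcontacts) carrying that extension; the table and column names here are assumptions, not your real schema:

create or alter procedure AddOrganisations
as
begin
    set nocount on;

    -- duplicate check folded into a single set-based insert
    insert into dbo.Organisation (OrgExtension, Name, Status)
    select distinct t.org_extension, t.org_name, t.status
    from #temporganisations t
    where not exists
    (
        select 1
        from dbo.Organisation o
        where o.OrgExtension = t.org_extension
    );

    -- child rows pick up the parent's generated id by joining back
    -- on the natural key, instead of calling SCOPE_IDENTITY() per row
    insert into dbo.Contact (OrganisationId, ContactType, ContactValue)
    select o.OrganisationId, t.contact_type, t.contact_value
    from #tempcontacts t
    join dbo.Organisation o
        on o.OrgExtension = t.org_extension
    where not exists
    (
        select 1
        from dbo.Contact c
        where c.OrganisationId = o.OrganisationId
          and c.ContactType   = t.contact_type
          and c.ContactValue  = t.contact_value
    );
end

Each procedure then runs once per batch instead of once per row, and a 1,000-row batch collapses into a handful of set-based statements.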
David Browne - Microsoft
  • thanks @davidbrowne, I have added the XML structure I'm working on. As you can see, the child nodes are linked, which means I need to get the organisation_id first before inserting into the other tables. – Reydan Gatchalian May 08 '21 at 16:22
  • If you are going to tell someone to use a cursor over a `while` loop, at least do the basics properly: use `fast_forward`, and either proper cleanup with `try/catch` or use a `@` variable cursor (which automatically deallocates) – Charlieface May 08 '21 at 23:22
  • LOCAL cursors have batch scope and don't require cleanup. And do you suppose that FAST_FORWARD cursor on a temp table is going to be meaningfully faster than a regular DYNAMIC cursor? – David Browne - Microsoft May 09 '21 at 02:55