
Suppose I have one table with the following columns:

Id int primary key

Date datetime 

Value double

Fund_id reference

FundModel_id reference

FundDataField_id reference

This table has 37,000,000 rows.

Each fund has about 4,000 rows. What is the best and quickest way to delete rows from this table? I need to delete about 7,000,000 rows at a time, but it takes about 10 minutes, and to me that is a lot.

Currently I delete rows by Fund_id, something like this:

DELETE FROM FundYearDetail
WHERE Fund_id IN (2054,2056,2058,2059,2061,2063,2064,2065,2066,2067,2069,2072,2073,2076,2078,2079,2080,2081,2082,
                  2086,2088,2090,2093,2095,2096,2097,2099,2101,2102,2103,2104,2105,2106,2107,2109,2110,2114,2115,
                  2116,2117,2118,2119,2342,2125,2126,2127,2128,2129,2130,2131)

This statement affects about 200,000 rows and takes a very long time to complete. Dividing it into 2 queries, I get better performance: about 4 seconds each.

Does anybody know a better solution for this?

Note: I am using Fluent NHibernate for data access; if somebody knows a better solution using NHibernate, please tell me. What if I write a stored procedure, will this increase my performance? Thanks.

Nic
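
To make the stored-procedure question concrete: wrapping the same DELETE in a procedure would typically produce the same execution plan, so the procedure by itself is unlikely to help much. Below is a minimal sketch of what such a procedure might look like, assuming a user-defined table type for the id list (the FundIdList type and DeleteFundRows name are illustrative, not from the original post):

-- Hypothetical sketch: FundIdList and DeleteFundRows are made-up names.
CREATE TYPE dbo.FundIdList AS TABLE (Fund_id INT PRIMARY KEY);
GO
CREATE PROCEDURE dbo.DeleteFundRows
    @Ids dbo.FundIdList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- Same delete as in the question, driven by a table-valued parameter.
    DELETE fyd
    FROM FundYearDetail AS fyd
    JOIN @Ids AS i ON i.Fund_id = fyd.Fund_id;
END
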
  • Is this a one-time operation, or a feature you need to add to an application so it can be repeated by end users of your software? (Either way, I probably wouldn't use NH, assuming you're hydrating all these entities just to delete them.) – Veatch May 22 '15 at 16:51
  • Well, I use NH to access data in normal cases. I want to delete all these rows because I am performing a large data import from CSV files using SqlBulkCopy; before importing I want to delete all the rows that will be affected rather than update them, as I think updating that many rows would take much longer than inserting them with SqlBulkCopy. – Nic May 22 '15 at 16:57
  • And yes, I am looking for a solution that is repeatable, but if you have a one-time solution as well, please let me know what it is. I assume you will not propose recreating the table ... – Nic May 22 '15 at 16:59
  • Like with large updates, I would try using a while loop that performs the delete in blocks of 50,000 records or less at a time. – Tab Alleman May 22 '15 at 17:22
  • What @TabAlleman said is the best way to do large deletes in SQL - there are a few different questions already answered here that use loops, GOTOs, and TOP clauses to make that work. I've run into issues time and time again using NH for any bulk operations - your best bet is to push the entire operation into the database - a set of stored procedures, SSIS, etc. - and then you can do updates much more efficiently and maybe avoid the deletes. Especially if you have foreign keys, the deletes can be really impactful. – Veatch May 22 '15 at 17:25
  • @MDDDC a few things can also slow down deletes: a lot of indexes on your table, a wide primary key index, and not having an index on the column you're referencing in the delete. My guess would be that you're missing an index that would help with your delete. I would use SQL Server Management Studio, run the SQL script, and include the execution plan; this should show you if you're missing any indexes. Also, there is a very good tool on indexes by Brent Ozar's team called sp_BlitzIndex that may help (a sketch follows these comments). – Nick O May 22 '15 at 18:27
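
A minimal sketch of the index check Nick O describes, assuming FundYearDetail has no index on Fund_id (the post does not confirm this, and the index name is illustrative):

-- Assumption: no index exists on Fund_id; without one, each delete
-- must scan the table to find matching rows.
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE object_id = OBJECT_ID('dbo.FundYearDetail')
                 AND name = 'IX_FundYearDetail_Fund_id')
    CREATE NONCLUSTERED INDEX IX_FundYearDetail_Fund_id
        ON dbo.FundYearDetail (Fund_id);

Note that extra indexes also have to be maintained during the delete itself, so the net effect should be measured with the execution plan, as suggested above.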

2 Answers


You could do batch deletes like this:

SELECT 'Starting'  -- sets @@ROWCOUNT to 1 so the loop body runs at least once
WHILE @@ROWCOUNT <> 0
    DELETE TOP (50000) dbo.FundYearDetail  -- change TOP value as needed
    WHERE Fund_id IN (2054,2056,2058,2059,2061,2063,2064,2065,2066,2067,2069,2072,2073,2076,
                      2078,2079,2080,2081,2082,2086,2088,2090,2093,2095,2096,2097,2099,2101,
                      2102,2103,2104,2105,2106,2107,2109,2110,2114,2115,2116,2117,2118,2119,
                      2342,2125,2126,2127,2128,2129,2130,2131)

Courtesy of @gbn: Bulk Delete on SQL Server 2008
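
If log growth is what slows the batches down (see the comments below this answer), and the database uses the SIMPLE recovery model — an assumption, the post doesn't say — a CHECKPOINT between batches lets log space be reused. A sketch:

-- Variation on the loop above: capture @@ROWCOUNT in a variable and
-- issue a CHECKPOINT between batches (helps under SIMPLE recovery).
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (50000) dbo.FundYearDetail
    WHERE Fund_id IN (2054, 2056, 2058 /* ...same id list as above... */);

    SET @rows = @@ROWCOUNT;
    CHECKPOINT;  -- allows log space to be reused between batches
END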

UPDATE

Alternatively, you could try inserting the records you want to keep into a temp table, truncating your actual table, and then transferring the temp table records back into the actual table. Not sure how much performance gain you would achieve from this, but I would definitely recommend taking a BACKUP before doing it:

SELECT col1, col2, col3, col4, col5 INTO #Holdingtable
       FROM FundYearDetail WHERE Fund_id NOT IN (2054,2056,2058,2059,2061,2063,2064,2065,
                    2066,2067,2069,2072,2073,2076,2078,2079,2080,2081,2082,2086,2088,2090,
                    2093,2095,2096,2097,2099,2101,2102,2103,2104,2105,2106,2107,2109,2110,
                    2114,2115,2116,2117,2118,2119,2342,2125,2126,2127,2128,2129,2130,2131           
            )
TRUNCATE TABLE FundYearDetail

INSERT FundYearDetail (
    col1
    ,col2
    ,col3
    ,col4
    ,col5
    )
SELECT 
     col1
    ,col2
    ,col3
    ,col4
    ,col5
FROM #Holdingtable
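
One caveat with the truncate-and-reload approach: if Id is an IDENTITY column — the question only says int primary key, so this is an assumption — inserting the saved rows back with their original Id values requires IDENTITY_INSERT. Using the question's actual column names:

-- Only needed if Id is an IDENTITY column (an assumption).
SET IDENTITY_INSERT dbo.FundYearDetail ON;

INSERT dbo.FundYearDetail (Id, [Date], [Value], Fund_id, FundModel_id, FundDataField_id)
SELECT Id, [Date], [Value], Fund_id, FundModel_id, FundDataField_id
FROM #Holdingtable;

SET IDENTITY_INSERT dbo.FundYearDetail OFF;
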
FutbolFan
  • 200,000 rows on SQL Server Express were deleted in 43 seconds. Not so good. – Nic May 22 '15 at 18:24
  • That does not mean the solution is not valid; the perf issue could be due to your database log file growing, blocking, rollback, etc. – Greg May 22 '15 at 18:30
  • @Greg, well, I can do it in 10 seconds, and I am not looking for merely a valid solution; a simple delete is also a valid solution. I am looking for the best solution. – Nic May 22 '15 at 18:35
  • @MDDDC I am sorry to hear that my previous answer didn't help much with the performance. I have updated my answer with an alternative approach (a bit risky), but definitely give it a try. – FutbolFan May 22 '15 at 18:46
  • @MDDDC, what I am saying is that poor performance in your test does not mean it's not the best solution. There are other factors that could have contributed; log file growth is one of them, and you can work around it by sizing your log file properly. Transactional Replication uses batched deletes to clean up metadata; it deletes 5K rows at a time, the same way, in sub-second time, and it can delete millions of rows very quickly. – Greg May 22 '15 at 21:32
  • @Nepali Rookie the second solution is much better. Thanks. – Nic May 23 '15 at 14:30
  • Great! Glad to hear that! :) – FutbolFan May 24 '15 at 04:12

How about something like this:

declare @tableIds table (Id int)

-- Load the fund ids to delete into a table variable.
insert into @tableIds
select 2054 as Id union all
...
-- other ids here
...
select 2131 as Id

-- Delete in batches of 10,000 until no matching rows remain.
while exists (select 1 from FundYearDetail t1 join @tableIds t2 on t1.Fund_id = t2.Id)
begin
    delete top (10000) t1
    from FundYearDetail t1 join @tableIds t2 on t1.Fund_id = t2.Id
end
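
As an aside, on SQL Server 2008 and later the UNION ALL chain can be written more compactly with a table value constructor (limited to 1,000 rows per statement):

-- Equivalent, more compact way to load the id list.
insert into @tableIds (Id)
values (2054), (2056), (2058), /* ...other ids... */ (2131);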
Max