0

I've used the solution (by AdaTheDev) from the thread below as it relates to the same question: How to exclude records with certain values in sql select

But when applying the same query to over 40,000 records, the query takes too long to process (>30mins). Is there another way that's efficient in querying for certain records with certain values (the same question as in the above stackoverflow thread). I've attempted to use this below and still no luck:

 SELECT StoreId
FROM sc
WHERE StoreId NOT IN (
  SELECT StoreId
  FROM StoreClients
  Where ClientId=5
);

Thanks --

Community
  • 1
  • 1
stitch70
  • 143
  • 1
  • 3
  • 15
  • how many records are in the storeclients table? Do you have an index on the storid column? 30 minutes seems weird with only 40K records. – Mark Giaconia Aug 16 '16 at 20:13
  • 42,905 records in the StoreClients table. I currently don't have an index on the StoreID column – stitch70 Aug 16 '16 at 20:17
  • for more meaningful feedback I would post your query or at least relevant parts of your query and potentially some example data and desired result. That link/solution is for a specific case where storeid can have more than 1 clientid and they want to exclude all storeids that have at least 1 clientid = 5. Is this the same case you are trying to solve or do you simply not want records with clientid = 5? Also for performance additional information will be needed. E.g. Looking at your tags this is in Access Database and you are using VBA in Excel to query it? – Matt Aug 16 '16 at 20:35
  • Even with no index this shouldn't take that long. I can write VBA to do the same in a minute or two. Maybe even less. – Kyle Aug 16 '16 at 20:45
  • I am also doing the exact same case where I want to exclude all storeid records that have at least 1 clientid = 5 regardless of their other clientid values (this means that storeid would have duplicate values), which is what I'm also doing. I am using SQL query statements in Microsoft Access. No excel is being used. I'll remove the excel tag. – stitch70 Aug 16 '16 at 20:54
  • I've updated the StoreID column and ClientID column to include Indexed values (with duplicates) but performance is still the same. – stitch70 Aug 16 '16 at 20:59
  • @Kyle, how would the VBA code be written as you proposed? – stitch70 Aug 16 '16 at 21:09
  • House the data in Excel, pull it into an array and construct loops. – Kyle Aug 16 '16 at 21:13
  • Understood, but is there a way to keep this local to Access by using arrays & loops as mentioned and not using Excel? – stitch70 Aug 16 '16 at 21:17
  • I'm not sure, as it's hard to say what is happening. I quit using Access because it was painfully slow over the network at my place of employment. SQL Server will be much faster if that is available (or was for me). I just did a little test in Excel. With 40,000 rows of data and 2 columns (one with random numbers 1 to 10, the other with random numbers 25 to 50) I can populate a list of all values from column 2, where column 1 is not 5, in .0625 seconds. – Kyle Aug 16 '16 at 21:28
  • Gotcha. in concluding, I've Indexed BOTH tables and SAVED the changes and the processing time has SIGNIFICANTLY improved down to less than a second! Indexing made the difference! – stitch70 Aug 16 '16 at 21:31
  • In following up, I'm referencing 1 table for multiple query update results. Is there a way to force this table to always have the Indexed value remaining TRUE for specified fields? The issue is because the table's fields get modified after execution of each query and I currently have to click and change the Indexed fields before executing the next query. – stitch70 Aug 17 '16 at 01:00

2 Answers2

1

You could use EXISTS:

SELECT StoreId
FROM sc
WHERE NOT EXISTS (
  SELECT 1
  FROM StoreClients cl
  WHERE sc.StoreId = cl.StoreId
  AND cl.ClientId = 5
  )

Make sure to create an index on StoreId column. Your query could also benefit from an index on ClientId.

Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • Unfortunately, this query causes Access to become Unresponsive and no display of "Running Query" is shown. It appears it causes Access to crash as a final record for the query result is not shown. – stitch70 Aug 16 '16 at 21:08
  • In concluding, I've indexed BOTH tables & the columns of interest and SAVED the changes and the processing time has SIGNIFICANTLY improved down to less than a second! Indexing made the difference as you mentioned, but I'm using the same query that I've used from the thread here: `SELECT DISTINCT sc.StoreId FROM StoreClients sc WHERE NOT EXISTS( SELECT * FROM StoreClients sc2 WHERE sc2.StoreId = sc.StoreId AND sc2.ClientId = 5)` – stitch70 Aug 16 '16 at 21:49
  • The query you've provided (1) is almost identical to the one I've posted (2) only adds the distinct clause in projection (3) has a superfluous `*` in the `NOT EXISTS` part. – Kamil Gosciminski Aug 17 '16 at 08:25
  • Unfortunately this isn't working as my recordset is over 60,000 records. I've attempted using INDEXING and so far it hasn't worked. Is there another solution or perhaps INDEXING isn't being used correctly? I created an index using "CREATE INDEX" – stitch70 Nov 29 '16 at 23:15
  • Through another link, here is a solution that appears to work through initial testing with respect to performance / long hang time: [link] (http://stackoverflow.com/questions/40856980/exclusion-condition-for-sql-ms-access-query) – stitch70 Nov 30 '16 at 22:05
1

40,000 rows isn't too big for SQL server. The other thread is suggesting some good queries too. If I understand correctly, your problem right now is the performance. To improve the performance, you may need to add a new index to your table, or just update the statistics on your table.

If you are running this query in SQL server management studio for testing and performance tuning, make sure that you are selecting "Include Actual Execution Plan". This will make the query to take longer to execute, but the execution plan result can help you on where the time is spent. It usually suggests adding some indexes to improve the performance, which is easy to follow. If adding new indexes are not possible, then you have to spend some time reading on how to understand the execution plan's diagram and finding the slow areas.

Sparrow
  • 2,548
  • 1
  • 24
  • 28
  • his tags don't include sql-server they include ms access which this answer wont hold true for. – Matt Aug 16 '16 at 20:36
  • In concluding, I've indexed BOTH tables & the columns of interest and SAVED the changes and the processing time has SIGNIFICANTLY improved down to less than a second! Indexing made the difference as you mentioned, but I'm using the same query that I've used from the thread here: `SELECT DISTINCT sc.StoreId FROM StoreClients sc WHERE NOT EXISTS( SELECT * FROM StoreClients sc2 WHERE sc2.StoreId = sc.StoreId AND sc2.ClientId = 5)` – stitch70 Aug 16 '16 at 21:46