I am an in a pickle now and would really appreciate your advice. Got an assignment from our management to take a list of 500000 email addresses and check in our DB if any of them are registered there. The problem is in the amount of users, cause "select * from db..tablename where email in (.....)" will not work with a list that big. The only idea I got so far is to split the list into smaller parts, but it seems very ineffective and time-consuming. Is there any way to make it faster and smarter?
Asked
Active
Viewed 794 times
-2
-
11Load the data into a table and use a `join`, `in`, or `exists`. – Gordon Linoff Mar 19 '19 at 14:04
-
1Can you show us some of what you tried? Or give us an idea of the database structure (i.e. tables, columns, etc.) being used? – hmiedema9 Mar 19 '19 at 14:11
-
Hi Gordon, That was my first idea, but when you do an insert into a table there is a limit of 1000 rows per load, so that was a deal-breaker for me. Anything I can do to avoid the limit? Hi hmiedema9, Basically, to do the job I need to work with one big table (let's call it "clients") and search through it. It has a simple structure: UserID, First name, Last Name, email, Country, Date of registration. I tried to make a list of clients which exist in our DB: Select UserID, email from db..clients where email in (.....) – Tanya Mar 19 '19 at 14:21
-
@Tanya [this blog article](https://chsakell.com/2014/07/13/insert-millions-of-records-in-sql-server-table-at-once/) may have a solution for you to insert it quickly. Basically, you will have to do bulk inserts of 1000 rows in a loop. After inserting everything into the table, go on the way Gordon explained (-> Join, in or exists) – David3103 Mar 19 '19 at 14:26
-
@David3103 thank you for the article, I will check it! – Tanya Mar 19 '19 at 14:44
-
Which [DBMS](https://en.wikipedia.org/wiki/DBMS) product are you using? "SQL" is just a query language, not the name of a specific database product. Performance optimization is highly vendor specific. Please add a [tag](https://stackoverflow.com/help/tagging) for the database product you are using `postgresql`, `oracle`, `sql-server`, `db2`, ... – Mar 19 '19 at 15:51
-
Please **[edit]** your question and add the real query you are using and the `create table` statements for the tables in question as [formatted text](https://meta.stackoverflow.com/a/251362) please, [no screen shots](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) – Mar 19 '19 at 15:52
1 Answers
0
thank you for your ideas! I have just figured a solution that works without any extra manual work.
I've created a new help table with the whole 500k email list, inserted it there using the following statement (it helped me to avoid the 1000 rows limit in an ordinary insert statement):
insert INTO db..emaillist
select 'email1@gmail.com' union all
select 'email2@gmail.com' union all
..........
select 'email3@hotmail.com'
and now I am going to use:
Select userID, email
from bd..clients
where email in (select email from db..emaillist)
To get the list of users who exist in our DB.

Jack Williams
- 141
- 1
- 1
- 15

Tanya
- 33
- 3