
I have a MySQL table with over 100 million rows. It is a production table that serves a lot of read requests. I need to fetch, let's say, a million rows from this table, process the rows in a Node.js script, and then store the data in Elasticsearch.

I have done this a lot with MongoDB without facing any issues. I start a read stream, pause it after every 1000 rows, and once the downstream processing for that set of 1000 rows is complete, I resume the stream and keep going until all the rows are processed. I have never faced performance challenges with MongoDB, since the find query returns a cursor that fetches results in batches, so no matter how big my input is, it does not cause any issues.
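A minimal sketch of that pause/resume pattern, assuming the official `mongodb` Node.js driver (the collection name and the `indexToElasticsearch` helper are placeholders for the real collection and the downstream Elasticsearch work):

```js
// Sketch of the pause/resume batching pattern with the MongoDB Node.js driver.
// 'posts' and indexToElasticsearch() are placeholders.
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
  if (err) throw err;

  var stream = db.collection('posts').find({}).stream();
  var batch = [];

  stream.on('data', function (doc) {
    batch.push(doc);
    if (batch.length >= 1000) {
      stream.pause();                          // stop reading while this batch is processed
      indexToElasticsearch(batch, function (err) {
        if (err) throw err;
        batch = [];
        stream.resume();                       // carry on with the next 1000 documents
      });
    }
  });

  stream.on('end', function () {
    indexToElasticsearch(batch, function () {  // flush whatever is left
      db.close();
    });
  });

  stream.on('error', function (err) {
    throw err;
  });
});
```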

Now, I don't know how streaming queries in MySQL work under the hood. Will my approach work for MySQL, or will I have to execute the query again and again, e.g. a first SELECT that fetches the rows where id < 1000, a second that fetches the rows where id is between 1000 and 2000, and so on? Has anybody worked on a similar problem before? I found a similar question on Stack Overflow, but it had no answer.

Mandeep Singh
  • Have you actually tried using MySQL streaming? From the documentation it seems to me that you can set it up similarly to what you did with MongoDB: get a "cursor", collect a number of rows, pause, process rows, resume. – robertklep Nov 14 '15 at 11:18
  • I haven't tried it yet. I just want to be sure before trying. One fundamental difference between MongoDB and MySQL streaming with Node.js is that in Mongo one pauses the query stream, while in MySQL the connection to the database is paused. I don't know whether that has any impact, though. – Mandeep Singh Nov 14 '15 at 11:21
  • If you use [connection pooling](https://github.com/felixge/node-mysql#pooling-connections), the connection should be regarded as exclusive (until you release it) and it most likely won't be an issue. – robertklep Nov 14 '15 at 11:24
  • The approach turned out to be fine. I was getting almost the same performance irrespective of the size of the query output (see the streaming sketch after this comment thread). – Mandeep Singh Nov 15 '15 at 22:48
  • Can you answer the question with the approach you took? @MandeepSingh – Murtaza Haji Mar 15 '20 at 21:03
  • Now you can find answer [here](https://stackoverflow.com/a/36015689/713573) or discussion at [node-mysql](https://github.com/mysqljs/mysql/issues/1370) – Gagan Apr 20 '21 at 04:34
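For anyone landing here later, this is roughly what the streaming setup discussed in the comments looks like with the node-mysql (`mysql`) package. As noted above, it is the connection that gets paused rather than the query stream; the table name and `indexToElasticsearch` are placeholders:

```js
// Sketch of streaming rows with node-mysql, pausing the connection while each
// batch of 1000 rows is processed. Table name and indexToElasticsearch() are placeholders.
var mysql = require('mysql');

var connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'secret',
  database: 'mydb'
});

var batch = [];

connection.query('SELECT * FROM big_table')
  .on('error', function (err) {
    throw err;
  })
  .on('result', function (row) {
    batch.push(row);
    if (batch.length >= 1000) {
      connection.pause();                      // no more 'result' events until resume()
      indexToElasticsearch(batch, function (err) {
        if (err) throw err;
        batch = [];
        connection.resume();
      });
    }
  })
  .on('end', function () {
    indexToElasticsearch(batch, function () {  // flush the remaining rows
      connection.end();
    });
  });
```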

1 Answer


You can use `LIMIT <starting point>, <number of records>` to fetch a fixed number of rows at a time, and move the starting point forward by the batch size after every iteration. For example, `LIMIT 0,1000` starts fetching from the first row and returns 1000 records, and the next query, `LIMIT 1000,1000`, fetches the next 1000 records starting from the 1000th row.
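A rough sketch of that loop with the `mysql` package (the table name and the `indexToElasticsearch` helper are placeholders). Keep in mind that with 100M+ rows, large offsets force MySQL to scan and discard all the skipped rows, so paging by the primary key (`WHERE id > last_seen_id`) or the streaming approach from the comments will usually be faster:

```js
// Sketch of the LIMIT/OFFSET batching loop described above.
// Table name and indexToElasticsearch() are placeholders.
var mysql = require('mysql');

var connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'secret',
  database: 'mydb'
});

var BATCH_SIZE = 1000;

function fetchBatch(offset) {
  // e.g. LIMIT 0,1000 on the first pass, LIMIT 1000,1000 on the second, and so on
  connection.query('SELECT * FROM big_table LIMIT ?, ?', [offset, BATCH_SIZE], function (err, rows) {
    if (err) throw err;
    if (rows.length === 0) {
      connection.end();                        // no rows left
      return;
    }
    indexToElasticsearch(rows, function (err) {
      if (err) throw err;
      fetchBatch(offset + BATCH_SIZE);         // move the starting point forward
    });
  });
}

fetchBatch(0);
```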