
I am using Node.js 10.16.0 and the node-mssql module to connect to a DB. Everything works and my simple queries run fine.

If I try to stream data from a query, following the node-mssql example, the first execution is very slow. It doesn't throw a timeout error, but takes a minute or more to complete.

According to the console log, it fetches the first 55 rows and then stalls for a while. The delay seems to happen between the "sets" of data, as I divide them in my code below. If I execute the same query a second or third time, it completes in about a second. The total number of rows is about 25,000 or more.

How can I make my stream queries faster, at least the first time?

Here is my code.

Following the example, the idea is: start streaming, collect 1000 rows, pause the stream, process those rows, send them back over WebSockets, empty all arrays, then resume streaming until done.

let rowc = 0;              // total row counter
let rowsToProcess = [];    // current batch of rows
let measurementsData = [];
let skate = [];
let leather = [];
let waterproof = [];
const stream_start = new Date();

const request = new sql.Request(pool);
request.stream = true; // stream rows instead of buffering the full result set
request
  .input('id_param', sql.Int, parseInt(id, 10))
  .input('start_date_param', sql.VarChar(50), startDate)
  .input('stop_date_param', sql.VarChar(50), stopDate)
  .query('SELECT skate, leather, waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param');

request.on('row', row => {
  rowc++; console.log(rowc);
  rowsToProcess.push(row);
  if (rowsToProcess.length >= 1000) {
    request.pause(); // stop the stream while this batch is processed
    processRows();
  }
});

const processRows = () => {
  rowsToProcess.forEach((item) => {
    skate.push(item.skate);
    leather.push(item.leather);
    waterproof.push(item.waterproof);
  });
  measurementsData.push(
    { title: 'Skate shoes', data: skate },
    { title: 'Leather shoes', data: leather },
    { title: 'Waterproof shoes', data: waterproof }
  );
  console.log('another processRows done');
  //ws.send(JSON.stringify({ message: measurementsData }));
  rowsToProcess = [];
  skate = [];
  leather = [];
  waterproof = [];
  measurementsData = [];
  request.resume(); // continue streaming the next batch
};

request.on('done', () => {
  console.log('rowc , ', rowc);
  console.log('stream start , ', stream_start);
  console.log('stream done , ', new Date());
  processRows(); // flush the remaining (< 1000) rows
});
Christos Lytras
codebot
    If it's only the first time, this sounds like later queries are using cached information (plans, tables in memory, etc). Sounds like the table may be poorly indexed, or `shoes` is a `VIEW`, with some "complex" logic behind it. – Thom A Nov 25 '19 at 12:48
  • @Larnu My thoughts exactly. So, this is a DB issue ? No, its not a VIEW. I should first check the table indexing? – codebot Nov 25 '19 at 12:52
  • Depends what you mean by "DB issue". Is it the database engine's fault if the table is poorly indexed? No; it's the DBA's/developer's. I would start by looking at the execution plans or pushing back on the aforementioned people. – Thom A Nov 25 '19 at 12:58
  • first thing is to add a non-clustered index to shoes for the id field. but just the names point to not using a dataset with 100k+ rows so probably not that. – smoore4 Dec 11 '19 at 16:06
  • @smoore4 Sorry, I did not get what you said about the names. Can you rephrase ? Thanks – codebot Dec 11 '19 at 16:29
  • Sorry for assuming here. It just looked like a test hockey database to me. Nothing about brands, quantity, price, etc., so it looked like dev/test, which usually means not a quantity of data where indexes play a huge role. Could be totally wrong about that though. – smoore4 Dec 11 '19 at 16:41
  • @smoore4 Btw I did a count on shoes id and I got back 12,140,950. It's just the columns for that query that are few. Also the table is indexed on id, which is also a primary key. Thanks. – codebot Dec 12 '19 at 08:58

3 Answers


I would try to improve the indexing of the shoes table. From what I see, there are two possible issues with your query/indexing:

  • You filter on the datetime column startTime, but there is an index only on the id column (according to the comments)
  • You cast the datetime to date within the WHERE clause of the query

Indexes

As you're filtering only on the date without the time part, I'd suggest creating a new column startDate that holds the conversion of startTime to date, and creating an index on it. Then use this indexed column in the query.

Also, since you select only the skate, leather, and waterproof columns, including them in the index could give better performance. Read about indexes with included columns.

If you always select data that is newer or older than a certain date, you may also look into filtered indexes.

Avoid CAST in the WHERE clause

In general CAST is cheap, but using it within the WHERE clause can keep SQL Server from making efficient use of the indexes (the predicate is no longer sargable). So you should avoid it.

If you create a new column with just the date part and index it as described above, you don't need the cast here:

WHERE id = @id_param AND startDate BETWEEN @start_date_param AND @stop_date_param
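As a sketch of the above (column and index names are illustrative, not from the original schema), a persisted computed column plus a covering index could look like this:

```sql
-- Add a persisted computed column holding just the date part of startTime
ALTER TABLE shoes
  ADD startDate AS CAST(startTime AS date) PERSISTED;

-- Index the equality filter (id) plus the date range (startDate),
-- and INCLUDE the selected columns so the index covers the query
CREATE NONCLUSTERED INDEX IX_shoes_id_startDate
  ON shoes (id, startDate)
  INCLUDE (skate, leather, waterproof);
```

With a covering index like this, the query can be answered from the index alone, without key lookups into the base table.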
blackbishop
  • Please note that when I leave indexes and casts as is and I just do `SELECT top 10000 skate, leather, waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param` and then `if (rowsToProcess.length >= 50) {` instead of 1000, it runs much faster with little to no lag. What could this mean? Thanks – codebot Dec 16 '19 at 16:50
  • I don't think you can draw any conclusion from this. You should really check on the database side how long the query takes **without any cache**. Actually, I don't think changing the batch size in `node-mssql` to 50 can make it run faster. – blackbishop Dec 17 '19 at 17:49

When a query runs slow the first time but fast in subsequent executions, as someone suggested earlier, it's generally due to caching. The first-run performance is then largely bound by the storage device the database sits on, since the data has to be read from disk into the buffer cache.

I expect the execution plan does not change between executions.
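One way to verify this from a query window (e.g. SSMS; HeidiSQL can run the same statements) is to compare I/O statistics across runs; the literal values below are placeholders:

```sql
-- Report logical/physical reads and elapsed time for each statement
SET STATISTICS IO, TIME ON;

-- Run the query twice: the first run should show physical reads (disk),
-- the second mostly logical reads (buffer cache) and a much lower time
SELECT skate, leather, waterproof
FROM shoes
WHERE id = 42  -- sample id value
  AND CAST(startTime AS date) BETWEEN '2019-01-01' AND '2019-01-31';
```

If the second run shows the same plan but far fewer physical reads, the first-run slowness is cache warm-up rather than the plan itself.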

alexherm
  • Please note that the table is indexed on id, which is also a primary key. I don't know if this makes any difference. Also, I have access to the DB via HeidiSQL. What kind of tests or settings can I use to check the execution plan? Thanks – codebot Dec 12 '19 at 09:00
  • Please note that when I leave indexes and casts as is and I just do `SELECT top 10000 skate, leather, waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param` and then `if (rowsToProcess.length >= 50) {` instead of 1000, it runs much faster with little to no lag. What could this mean? Thanks – codebot Dec 16 '19 at 16:51

You should remove the cast from the WHERE clause, or create an index on a computed column (if possible in your DB).

Operations on a column in the WHERE clause can hurt your query; avoid them if possible.

Try just setting your WHERE parameters:

@start_date_param to the datetime yyyy-mm-dd 00:00:00

@stop_date_param to the datetime yyyy-mm-dd 23:59:59

AND startTime BETWEEN @start_date_param AND @stop_date_param
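A minimal sketch of building those boundary values on the Node side (the helper name is illustrative, not part of node-mssql):

```javascript
// Expand plain yyyy-mm-dd dates into datetime range boundaries that
// cover each whole day, so startTime can be compared without CAST.
function dayRange(startDate, stopDate) {
  return {
    start: `${startDate} 00:00:00`, // first instant of the start day
    stop: `${stopDate} 23:59:59`,   // last whole second of the stop day
  };
}

const range = dayRange('2019-11-01', '2019-11-30');
console.log(range.start); // "2019-11-01 00:00:00"
console.log(range.stop);  // "2019-11-30 23:59:59"
```

Note that BETWEEN with a 23:59:59 upper bound can miss sub-second timestamps; `startTime >= @start_date_param AND startTime < DATEADD(day, 1, @stop_date_param)` is a safer half-open pattern.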
Frederic
  • Please note that when I leave indexes and casts as is and I just do `SELECT top 10000 skate, leather, waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param` and then `if (rowsToProcess.length >= 50) {` instead of 1000, it runs much faster with little to no lag. What could this mean? Thanks – codebot Dec 16 '19 at 16:51
  • @codebot, try adding DBCC FREEPROCCACHE to clear your plan cache for the performance analysis, or add OPTION (RECOMPILE): ```SELECT top 10000 skate, leather, waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param OPTION (RECOMPILE)``` I expect it will always be slow in any test you do until you fix the cast and/or indexes – Frederic Dec 17 '19 at 16:54