
I am running a Node.js server with the node-postgres (pg) package.

I wrote a program that sends n queries (for instance 20,000) at once to my Postgres database.

When I do this with several clients that each want to run 20,000 queries at once, there is no parallelism: the requests of the second client are queued until the first client has finished all of its queries.

Is this normal behavior for Postgres? If so, how can I prevent one user from taking all the resources (while the others have to wait) when there is no parallelism?

This is my code:

const express = require('express');
const app = express();
const { Pool } = require("pg");
const pool = new Pool();


function benchmark() {
    pool.connect((err, client, done) => {
      if (err) throw err;
      client.query("SELECT * from member where m_id = $1", [1], (err, res) => {
        done();
        if (err) {
          console.log(err.stack);
        } else {
          console.log(res.rows[0]);
        }
      });
    }); 
}


app.get('/', function(req, res) {
  for(let i=0;i<20000;i++){
    benchmark();
  }
});
André
  • Sounds as if Node.js runs every query on the same connection. Did you create a connection pool? –  Aug 19 '19 at 11:28
  • I edited my code a bit, but yes, as you can see, I created a connection pool. – André Aug 19 '19 at 11:34
  • _"there is no parallelism..."_ Node is asynchronous; you can either work with promises or spawn more clients/pools and tune your max_connections (as explained in my answer), but with multiple clients each running around 20,000 queries, they won't resolve instantly or in parallel. What exactly are you trying to achieve? – iLuvLogix Aug 19 '19 at 11:51

1 Answer


First you need to create a connection pool. Here's an example using node-postgres (pg) in a separate module (node-pg-sql.js) for convenience:

node-pg-sql.js:

const { Pool } = require('pg');

const pool = new Pool(fileNameConfigPGSQL); // fileNameConfigPGSQL: your pool config object (host, user, password, database, max, ...)

module.exports = {
  query: (text, params, callback) => {
    const start = Date.now()
    return pool.query(text, params, (err, res) => {
      const duration = Date.now() - start
  //    console.log('executed query', { text, duration, rows: res.rowCount })
      callback(err, res)
    })
  },
  getClient: (callback) => {
    pool.connect((err, client, done) => {
      if (err) return callback(err) // no client was checked out on error

      const query = client.query.bind(client)

      // monkey patch: remember the last executed query for debugging
      client.query = (...args) => {
        client.lastQuery = args
        return query(...args)
      }

      // warn if a client is checked out for more than 5 seconds
      const timeout = setTimeout(() => {
     //   console.error('A client has been checked out for more than 5 seconds!')
     //   console.error(`The last executed query on this client was: ${client.lastQuery}`)
      }, 5000)

      const release = (err) => {
        // 'done' returns the client to the pool
        done(err)

        // clear the checkout timeout
        clearTimeout(timeout)

        // restore the original query method (undo the monkey patch)
        client.query = query
      }

      callback(err, client, release)
    })
  }
}
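
For work that has to run on a single connection (for example a transaction), check a client out via getClient and release it when you're done. A minimal sketch, assuming the module above (the require path is illustrative):

const db = require('./node-pg-sql');

db.getClient((err, client, release) => {
  if (err) throw err;
  client.query('SELECT NOW()', (errQuery, resQuery) => {
    release(errQuery); // always return the client to the pool
    if (!errQuery) console.log(resQuery.rows[0]);
  });
});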

In your postgresql.conf (on Linux normally under /var/lib/pgsql/data/postgresql.conf) set max_connections to the desired value:

max_connections = 300

Keep in mind:

Each PostgreSQL connection consumes RAM for managing the connection or the client using it. The more connections you have, the more RAM you will be using that could instead be used to run the database.

When increasing max_connections, you may need to increase shared_buffers and kernel.shmmax as well for the connection increase to be effective.
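
On the application side it also helps to cap the pool size so that a single Node process cannot exhaust the server's connections. A minimal sketch; the values and the database name are assumptions you should tune for your setup:

const { Pool } = require('pg');

const pool = new Pool({
  database: 'mydb',              // hypothetical database name
  max: 20,                       // at most 20 connections from this process
  idleTimeoutMillis: 30000,      // close idle clients after 30 seconds
  connectionTimeoutMillis: 2000, // fail fast if no connection is free within 2 seconds
});

With max set, surplus queries are queued inside the pool instead of each trying to open its own server connection.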

Whenever you want to run a query in one of your routes/endpoints, just require the separate client-pool file like:

const db = require('../../../node-pg-sql');

module.exports = (router) => {

  router.get('/someRoute', (req, res) => {
    console.log(`*****************************************`);
    console.log(`Testing pg..`);
    let sqlSelect = `SELECT EXISTS (
          SELECT 1
          FROM   pg_tables
          WHERE  schemaname = 'someschema'
        )`;
    // pass an empty params array so the callback is not mistaken for the params argument
    db.query(sqlSelect, [], (errSelect, responseSelect) => {
      if (errSelect) {
        // INFO: error while querying the table
        console.log(`*****************************************`);
        console.log(`ERROR WHILE CHECKING CONNECTION: ${errSelect}`);
        res.status(500).json({ success: false, message: errSelect.message });
      }
      else {
        // INFO: no error from the database
        console.log(`*****************************************`);
        console.log(`CONNECTION TO PGSQL WAS SUCCESSFUL..`);
        res.json({ success: true, message: responseSelect, data: responseSelect.rows[0].exists });
      }
    })
  });

}
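
To wire this up, pass an Express router into the exported function. A short usage sketch (the file path and mount point are illustrative):

const express = require('express');
const app = express();
const router = express.Router();

require('./routes/someRoute')(router); // the module shown above
app.use('/api', router);

app.listen(3000);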

EDIT:

"there is no parallelity.."

Node is asynchronous; you can either work with promises or spawn more clients/pools and tune your max_connections (as explained in my answer, but keep the performance of your host machine in mind). Still, with multiple clients each running around 20,000 queries, they won't resolve instantly or in parallel. What exactly are you trying to achieve?
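
As a sketch of the promise route (the batch size is an assumption, and app and pool are taken from the question's code): pool.query returns a promise when called without a callback, so the 20,000 queries can be fired in bounded batches:

async function benchmark() {
  const res = await pool.query('SELECT * FROM member WHERE m_id = $1', [1]);
  return res.rows[0];
}

app.get('/', async (req, res) => {
  const results = [];
  // run the queries in batches of 100 so a single request cannot
  // monopolize every pooled connection at once
  for (let i = 0; i < 20000; i += 100) {
    const batch = Array.from({ length: 100 }, () => benchmark());
    results.push(...(await Promise.all(batch)));
  }
  res.json({ count: results.length });
});

Bounding the concurrency this way leaves pooled connections free for other clients, which is exactly the "one user gets all the resources" problem from the question.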

"Is this a normal behavior for postgres?"

This is due to Node's event loop as well as to performance limitations of the host machine running Postgres.

iLuvLogix
  • If you look at my code, that is pretty much what I did (just without all this overhead). – André Aug 19 '19 at 11:50
  • `max-connection = 1000` on the Postgres side is way too high. Even with really high-end hardware, 500 is probably more than a server could handle –  Aug 19 '19 at 12:08
  • @a_horse_with_no_name Thanks for the info - I'll edit my answer. Could load-sharing via distribution across multiple high-performance Postgres servers maybe be a solution for the OP's issue? – iLuvLogix Aug 19 '19 at 13:03
  • The configuration should be done based on the number of concurrent clients, not on the total number of queries sent to the database (by all clients). –  Aug 19 '19 at 13:15
  • @a_horse_with_no_name That's clear to me; my question was the following: when the number of concurrent clients and their load is too high, the only solution would be to use multiple host machines/SQL servers and distribute the client connections, correct? – iLuvLogix Aug 19 '19 at 13:26