0

I'm developing a web application for my SaaS product (and now for the first time running trials with a customer in production). I'd like to ask for suggestions on where to look for possible causes.

The general situation is this (I can provide more information if needed):

Express js app with a couple of static pages, showing forms to capture data from barcode scanners connected to the client PCs. On submit, the forms post to api endpoints exposed by the same express app. I'm using pg-promise to connect to the DB, and the queries inside the post controller are like this

  db.task("my_task", t => {
    return t.none("SET ROLE express_app;").then(() => {
      let query = `
        SELECT some_function($1);
      `;
      return t.none(query, [req.params.barcode]);
    });
  }).then ...

In Postgres, some_function(text) is defined in plgpsql. It first selects some data and performs several verifications using constructs like the following:

BEGIN
    SELECT * INTO STRICT l_var
    FROM some_table WHERE barcode = $1
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        RAISE EXCEPTION 'Some customized error msg';
END;

IF l_var.something_bad THEN
        RAISE EXCEPTION 'Some other error msg';
END IF;

INSERT INTO some_log_table (barcode) VALUES ($1);

PERFORM another_function();

The function another_function() is also defined in plpgsql with very similar structure (verifications first, updates later), and at some point it increments a value in a different table using

UPDATE inc_table SET inc_field = inc_field + 1;

The problem I'm experimenting is that after a day of use (12hs), with arround 2000 calls to these functions from 4 client PCs is that I get some inconsistencies between the number of records in the log_table and the value of the field.

There are also other functions that remove from the log table and decrement the value. They also get invoked using the same pattern (the function that takes the barcode removes it from the log table and the calls another function that does more stuff and the decrements the field).

The first couple of days I got inconsistencies of 6 or 7 errors. The first thing I did was to change all these functions definitions from PROCEDUREs to FUNCTIONs, as I read the behavior regarding transactions is different (not sure I understand the implications). The errors kept occurring. Then I found that the default transaction isolation level allows some inconsistencies, so I changed it to serializable in the configuration. I think it improved with that change, as I run to days without errors. But... yesterday I was very unhappy to find there was one error.

I've seen in the express logs that the clients some times send duplicate messages with a couple of milliseconds time difference (the client PCs are connected to a sometimes-unstable wifi), and the db appears to handle them correctly, at least most of the time.

At this point I'm not sure where to look for the problem. I'm not sure if there's something wrong with my setup, or it could be a bug in my logic (which is longer than what I've shown here). I read that exception blocks interact with the current transaction in some way, but I would expect that no matter what concurrency exists it will all run ok or all fail without changes.

I'm running this on a Linode, with Ubuntu 20.04, Postgres 13.4, Node 16.9.1, Express 4.16.1 and pg-promise 10.6.1

Any help on this would be greatly appreciated.

Thanks!

Awer Muller
  • 557
  • 7
  • 17
  • My question would be why have the `inc_table` in the first place? And if you do need it why not have a trigger on `some_log_table` that does the incrementing/decrementing? – Adrian Klaver Sep 18 '21 at 17:57
  • Hi Adrian, thanks for the perspective. I'm thinking of changing some implementation details, maybe getting rid of the exception blocks. But as this is (hopefully) going to be a long journey, I'd like to understand why my current implementation is failing before changing it. The logic is more involved than I explained here. I'm keeping track of inventory and serial_numbers separately, as the serial numbers are not used by other customers. The function invoked from the api controller moves serial_numbers, and that function calls the core stock-moving function. – Awer Muller Sep 18 '21 at 18:31
  • Without seeing the whole process it is not possible to come to answer. My trigger suggestion is based on the premise of keeping the logic close to the activity. In this case, if I am following correctly, the creation/deletion of records in `some_log_table`. Trying to keep independently moving code synced is difficult. – Adrian Klaver Sep 18 '21 at 20:05

0 Answers0