0

I have 3 tables. "Portfolios", "Balances" and "Positions". My users can save multiple portfolios, for each portfolio multiple balances and for each balance multiple positions.

Now I want to query for one portfolio with its balances and for each balances two sums (amount and fees).

My system will automatically filter for the current user.

I have following query:

CREATE OR REPLACE FUNCTION public.getBalancesWithStats(portfolioid bigint)
RETURNS TABLE(
  id bigint,
  name text,
  portfolio_id bigint,
  overall_amount double precision,
  overall_fees double precision) AS $$
    BEGIN
        RETURN QUERY
            SELECT balances.id
    ,balances.name
    ,balances.portfolio_id
    ,SUM(positions.amount) AS overall_amount
    ,SUM(positions.fee) AS overall_fees
FROM balances
LEFT JOIN positions ON positions.crypto_balance_id = balances.id
WHERE balances.portfolio_id = portfolioid
GROUP BY balances.id
    ,balances.name
    ,balances.portfolio_id;
    END;
$$ LANGUAGE plpgsql;

I'm not sure if this would be a performance issue. A user has like 3 portfolios, 10-100 balances and 10 - 100k transactions, this is for sure the largest table.

enter image description here

The result should look like

 id, name, overall_amount, overall_fees

Distinct/Grouped by id

Update

After I created two indexes:

create index balance_portfolio_id on balances(portfolio_id, id)
create index positions_balance_id on positions(crypto_balance_id, id)

and filled one portfolio with 1k balances and a balance with 1k positions and query the other portfolio with one balance and one position the balance switched to indexed scan but positions still seq scan. Before I created an index for positions there was a nested loop as well.

enter image description here

After I put another 10k balances and positions in one portfolio/balance it switched both to bitmap heap scan and Bitmap Index Scan.

enter image description here

I'm not sure what this means but it should be both index scan in the first step and the second bitmap does not say anything to me.

Can I improve my query even more? Thanks a lot guys!

Sesa
  • 11
  • 2
  • 1. If the query is reading 100k rows it's bound to take some time; nowadays it should be in the milliseconds to seconds range. What's your expectation in terms of response time? – The Impaler Jan 29 '22 at 19:14
  • 2. Please include the execution plan. The query may be missing some indexes. – The Impaler Jan 29 '22 at 19:15
  • Milliseconds would be fine haha. Its an app, even one or multiple seconds would be fine. It does not matter how many users I have right or how big the positions table would grow? What is a execution plan? I have indexes on all ids. should I put one on the sum fields? – Sesa Jan 29 '22 at 19:26
  • It always matters if the table grows. I suggest you test it with tons of data. – AminM Jan 29 '22 at 19:44
  • To get the execution plan in PostgreSQL run `EXPLAIN `. Paste the result in the question. – The Impaler Jan 29 '22 at 19:45
  • I posted the picture at my main post. I'm currently working on my app and I don't have a lot of data, just my personal. – Sesa Jan 29 '22 at 19:51
  • @TheImpaler does the query plan helps? I don't see much there but I'm not an SQL guy haha. My last SQL query was like 10 years ago. – Sesa Jan 29 '22 at 21:41
  • @Sesa Well... the execution plan is clear. It's reading the entire tables `position` and `balances`. It's probably missing an index to filter by portfolio_id. Try adding it with: `create index ix1 on balances (portfolio_id, id);`. Then try the query again. – The Impaler Jan 30 '22 at 02:02
  • Thanks for your input. I will create the 2 indexes and test it with more data because I think it will use Seq Scan if I query almost the whole table right? – Sesa Jan 30 '22 at 05:36
  • I added the two indexes and one query is now indexed but looks like not the other. I edited my post for the description. – Sesa Jan 30 '22 at 07:16

1 Answers1

0

Observations from execution plan:

  1. Seq Scan (FULL table ) on BALANCE while filtering rows for portfolio_id (passed as parameter) is selecting 2 rows and not using index - INDEX IS MISSING or NOT USED on balances.portfolio_id. If the index does not exist , create the index and if the index exist verify the health of index (bloated etc...). Rebuild the index ...

  2. The join between POSITION and BALANCE should be nested loop using index - its showing HASH join - INDEX IS MISSING or NOT USED on positions.crypto_balance_id (POSITION is the inner table in the join).

  3. For the 'group by operation' monitor 'temp_blks_read' & 'temp_blks_written' for the queryid in PG_STAT_STATEMENTS while testing and see if this IS high, if so , you will need to increase the work_mem proportionately for the session.

  4. You can enable parallel query for the session . If you do, keep a watch on WORK_MEM because each parallel slave will grab memory equal to WORK_MEM for itself impacting the memory availability for other sessions.

rajorshi
  • 697
  • 4
  • 9
  • Isn't a index automatically created when I create a reference foreign key? I think it will use Seq Scan when I query almost the whole table because its faster. I will create some test data and test it. I don't understand the second point. Do I need to change something in my query or what does a nested loop using index mean? – Sesa Jan 30 '22 at 05:30
  • See this post: https://stackoverflow.com/questions/60211067/how-to-understand-the-nested-loop-in-postgresql-explain – rajorshi Jan 30 '22 at 05:54
  • Defining a FOREIGN KEY on a column does not create an index on the column. Defining a PRIMARY KEY on a column does. Your explain plan says "Seq Scan on balances .... ( rows =2 ) " ... which means after a sequential scan , the operation node is returning only 2 rows . Yes it does full table scan but it does not need to for retrieving 2 rows. Second point is when you have less rows to join (here result of the join node is 3 rows) , A row need to be picked from outer table and a matching rows from inner table needs picked using an INDEX LOOKUP which results in a joined row using a nested loop. – rajorshi Jan 30 '22 at 06:04
  • I created the two indexes create index balance_portfolio_id on balances(portfolio_id, id) create index positions_balance_id on positions(crypto_balance_id, id) and executed the explain, the result was the same. After that I created 1k balances for one portfolio and only one balance for the other portfolio and it switched to Index Scan for balance but positions still seq scan. New picture at post. – Sesa Jan 30 '22 at 06:14
  • I added the description to the post. Its easier to read there – Sesa Jan 30 '22 at 06:20