Performance of inner join on two tables with large data

Question

JOIN query on below table which contains large amount of data. This executes very slow with java jdbc and jpa.

Use case UI has an option to select list of traders,category and date range.If i select 1 week date range and all traders from dropdown, will the below query joins records from 1 week of stock table with all the records from stock_trader?

            STOCK
------------------------------------------
stock_id long   | PRIMARY KEY AUTO_INCREMENT
stock_code  | varchar(10)
stock_name  |varchar(10)
created_date    |DATETIME
category    |varchar(10)

    STOCK_TRADER
-----------------------------------------
stock_id | long
trader_name | FULLTEXT

SQL query

  Select * from STOCK st join stock_trader trad
     on st.stock_id=trad.stock_id where st.created_date<CURRENT_TIMESTAMP()
     and trad.trader_name in(:traderlist) 
group by st.stock_id

traderlist: list of traders selected from UI

Please comment on the performance gain if I use the data model below which adds an extra column in stock_trader and joining with multiple columns. Can i expect faster response of the below query which maps only 1 week of records from stock table with 1 week of traders from stock_trader.

Is there any way to minimize the processing time of query?And what would be the average execution time for 1 million data.

what if the below join is replaced with a view and applying where clauses on the view from java jdbc/jpa. I found view makes the execution faster but am worried that it will join the whole dataset as where clauses are appllied dynamically from UI. If so performance would be much slower right?

              STOCK_TRADER
-----------------------------------------
stock_id    |   long
trader_name     |   FULLTEXT 
created_date    |   DATETIME

SQL QUERY

Select * from STOCK st join stock_trader trad 
    on st.stock_id=trad.stock_id and 
    st.created_date =trad.created_date where st.created_date<CURRENT_TIMESTAMP()
    and trad.trader_name in(:traderlist) 
group by st.stock_id

Do you really need `created_date < CURRENT_TIMESTAMP()`? Can you create a trade in the future? If you need it, make sure you have an index on `created_date`. Also make sure you have an index on `stock_trader.stock_id`. — Barmar, Apr 29 '17 at 14:30
Adding an extra column to the joining condition cannot possibly improve performance. A properly normalized schema should not duplicate information like that. — Barmar, Apr 29 '17 at 14:32
your cost is probably more on the `group by st.stock_id` than on the join if you have correct index. Please provide us your `EXPLAIN` 'ed query plan. BTW, do you really need to query your whole table ? don't you have any where to restrict a bit the output ? — Blag, Apr 29 '17 at 15:01
i have added the complete scenario. please let me know if u need more details — user5303259, Apr 30 '17 at 11:41
Double check the generated SQL -- you need `IN(1,2,3)`, not `IN('1,2,3')`. — Rick James, Apr 30 '17 at 18:30
It is _wrong_ to `SELECT * ... GROUP BY stock_id`. If there are multiple rows for each `stock_id`, you get random values for the rest of `*`. — Rick James, Apr 30 '17 at 18:32
Yes we "need more details": `SHOW CREATE TABLE` and `EXPLAIN SELECT ...` — Rick James, Apr 30 '17 at 18:34

Performance of inner join on two tables with large data

0 Answers0