1

Motto of the query is very simple, to find out the last entry on a foreign key column. the pseudo code I can say is

select vehicleid , last_journey_point , last_journey_time from journeyTable.

here is my SQL statement

-- loconumber is a indexed column -- journeyserla is a autonumber primary key int(11) -- the table locojourney contains 400,000 records

-- the below block of code executes in 19 secs

with LocomotiveLastRun AS(
     -- this block of code runs in 0.016 sec
    SELECT locojourney.loconumber , MAX(locojourney.journeyserla) as lastrunid
      FROM locojourney GROUP BY loconumber)

  SELECT locojourney.CurrentCombiners , locojourney.JourneySerla , 
         locojourney.From_RunPoint , locojourney.NEXT_RunPoint 
   FROM  LocomotiveLastRun FORCE INDEX(lastrunid)
   JOIN  locojourney FORCE INDEX(PRIMARY) ON x.lastrunid = locojourney.journeyserla 
  WHERE  locojourney.ishoc = 'n'

the EXPLAIN command shows a derived table which is using no index and using where and type ALL

enter image description here

This is the table definition:

    -- SHOW CREATE TABLE locojourney
CREATE TABLE `locojourney` (
  `trainID` smallint(5) NOT NULL,
  `LocoNumber` varchar(5) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
  `CurrentLocoBase` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  `CurrentDuedate` date DEFAULT NULL,
  `LocoConsist` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
  `CurrentLocoDomain` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  `DomainChange` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
  `FEDR` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
  `LADR` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
  `ISBANKER` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
  `TrainName` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
  `WithOutLoad` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'N',
  `runRoute` varchar(50) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
  `From_RunPoint` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
  `From_RunTime` datetime NOT NULL,
  `NEXT_RunPoint` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
  `NEXT_RunTime` datetime NOT NULL,
  `Affects_Outage` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
  `Affects_Mileage` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
  `GroundDistance` double(5,2) DEFAULT '0.00',
  `SHGallowance` int(11) DEFAULT '0',
  `Outage` double(5,4) DEFAULT '0.0000',
  `UnderServiceType` enum('FHT','CHG','DEP','MIX','DETN') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'FHT',
  `SubServiceHead` varchar(25) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'RUN',
  `IShoc` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
  `CurrentCombiners` varchar(28) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  `RunSetSerla` varchar(25) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  `JourneySerla` int(11) NOT NULL AUTO_INCREMENT,
  `NominationSerla` varchar(50) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  `Traction` enum('DSL','AC') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'DSL',
  `Trainload` smallint(4) NOT NULL DEFAULT '0',
  `LeadAssist` enum('Y','N') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'N',
  `DEO` varchar(100) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
  `DEOtime` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`JourneySerla`),
  KEY `trainID` (`trainID`) USING BTREE,
  KEY `routesection_idx` (`runRoute`) USING BTREE,
  KEY `loconumber_idx` (`LocoNumber`) USING BTREE,
  KEY `runsetserla_idx` (`RunSetSerla`) USING BTREE,
  KEY `subservicehead_idx` (`SubServiceHead`) USING BTREE,
  CONSTRAINT `locojourney_ibfk_1` FOREIGN KEY (`SubServiceHead`) REFERENCES `ineffective` (`IneffectiveHead`) ON UPDATE CASCADE,
  CONSTRAINT `locojourney_ibfk_3` FOREIGN KEY (`runRoute`) REFERENCES `routesections` (`Sectionname`) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT `loconumber_fk` FOREIGN KEY (`LocoNumber`) REFERENCES `lococontainer` (`LocoNumber`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=345719 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Rick James
  • 135,179
  • 13
  • 127
  • 222
balu
  • 81
  • 3
  • You haven't given us enough information to help you. Please read this. http://meta.stackoverflow.com/a/271056/ Please pay special attention to the section on query performance. Then please [edit] your question. – O. Jones Mar 30 '19 at 17:20
  • add also a proper data sample and the expected result – ScaisEdge Mar 30 '19 at 17:20
  • i am worried by the poor performance of the query, what MySQL say is that the mysql is capable of query 14 million records per second, but in my case it just 0.4 million records. – balu Mar 30 '19 at 17:30
  • Proper indexing is required to make any SQL system perform well on tables of more than a few thousand rows. To suggest proper indexing we need to know more about your situation. `FORCE INDEX()` almost never helps, and certainly not on a CTE, which inherently lacks indexes. – O. Jones Mar 30 '19 at 17:33
  • the JOIN is performing on the primary key column, actually which should work without any exceptions – balu Mar 30 '19 at 17:42
  • How many rows does the CTE query give? (In other words, how many distinct values for loconumber?) – Øystein Grøvlen Apr 01 '19 at 10:05
  • sorry for the late reply due to my work conerns, :) the CTE returns 1217 distinct rows – balu Apr 03 '19 at 12:47
  • The query references an alias `x` that is not defined??? – Rick James Apr 19 '19 at 17:53
  • "derived table which is using no index and using where and type ALL" -- Yes. There are no indexes on a derived table, and you need ALL the rows. Anyway, it is only 1099 (or 1217) rows; this is "tiny". – Rick James Apr 19 '19 at 18:03

2 Answers2

0
with LocomotiveLastRun AS(
       -- this block of code runs in 0.016 sec
    SELECT locojourney.loconumber , MAX(locojourney.journeyserla) as lastrunid
      FROM locojourney 
     GROUP BY loconumber)

Why is this CTE subquery fast? Because your table already has an index on (loconumber, journeyserla). (InnoDb automatically appends the primary key to every index.) This query can be satisfied with a loose index scan on that index, and those are fast.

Now for your main query:

  1. Get rid of FORCE INDEX(). Don't even dream of using that unless you have at least a decade of SQL experience or you have read the source code for the InnoDB indexing stuff in MySQL. Notably, it's completely useless on the CTE because CTEs don't have indexes.
  2. For clarity put your main (detail) table first and your CTE second.
  3. For clarity recast the JOIN as a WHERE...IN...

Those three suggestions give us this:

   WITH LocomotiveLastRun AS (...)
 SELECT locojourney.CurrentCombiners , locojourney.JourneySerla , 
        locojourney.From_RunPoint , locojourney.NEXT_RunPoint 
   FROM locojourney
  WHERE journeyserla IN (SELECT lastrunid FROM LocomotiveLastRun)
    AND locojourney.ishoc = 'n'

Now, it's plain what index can help this query.

An index on (ishoc) will help a bit. (It's actually an index, because InnoDB, on (ishoc, journeyserla) so it helps with both WHERE conditions.) The query planner uses BTREE random access to find the first index row with the ishoc value 'n', then scans the values of the primary key to match them with the IN clause.

Instead of that index, a compound index that covers the query will help even more. Such a covering index helps especially because each row of your table is large, with many columns. That index mentions the columns in the WHERE clause and those you want to select, like this:

(ishoc, journeyserla, CurrentCombiners, From_RunPoint, NEXT_RunPoint)

The query planner can satisfy your query entirely from the index, which saves on disk reading time to satisfy the query. If you use your query a lot, this index is a good idea. But, it does consume disk space and slow down INSERT and UPDATE operations a bit.

Read https://use-the-index-luke.com/

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • the requirement is not more than the 1 sec, since this is an almost real time application, this query run almost 1 time in 30 sec – balu Mar 31 '19 at 06:01
  • may i have an advise regarding the table design in this case, i mean what the professionals would do in real world applications – balu Mar 31 '19 at 06:02
  • Using `IN ( SELECT ...)` is often inefficient. The way you have written the query implies going through all 400K rows, checking each one against the 1217 in the derived table. I would _expect_ this to be inefficient. – Rick James Apr 19 '19 at 18:05
  • "Don't even dream of using FORCE INDEX ..." -- Yeah, it is a nightmare! – Rick James Apr 19 '19 at 18:06
  • If `n` is common for `ishoc`, `INDEX(ishoc)` will not be used. The composite index starting with ishoc might be used. – Rick James Apr 19 '19 at 18:08
0

Give this a try:

SELECT  lj.CurrentCombiners , lj.JourneySerla , lj.From_RunPoint , lj.NEXT_RunPoint
    FROM (  SELECT  MAX(journeyserla) as lastrunid
                FROM  locojourney
                GROUP BY  loconumber 
         ) AS llr
    JOIN  locojourney AS lj  ON llr.lastrunid = lj.journeyserla
    WHERE  lj.ishoc = 'n' 

(time it and provide EXPLAIN for it)

Rick James
  • 135,179
  • 13
  • 127
  • 222