11

I have a table in MySQL with 10 million rows and about 2 GB of data. Selecting data in LIFO (last-in, first-out) order is slow.

The table engine is InnoDB.

The table has one primary key and one unique key.

SELECT * FROM link LIMIT 999999 , 50;

How can I improve the performance of this query?

Table structure:

Field    Type          Null  Key  Default  Extra
id       int(11)       NO    PRI  NULL     auto_increment
url      varchar(255)  NO    UNI  NULL
website  varchar(100)  NO         NULL
state    varchar(10)   NO         NULL
type     varchar(100)  NO         NULL
prio     varchar(100)  YES        NULL
change   varchar(100)  YES        NULL
last     varchar(100)  YES        NULL

NOTE: SELECT * FROM link LIMIT 1 , 50; takes 0.9 ms, but the current SQL takes 1000 ms, about 1000 times longer.

Saurabh Chandra Patel
  • Do you need all the columns from the table? Can you elaborate on the slowness: how long is it taking? Also, what are the data types of the columns? – Sanj Mar 24 '16 at 03:57
  • Can you run the query below and paste the result: `EXPLAIN SELECT * FROM link LIMIT 999999 , 50;` – Gaurav Lad Mar 24 '16 at 04:01
  • It takes nearly 1 second, and as the number of rows grows, the fetch time will grow too. SELECT * FROM link LIMIT 1 , 50; takes 0.9 ms while the current SQL takes 1000 ms, about 1000 times more. – Saurabh Chandra Patel Mar 24 '16 at 04:21
  • 1,000,000,000,000 rows stored in 5 GB, that's 200 rows per byte, amazing. So what's the actual number/size? – dnoeth Mar 28 '16 at 17:13
  • 10,005,491 rows with ~2 GB, but growing very fast. – Saurabh Chandra Patel Mar 28 '16 at 17:23
  • Have a look at [Pagination Done the Right Way](http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way). He is using Postgres for illustration, but the main idea is to use a suitable index, which should be applicable for MySQL as well. – Vladimir Baranov Mar 29 '16 at 00:10
  • I've edited the title of this question and the first sentence; it seems like you were confused about what a trillion is. Also, it's much better to state the problem as it is and include the additional concern (that it is growing) in the details rather than exaggerate. – Arth Mar 29 '16 at 11:52
  • What is the cardinality of website and url? What is the cardinality of state,type,prio? What is stored in change and last? Dates? – joop Mar 29 '16 at 11:53
  • And you are really sitting there clicking on the [Next] button 20,000 times? I think not. And without an `ORDER BY`, you can't be guaranteed to get any particular 50 rows! – Rick James Mar 30 '16 at 00:20
  • Somehow I have to access the data from the DB, for one reason or another. – Saurabh Chandra Patel Mar 30 '16 at 12:36

5 Answers

7

This is most likely due to "early row lookup".

MySQL can be forced to do a "late row lookup" instead. Try the query below:

SELECT  l.*
FROM    (
        SELECT  id
        FROM    link
        ORDER BY
                id
        LIMIT 999999 , 50
        ) q
JOIN    link l
ON      l.id = q.id

Check this article:

MySQL limit clause and late row lookups
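
To verify that the rewrite actually avoids the early row lookup, you can compare the query plans. A minimal sketch (the exact EXPLAIN output depends on the MySQL version and data, so the annotations below are illustrative):

EXPLAIN SELECT * FROM link LIMIT 999999 , 50;
-- Typically a full scan: entire rows are read and discarded
-- until the offset of 999,999 has been passed.

EXPLAIN
SELECT  l.*
FROM    (
        SELECT  id
        FROM    link
        ORDER BY
                id
        LIMIT 999999 , 50
        ) q
JOIN    link l
ON      l.id = q.id;
-- The derived table should show "Using index" on the PRIMARY key:
-- only id values are scanned, and the full rows are fetched
-- for just the final 50 ids.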

Sanj
  • I didn't believe that until testing it myself. In my test it was a 50% performance win. Why can MySQL still not optimize such queries without help? – Paul Spiegel Apr 02 '16 at 15:21
  • @PaulSpiegel: It was a surprise for me too when I first encountered it. – Sanj Apr 04 '16 at 18:25
6

For the Next and Prev buttons you can use a WHERE clause instead of OFFSET.

Example (using LIMIT 10; the sample data is explained below): You are on a page which shows 10 rows with the ids [2522,2520,2514,2513,2509,2508,2506,2504,2497,2496]. In my case this page is created with

select *
from link l
order by l.id desc
limit 10
offset 999000

For the next page you would use

limit 10
offset 999010

getting rows with ids [2495,2494,2493,2492,2491,2487,2483,2481,2479,2475].

For the previous page you would use

limit 10
offset 998990

getting rows with ids [2542,2541,2540,2538,2535,2533,2530,2527,2525,2524].

All of the above queries execute in about 500 msec. Using the "trick" suggested by Sanj, it still takes 250 msec.

Now, given the page with minId=2496 and maxId=2522, we can create queries for the Next and Prev buttons using a WHERE clause.

Next button:

select *
from link l
where l.id < :minId -- =2496
order by l.id desc
limit 10

Resulting ids: [2495,2494,2493,2492,2491,2487,2483,2481,2479,2475].

Prev button:

select *
from link l
where l.id > :maxId -- =2522
order by l.id asc
limit 10

Resulting ids: [2524,2525,2527,2530,2533,2535,2538,2540,2541,2542].

To reverse the order you can use the query in a subselect:

select *
from (
    select *
    from link l
    where l.id > 2522
    order by l.id asc
    limit 10
) sub
order by id desc

Resulting ids: [2542,2541,2540,2538,2535,2533,2530,2527,2525,2524].

These queries execute in "no time" (less than 1 msec) and provide the same result.

You cannot use this solution to create page numbers. But I don't think you are going to output 200K page numbers.

Test data:

Data used for the example and benchmarks has been created with

CREATE TABLE `link` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `url` VARCHAR(255) NOT NULL,
    `website` VARCHAR(100) NULL DEFAULT NULL,
    `state` VARCHAR(10) NULL DEFAULT NULL,
    `type` VARCHAR(100) NULL DEFAULT NULL,
    `prio` VARCHAR(100) NULL DEFAULT NULL,
    `change` VARCHAR(100) NULL DEFAULT NULL,
    `last` VARCHAR(100) NULL DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `url` (`url`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB;

insert into link
    select i.id
        , concat(id, '-', rand()) url
        , rand() website
        , rand() state
        , rand() `type`
        , rand() prio
        , rand() `change`
        , rand() `last`
    from test._dummy_indexes_2p23 i
    where i.id <= 2000000
      and rand() < 0.5

where test._dummy_indexes_2p23 is a table containing 2^23 ids (about 8M). So the data contains about 1M rows, with roughly every second id randomly missing. Table size: 228 MB

Paul Spiegel
0

I have updated my SQL query to this, and it now takes less time:

 SELECT * FROM link ORDER BY id LIMIT 999999 , 50;

Presumably this helps because, with ORDER BY id, MySQL can walk the primary key in order rather than scanning rows in an arbitrary order, though it still has to read past the first 999,999 entries.
Saurabh Chandra Patel
0

Due to the large amount of data, here are a few tips to improve the query response time:

  1. Change the storage engine from InnoDB to MyISAM.
  2. Partition the table (https://dev.mysql.com/doc/refman/5.7/en/partitioning-management.html) - see the sketch after this list.
  3. Use MySQL Cluster (http://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-overview.html).
  4. Increase hardware capacity.
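
A minimal sketch of option 2, assuming range partitioning on the primary key id (the partition boundaries below are illustrative, not derived from the question's data). Note that MySQL requires every unique key to include the partitioning column, so the UNIQUE index on url would have to be dropped or redefined before this runs:

-- Illustrative boundaries; tune them to the real id distribution.
-- Caveat: this fails with error 1503 while the UNIQUE index on url
-- exists, because that index does not include the partitioning column.
ALTER TABLE link
    PARTITION BY RANGE (id) (
        PARTITION p0 VALUES LESS THAN (2500000),
        PARTITION p1 VALUES LESS THAN (5000000),
        PARTITION p2 VALUES LESS THAN (7500000),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );

With this in place, a query that filters on id (like the keyset pagination shown in other answers) only has to touch the matching partitions.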

Thanks

kamlesh
0

First of all, running a query on your table without any ordering doesn't guarantee that it will return the same data if run twice. It's better to add an ORDER BY clause. id is a good candidate, as it's your primary key and unique (being an auto_increment value).

You could use this as your base:

SELECT * FROM link ORDER BY id LIMIT 50;

This will give you the first 50 rows in your table.

Now, for the next 50 rows, instead of using OFFSET we can save our last position and use it in the query.

You would save the id of the last row from the previous query and use it in the next query:

SELECT * FROM link WHERE id > last_id ORDER BY id LIMIT 50;

This will give you the next 50 rows after the last id.

The reason your query runs slowly with high values of OFFSET is that MySQL has to scan all the rows covered by the OFFSET and only then return the next LIMIT rows. This means that the bigger the OFFSET, the slower the query runs.

The solution shown above doesn't depend on OFFSET, so the query runs at the same speed regardless of the current page.
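
To see the difference, compare the two plans. A rough sketch (the row counts are illustrative, assuming the ~10M-row table from the question):

EXPLAIN SELECT * FROM link ORDER BY id LIMIT 999999 , 50;
-- Walks the primary key from the start: about a million entries
-- are read just to return 50 rows.

EXPLAIN SELECT * FROM link WHERE id > 1000000 ORDER BY id LIMIT 50;
-- Seeks directly into the primary key (1000000 is a made-up value
-- standing in for last_id) and reads only about 50 entries.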

See also this useful article that explains a few other options you can choose from: http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/

Doody P