Hello

I want to configure monthly partitions with day-by-day subpartitions on the table below. If the total number of subpartitions exceeds 64, the table is not created and I get the error '(errno: 168 "Unknown (generic) error from engine")'. (Creating fewer than 64 subpartitions succeeds.)

I know that the maximum number of partitions (including subpartitions) that can be created is 8,192, so is there anything I missed?

Below is the log table.

create table detection_log
(
    id bigint auto_increment,
    detected_time datetime default  '1970-01-01' not null,
    malware_title varchar(255) null,
    malware_category varchar(30) null,
    user_name varchar(30) null,
    department_path varchar(255) null,
    PRIMARY KEY (detected_time, id),
    INDEX `detection_log_id_uindex` (id),
    INDEX `detection_log_malware_title_index` (malware_title),
    INDEX `detection_log_malware_category_index` (malware_category),
    INDEX `detection_log_user_name_index` (user_name),
    INDEX `detection_log_department_path_index` (department_path)
);
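
The partitioning I am trying to add is roughly of this form (only a sketch; the partition names, boundaries, and the SUBPARTITIONS 31 count are illustrative, not my exact statement):

-- Sketch: monthly RANGE partitions, each split into daily HASH subpartitions.
-- This example defines 4 partitions x 31 subpartitions = 124 subpartitions,
-- well past the point (64) where the errno 168 failure appears.
ALTER TABLE detection_log
    PARTITION BY RANGE (TO_DAYS(detected_time))
    SUBPARTITION BY HASH (TO_DAYS(detected_time))
    SUBPARTITIONS 31
    (
        PARTITION p202101 VALUES LESS THAN (TO_DAYS('2021-02-01')),
        PARTITION p202102 VALUES LESS THAN (TO_DAYS('2021-03-01')),
        PARTITION p202103 VALUES LESS THAN (TO_DAYS('2021-04-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );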

김정호

1 Answer

  • SUBPARTITIONs provide no benefit that I know of.
  • HASH partitioning either provides no benefit or hurts performance.

So... Explain what you hoped to gain by partitioning; then we can discuss whether any type of partitioning is worth doing. Also, provide the likely SELECTs so we can discuss the optimal INDEXes. If you need a "two-dimensional" index, that might indicate a need for partitioning (but still not subpartitioning).

More

I see PRIMARY KEY(detected_time,id). This provides a very fast way to do

SELECT ...
    WHERE detected_time BETWEEN ... AND ...
    ORDER BY  detected_time, id

In fact, it will probably be faster than if you also partition the table. (As a general rule it is useless to partition on the first part of the PK.)

If you need to do

SELECT ...
    WHERE user_name = 'param'
      AND detected_time BETWEEN ... AND ...
    ORDER BY  detected_time, id

Then this is optimal:

INDEX(user_name, detected_time, id)

Again, probably faster than any form of partitioning on any column(s).

And

A "point query" (WHERE key = 123) takes a few milliseconds more in a 1-billion-row table compared to a 1000-row table. Rarely is the difference important. The depth of the BTree (perhaps 5 levels vs 2 levels) is the main difference. If you PARTITION the table, you are removing perhaps 1 or 2 levels of the BTree, but replacing them with code to "prune" down to the desired partition. I claim that this tradeoff does not provide a performance benefit.

A "range query" is very nearly the same speed regardless of the table size. This is because the structure is actually a B+Tree, so it is very efficient to fetch the 'next' row.

Hence, the main goal in optimizing queries on a huge table is to take advantage of the characteristics of the B+Tree.

Pagination

SELECT  log.detected_time, log.user_name, log.department_path,
        log.malware_category, log.malware_title
    FROM  detection_log as log
    JOIN  
    (
        SELECT  id
            FROM  detection_log
            WHERE  user_name = 'param'
            ORDER BY  detected_time DESC
            LIMIT  25 OFFSET 1000 
    ) as temp  ON temp.id = log.id;

The good part: Finding ids, then fetching the data.

The slow part: Using OFFSET.

Have this composite index: INDEX(user_name, detected_time, id) in that order. Make another index for when you use department_path.
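
For example, the additions might look like this (a sketch; the index names are my own, and the existing single-column indexes on user_name and department_path become redundant once these exist):

-- Sketch: composite indexes matching the pagination queries.
-- Leading column = the equality filter; detected_time, id = the ORDER BY.
-- (With utf8mb4, the department_path index needs InnoDB's 3072-byte key limit,
--  which is the default in MySQL 5.7+.)
ALTER TABLE detection_log
    ADD INDEX idx_user_time_id (user_name, detected_time, id),
    ADD INDEX idx_dept_time_id (department_path, detected_time, id);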

Instead of OFFSET, "remember where you left off". A blog specifically about that: http://mysql.rjweb.org/doc.php/pagination
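
A sketch of what that looks like here (the literal datetime and id stand for whatever values were remembered from the last row of the previous page; id DESC is added as a tie-breaker):

-- Sketch: "seek" pagination -- each page costs the same, no matter how deep.
SELECT  log.detected_time, log.user_name, log.department_path,
        log.malware_category, log.malware_title
    FROM  detection_log AS log
    JOIN
    (
        SELECT  id
            FROM  detection_log
            WHERE  user_name = 'param'
              AND  (detected_time, id) < ('2021-01-15 00:00:00', 987654)
            ORDER BY  detected_time DESC, id DESC
            LIMIT  25
    ) AS temp  ON temp.id = log.id;

(On older servers the row-constructor comparison may need to be expanded to detected_time < ? OR (detected_time = ? AND id < ?) for the index to be used efficiently.)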

Purging

Deleting after a year is an excellent use of PARTITIONing. Use PARTITION BY RANGE(TO_DAYS(detected_time)) and have either ~55 weekly or 15 monthly partitions. See http://mysql.rjweb.org/doc.php/partitionmaint for details. DROP PARTITION is immensely faster than DELETE. (This partitioning will not speed up SELECT.)
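
A sketch of the monthly variant (partition names and dates are illustrative; the partitionmaint link above has a scripted, generalized version of this maintenance):

-- Sketch: monthly RANGE partitions; keep ~14 so a full year is always covered.
ALTER TABLE detection_log
    PARTITION BY RANGE (TO_DAYS(detected_time))
    (
        PARTITION p202101 VALUES LESS THAN (TO_DAYS('2021-02-01')),
        PARTITION p202102 VALUES LESS THAN (TO_DAYS('2021-03-01')),
        -- ... one partition per month ...
        PARTITION pfuture VALUES LESS THAN MAXVALUE
    );

-- Each month: carve the new month out of the catch-all partition ...
ALTER TABLE detection_log REORGANIZE PARTITION pfuture INTO
    (
        PARTITION p202202 VALUES LESS THAN (TO_DAYS('2022-03-01')),
        PARTITION pfuture VALUES LESS THAN MAXVALUE
    );

-- ... and drop the month that just aged past one year (far faster than DELETE).
ALTER TABLE detection_log DROP PARTITION p202101;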

Rick James
  • Thank you for your reply. I added an explanation to the text. – 김정호 Jan 15 '21 at 04:05
  • Thank you for your reply. I am currently building a web service that accumulates more than 100 million log rows in a year. Logs should be presented to users on a web page as a pageable table. – 김정호 Jan 15 '21 at 04:14
  • The user should be able to view log data for a desired date range on the web page. For that reason, we tried to implement daily subpartitions. Subpartitioning was considered for the efficiency of searching and sorting. – 김정호 Jan 15 '21 at 04:15
  • This is a new-employee assignment at my company. There is a requirement that the log table be partitioned monthly. It was determined that monthly partitioning would result in up to 8 million rows per partition, which would hurt pagination performance. So I considered day-by-day subpartitioning. – 김정호 Jan 15 '21 at 04:15
  • @김정호 - Will you be deleting "old" log rows? Beware -- If a user needs to "page" through more than a few dozen pages, he will get very tired; even angry at the UI. Will the page contain all the rows for a certain time range? Or user_id + time range? Or what? See also http://mysql.rjweb.org/doc.php/partitionmaint – Rick James Jan 15 '21 at 06:30
  • @김정호 - Is that 100 million log _rows_ per year? Or 100 million log _files_, each of, say, 100 thousand rows? How many _rows_ need to be inserted per second (or per day)? 1K rows/second is not difficult; 10K/second may need some tricks. – Rick James Jan 15 '21 at 06:42
  • @김정호 - How much disk space will be required? Is it practical to 'normalize' any of those columns? Smaller --> faster. Yes, it complicates the INSERTs, but I have some techniques for that, too. And it speeds up the SELECTs. – Rick James Jan 15 '21 at 06:44
  • @김정호 - And, I have added more to my Answer. – Rick James Jan 15 '21 at 06:57
  • Up to 100 million log rows are stored in a year. At most 1K rows per second will be inserted. The retention period is one year; in other words, rows older than one year are deleted. I will refer to the link you sent me. Thank you. A page contains rows from all periods matching a particular "department_path" or "user_name". Of course, only 25 are shown per page and the user pages through with a button. As you said, the further back the user pages, the slower the response. An example of the paging query I wrote is as follows. – 김정호 Jan 15 '21 at 07:39
  • SELECT log.detected_time, log.user_name, log.department_path, log.malware_category, log.malware_title FROM detection_log as log JOIN ( SELECT id FROM detection_log WHERE user_name = 'param' ORDER BY detected_time DESC LIMIT 25 OFFSET 1000 ) as temp on temp.id = log.id; – 김정호 Jan 15 '21 at 07:39
  • @김정호 - I added more to my Answer. Since the partitioning will be by datetime, but the `SELECT` will be by `datetime > $left_off`, there will be no "partition pruning". Hmmm, I need to do some research. The thing I am worried about... Although the index is 'perfect' for getting the 25 rows for the page when using a non-partitioned table, I don't know how well it will work when it may need to move on to more partitions. – Rick James Jan 15 '21 at 16:58
  • @김정호 - I tried a similar select in a similar partitioned table. I _think_ it opened all the partitions, then did a "merge" (a la sort-merge) to find the "next" 25 rows. Conclusion: a slight performance benefit in having fewer partitions. That is, I would recommend monthly, not weekly, partitioning. Here's the tool I used: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts – Rick James Jan 15 '21 at 17:42
  • Thank you very much for your great answer. Based on your advice, I will think again and apply it. It was my first time using Stack Overflow and I was lucky to meet a kind person like you. – 김정호 Jan 16 '21 at 01:24