1

I have a table with a billion+ rows. I have have the below query which I frequently execute:

SELECT SUM(price) FROM mytable WHERE domain IN ('com') AND url LIKE '%/shop%' AND date BETWEEN '2001-01-01' AND '2007-01-01';

Where domain is varchar(10) and url is varchar(255) and price is float. I understand that any query with %..% will not use any index. So logically, I created an index on price domain and date:

create index price_date on mytable(price, domain, date)

The problem here persists, this index is also not used because query contains: url LIKE '%.com/shop%'

On the other hand a FULLTEXT index still will not work since I have other non text filters in the query.

How can I optimise the above query? I have too many rows not to use an index.

UPDATE

Is this an sql limit? could such a query provide better performance on a noSQL database?

bcsta
  • 1,963
  • 3
  • 22
  • 61
  • you are using IN for domain; do you sometimes search for more than one? price is not needed to find rows, only for the results, so you should either leave it out of your index or move it to the end. and moving it to the end would only be useful if url was also in the index (though likely it is long enough to not be usefully indexed) – ysth May 21 '21 at 18:26
  • yes I do search for more than one domain. I did move price to the end. The point here is to index url alongside these other columns. Is this a limit on what mysql can do? is the above query better performed on a noSQL database? – bcsta May 21 '21 at 18:29
  • for help optimizing, it helps if you include output of `show create table yourtablename` for all relevant tables and output of `explain select ...` for your query. in this case, it would also help to know what percentage of your billion rows match each of your conditions (domain, url, and date range) – ysth May 21 '21 at 18:30
  • as Bill says, nothing is going to make a fulltext index and a b-tree index be used for the same table. but there is more than likely a way to get acceptable performance, given enough information. any noSQL will still need the right indexing. – ysth May 21 '21 at 18:34
  • is the /shop part arbitrary? or are there some regular set of such strings you search for? – ysth May 21 '21 at 18:34
  • @ysth No it is not arbitrary. I have a set of such string to search for. – bcsta May 21 '21 at 18:51
  • consider using a generated column (e.g. `url_type tinyint as (case when url like '%/shop%' then 1 when url like '%/login%' then 2 else 0 end) not null`) and indexing that? – ysth May 21 '21 at 19:02
  • That might work, although I have different languages for `/shop`, `/login` .. etc. At the same time I am not sure how this generated column effect the INSERT / UPDATE table queries when I add new data periodically. – bcsta May 21 '21 at 19:16
  • you don't need to change any insert/updates. https://dev.mysql.com/doc/refman/8.0/en/create-table-generated-columns.html – ysth May 21 '21 at 19:50

2 Answers2

0

You have two range conditions, one uses IN() and the other uses BETWEEN. The best you can hope is that the condition on the first column of the index uses the index to examine rows, and the condition on the second column of the index uses index condition pushdown to make the storage engine do some pre-filtering.

Then it's up to you to choose which column should be the first column in the index, based on how well each condition would narrow down the search. If your condition on date is more likely to reduce the set of examined rows, then put that first in the index definition.

The order of terms in the WHERE clause does not have to match the order of columns in the index.

MySQL does not support optimizing with both a fulltext index and a B-tree index on the same table reference in the same query.

You can't use a fulltext index anyway for the pattern you are searching for. Fulltext indexes don't allow searches for punctuation characters, only words.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • I changed my question slightly. I made a mistake earlier. I removed the ">" from query. my main question is how can I use an index for both LIKE and IN as above? It seems I can either use a full text search to optimise the `url` column or use a normal index to optimise the `date` , `price` and `domain`. – bcsta May 21 '21 at 17:30
  • I updated my answer. Fulltext index will not help regardless. – Bill Karwin May 21 '21 at 17:32
  • Thanks. What if I do not have punctuation in my query: `... domain LIKE '%shop%'...`? I am trying to find way to make a compound query (domain, url, date, price) work. The only important thing is that url accepts queries with full wildcards. – bcsta May 21 '21 at 17:37
  • MySQL still cannot use both a fulltext index and a B-tree index on the same table reference in the same query. – Bill Karwin May 21 '21 at 17:43
  • FWIW I've often had good success combining IN and BETWEEN daterange by joining, like `from domains cross join alldates join mytable using (domain, dt) where domains.domain in (...) and alldates.dt between ? and ?`. doesn't help with the fulltext part though. and more likely to be useful where the date range is a smaller percentage of all the dates in the data – ysth May 21 '21 at 18:40
0

I vote for this order:

INDEX(domain,       -- first because of "="
      date,         -- then range
      url, price)   -- "covering"

but, since the constants look like most of the billion rows would be hit, I don't expect good performance.

If this is a common query and/or "shop" is one of only a few possible filters, we can discuss whether a summary table would be useful.

Rick James
  • 135,179
  • 13
  • 127
  • 222