0

I have to run 2 simple queries on a very large table consisting approximately 20 million rows.

Table columns are id|user_id|earned_amount|created_at

Query 1:

select user_id, sum(earned_amount) as total_earning 
from earning_history 
where user_id=XX;

Query 2:

SELECT date(created_at) date, sum(earned_amount) as earning, count(id) as total_entry 
FROM `earning_history` 
where user_id=xx 
GROUP by date

I have to run the second query more than the first one. So Im thinking about indexing the user_id and created_at column;

Without indexing, it takes about 6-7 seconds to perform the second query. My question is,

  1. Should I index only user_id column? Or should I index both user_id and created_at column?

  2. Should I use multiple column index like => ALTER TABLE earning_history ADD INDEX (user_id, created_at); ?

Noob Coder
  • 2,816
  • 8
  • 36
  • 61
  • maybe you can run some test, comparing time with no index, one index on user_id, and two indexes on user_id and created_ad, but i'm not sure index is really usefull for group by field : https://stackoverflow.com/questions/1445071/creating-indexes-for-group-by-fields – ekans Feb 16 '18 at 11:12

2 Answers2

0

You could use a composite index both for accessing fast and reduce the access for data retrieving values . you could use the columns for where (user_id) and the columns used in select ..

but for column use in function or in calculated columns the indexes are normally used

anyqwey you should have some benefits using and index on

create index my_index  on my_table ( user_id, id, created_at, earned_amount)

or

create index my_index  on my_table ( user_id,  created_at,id, earned_amount)
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
0

Unless id can be NULL, say COUNT(*) instead of id. Then the optimal index for both queries is this order:

INDEX(user_id, earned_amount, created_at)

Both queries will use it as a "covered" index. Both need user_id to be first -- to satisfy the WHERE. The first query will use only the first two columns, with only a minor overhead due to the unused 3rd column. The second query does not care which order the 2nd and 3rd columns are in, I picked this order to make a single INDEX work well for both.

Two separate, single-column, indexes will not be as efficient. MySQL will probably use only one index, and it would be (user_id). It would then have to bounce between the BTree containing the index and the BTree containing all the columns -- in order to get at least earned_amount. "Covering" avoids this bouncing.

Rick James
  • 135,179
  • 13
  • 127
  • 222