Let me step through the analysis.
- You are searching for street address that starts with the given string, correct?
WHERE street LIKE 'Count%'
, together with INDEX(street)
is reasonably optimal. No separate table, no JSON, no initial letter, etc.
LIMIT 10
will find the first 10 such items efficiently. Note: Adding an ORDER BY
could wreck performance.
SELECT street
finds the result in the index's B+Tree
. Since this is all you need, use it. SELECT <<some columns>>
or SELECT *
has to do an extra action to get the other column(s).
Such an index will drill down the B+Tree to the first occurrence of "Count...", then it will scan forward from there. Even if the dataset cannot be cached in RAM, there would be effectively only one disk hit, which would take millisecond(s). If you need extra columns, then the extra action would probably require 10 more disk hits (due to LIMIT 10
).
So, it is important that the INDEX
contain exactly what you need to receive for the autosuggest.
ORDER BY
would be OK only if you say ORDER BY street
(or ORDER BY street DESC
). Otherwise, it must find the thousands (millions?) of streets starting with "count", sort this list, then deliver 10.
Based on that analysis, the "minimum of 3 characters" is not important. However, my UI gut feeling is that 3 is reasonable.
Oh, another issue. Since there is probably "1 Main St" in thousands of towns, auto select for "Main" is likely to show 10 copies of the same "1 Main St".
Oh, I have not handled something that you imply from you example. street
includes the street number. Or maybe you have two columns -- street_name
(useful for searching) and street_number
. So...
Have two columns:
- One for autosuggest ("Main St")
- One for display ("1 Main St, Acron OH")
Then have INDEX(auto_suggest, display)
. This will give you the efficiency of minimal effort to get the autosuggest values, but allows equally efficient display of the full address.
But that still has the problem of dup addresses. OK, I am now forced to recommend an extra table. It would contain all the unique street names, no street_number, etc.
CREATE TABLE AutoSuggest (
street VARCHAR(99) NOT NULL,
PRIMARY KEY(street)
) ENGINE=InnoDB;
Note that, in MySQL, a PRIMARY KEY
is a B+Tree index clustered with the data and is UNIQUE
. This table might be about 1GB for 200M addresses, after compensating for dups, overhead, etc. After your app has run for some time, most of the table would be cached in the buffer_pool. This makes the typical autosuggest lookup be about 1 millisecond.
But, that only give you the street name, not the address, etc. Perhaps you want a second column called full_street
(and change to PRIMARY KEY(street, full_street)
):
INSERT INTO AutoSuggest (street, full_street)
VALUES
("Main St", "1 Main St");
This provides most of the benefits and few of the drawbacks mentioned before.
Ponder my comments with the idea of tradeoffs between your two goals:
- Fast autosuggest;
- What the UI presents during the autosuggest.