0

I need the fastest performance way to retrieve integers that need to be pulled up fast. Here is my use case: There is a "sort/filter" option on my site. I have attributes for dogs, like long hair, short hair, big, dog, etc... I want users to be able to filter by attributes of the dogs.

Here is my table:

dogs
- id
- attributes

Here is the reference to attributes about the dogs (not very important for the question, it is just to give you an idea):

0 = adoptable pet
1 = short hair
2 = long hair
3 = fluffy
4 = black
5 = brown
6 = white
7 = golden/yellow
8 = etc...

Right now, I have attributes saved in a TEXT data type in JSON format. For example, ["0","4"] or ["0"] or ["2"].

So I want to select all the attributes like so:

SELECT * FROM dogs WHERE attributes LIKE %0% OR attributes LIKE %1% OR attributes LIKE %4% attributes LIKE %7%

How can I select all rows where the attributes contain 0, or 1, or 4, or 7 (not and). If a row has one of those attributes, select them.

What is the best way to approach it? Should I store as JSON datatype in MySQL or should I do something else? I want to be able to SELECT and pull it up FAST. I feel like the way I have it LIKE won't be very fast.

What is the the best and fastest way especially with hundreds of thousands of rows to millions?

Thank you in advanced!

pixie123
  • 929
  • 3
  • 14
  • 27
  • 1
    Just create an (primary) index for the id column and use something like `WHERE id IN (0,1,4,7)` – Daan Feb 11 '19 at 20:47
  • 1
    `like` is a wild-card search, dont know why you would want that. –  Feb 11 '19 at 20:51
  • What version of MySQL are you using? – Nick Feb 11 '19 at 20:57
  • @Daan I'm not trying to find by the ID, i'm trying to find by the attribute. And in the `attribute` column, there is a datatype like `["0","4"]` (in JSON) which the numbers are the ID's of the attributes. I need to search for the ID's of the attributes in the attributes column – pixie123 Feb 11 '19 at 21:00
  • @tim I can do `LIKE %"0"%` right? – pixie123 Feb 11 '19 at 21:00
  • @Nick I'm using the latest version of MySQL – pixie123 Feb 11 '19 at 21:01
  • We do realize (don't we?) that `LIKE '%4%'` will also match `14` and `42`. For non trivial sets, create a table `dog_attribute (dog_id` *(FK ref dog.id)* `, attr INT NOT NULL, PRIMARY KEY (dog_id, attr))` – spencer7593 Feb 11 '19 at 21:01
  • Although not recommended, you could store the list of attributes as a comma separated list and use [`find_in_set()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_find-in-set) – Nigel Ren Feb 11 '19 at 21:06
  • comma separated lists are a problematic antipattern. Bill Karwin gives an excellent presentation in Chapter 2 of his book SQL AntiPatterns: Avoiding the Pitfalls ... available at Amazon and other fine booksellers https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557 – spencer7593 Feb 11 '19 at 21:16
  • 2
    Also see my answer to https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad/3653574#3653574 I consider a JSON array to have all the same disadvantages as a comma-separated list. – Bill Karwin Feb 11 '19 at 21:31
  • Possible duplicate of [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) – philipxy Feb 11 '19 at 22:53

1 Answers1

3

Comma separated lists are a problematic antipattern. (Bill Karwin gives an excellent presentation in his book SQL AntiPatterns: Avoiding the Pitfalls of Database Programming, so I'm not going to delve into a discussion of why we want to avoid that. Instead, I'm going to jump into a proven pattern commonly used to implement multi-valued attributes.)


Normative relational pattern would be create a table with a single attribute on each row. For example something like this:

CREATE TABLE dog_attribute 
( dog_id  int not null comment 'fk ref dog.id'
, attr    int not null comment '1=adoptable, 2=...'
, PRIMARY KEY (dog_id, attr) 
, CONSTRAINT FK_dog_attribute_dog FOREIGN KEY dog_id REFERENCES dog (id)
  ON UPDATE CASCADE ON DELETE CASCADE
, UNIQUE KEY dog_attribute_UX (attr, dog_id)
);

--

To add three attributes for a given dog_id, we add three rows:

INSERT INTO dog_attibute (dog_id, attr) VALUES
 (1,2)
,(1,3)
,(1,5)

--

There are several possible query patterns. As a few quick examples:

To find dog_id that has both attribute 2 and attribute 3, we could do this:

SELECT da.dog_id 
  FROM dog_attribute da
 WHERE da.attr IN (2,3) 
 GROUP
    BY da.dog_id 
HAVING COUNT(1) = 2 

There are other query patterns that give an equivalent result, e.g.

SELECT da.dog_id 
  FROM dog_attribute da
  JOIN dog_attribute dd
    ON dd.dog_id = da.dog_id 
   AND dd.attr = 3
 WHERE da.attr = 2

The first query pattern could be extended to find dog_id that have at least two of a given set of attributes. e.g.

 WHERE da.attr IN (2,3,5,7)

HAVING COUNT(1) >= 2

The second query can be extended to add joins and anti-joins for individual attributes,

--

To find rows that have attribute 2, and have either (or both) attribute 3 and 4, and has neither attribute 7 or 8, we could do something like this:

SELECT da.dog_id 
  FROM dog_attribute da
  LEFT
  JOIN dog_attribute dn 
    ON dn.dog_id = da.dog_id 
   AND dn.attr IN (7,8) 
 WHERE dn.dog_id IS NULL 
   AND da.attr = 2
   AND EXISTS ( SELECT 1 
                  FROM dog_attribute de
                 WHERE de.dog_id = da.dog_id 
                   AND de.attr IN (3,4)
              )

(anti-join to exclude dog_id that have attribute 7, ...)


If I could not implement a table pattern as demonstrated above, ... if I absolutely had to use comma separated lists, to hold multiple attributes in a single column... for a small, static set of attributes (defined within the table definition, not dynamically added or removed) I would use the MySQL SET datatype.

https://dev.mysql.com/doc/refman/8.0/en/set.html

 attributes  SET('','adoptable','short hair','long hair','fluffy','black','brown','white','golden')

and to query that, we could do:

WHERE FIND_IN_SET('black',t.attributes)
  AND FIND_IN_SET('long hair',t.attributes)
spencer7593
  • 106,611
  • 15
  • 112
  • 140