0

I am running innodb 5.6.17. I have a data_blocks table with a column tags of type TEXT. I enabled full text indexing on it so that I can link multiple tag rows to this row. like this:

tags
[16][3]
[18][3]

the numbers are ids that point to tags in the the tags table. a data_block can have a combination of as many or as little tags that they want. I couldn't find a way to link this from a one to many relationship so I created a text column in this format. It works fairly well but it is very slow using

WHERE tags LIKE "%[16]%";

I tried using full text search using

SELECT * FROM data_blocks WHERE MATCH(tags) AGAINST("+[16]");

But this returned 0 results. I already decreased the ft_min_word_len to 3 so that event a one digit id will search but it still returns 0.

If I need to do this a completely different way please tell me, otherwise, why is the full text search not returning any rows?

Thanks

Jason Gallavin
  • 403
  • 4
  • 14
  • By default the minimum word length in full-text search is 3 characters. – Barmar Sep 15 '15 at 03:30
  • You should normalize your design, by using a many-to-many relation table rather than multiple values in one column. – Barmar Sep 15 '15 at 03:31
  • @Barmar How would I implement such a thing. Since I want a datablock to have a dynamic amount of tags linked to it, I'm not sure how to implement it. Because a tag can be used more than one time but the combination of tags used is unique. – Jason Gallavin Sep 15 '15 at 12:15
  • @Barmar For some reason my minimum was 4 when I showed the variable in command line – Jason Gallavin Sep 15 '15 at 12:16
  • You use a table with two columns: `data_block_id` and `tag_id`, and a unique index on the combination of the columns. Google "sql many-to-many relationship" – Barmar Sep 15 '15 at 13:35
  • Okay I drew up a data representation which would be easy going from data_block to tags but not sure how to grab data_block by tag(s), this is the example I looked at: ![example](http://www.tomjewett.com/dbdesign/img/manymany-links.gif) – Jason Gallavin Sep 15 '15 at 20:41
  • You use a 3-way join. There are many examples in questions here, and I'm sure at the tutorial sites as well. – Barmar Sep 15 '15 at 20:44
  • Okay I drew up a data representation which would be easy going from data_block to tags but not sure how to grab data_block by tag(s), this is the example I looked at: http://www.tomjewett.com/dbdesign/img/manymany-links.gif I imagine the data_blocks table being the orders table and the tags table being the products table but using this type of model, it would be very hard to find the data_block you want in specific instances, for example: we have a data_block linked to a tag named income, and then we have a data_block linked to income + january. It would be impossible to just find income. – Jason Gallavin Sep 15 '15 at 20:51
  • My comment got cut off last time, sorry for the confusion – Jason Gallavin Sep 15 '15 at 20:51
  • I'm looking up 3 way joins now – Jason Gallavin Sep 15 '15 at 20:56
  • I see how it works now but how would I ensure I get just one datablock if I have only one tag assigned to that datablock? as stated in the last comment, I need to find a datablock with one tag but other datablocks might have multiple tags assigned to it with one of the tags being that one tag. This approach would pull everyone with that tag which is undesirable. I want to be able to do that yes, but I also need to differentiate. – Jason Gallavin Sep 15 '15 at 21:09
  • Where is that states in your original question? The full-text-search example you showed won't do that, it will find all datablocks that have `[16]`, even if they also have other tags. – Barmar Sep 15 '15 at 21:13
  • Anyway, see http://stackoverflow.com/questions/16704290/how-to-return-rows-that-have-the-same-column-values-in-mysql?lq=1 for how you can restrict results to only rows that match a particular subset of tags, which could be just one tag. – Barmar Sep 15 '15 at 21:13
  • I'm not sure how I would use the sum(id = 2) > 0 or something like it to restrict the subset of tags, could you give an example that more closely matches my situation? – Jason Gallavin Sep 16 '15 at 01:39
  • I used = "[16]" before to restrict the results. instead of LIKE "[16]" – Jason Gallavin Sep 16 '15 at 01:40

2 Answers2

1

If you have a relation table tags, you can do:

SELECT d.*
FROM data_blocks AS d
JOIN tags AS t ON d.id = t.data_block_id
GROUP by d.id
HAVING SUM(t.tag = 16) = COUNT(*)

SUM(t.tag = 16) is the count of the number of rows that have that tag, while COUNT(*) is the total number of rows. So the HAVING clause matches the data blocks where there are exactly as many rows as those having the selected tag.

The tags table should be defined as:

CREATE TABLE tags (
    tag INT,
    data_block_id INT,
    UNIQUE KEY (tag, data_block_id),
    FOREIGN KEY (data_block_id) REFERENCES data_blocks (id)
);
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • what exactly does the HAVING SUM(t.tag = 16) = COUNT(*) do? It seems like you are referencing a one to many relationship. Thanks! – Jason Gallavin Sep 16 '15 at 12:25
  • okay, I think I'm getting it. What happened to the reference table we were talking about. The statement doesn't seem to reference that at all. The tags can be linked to multiple data_blocks but a no two data_blocks can have the same combination of tags. here is an image i made in paint (regrettably) that explains the logic I am going for. I just want to make sure we are on the same page. [link](http://s8.postimg.org/h2v1jiq9h/explaination.jpg) – Jason Gallavin Sep 17 '15 at 04:13
  • `tags` is the relation table. – Barmar Sep 17 '15 at 04:19
  • I've updated the answer to show how this table is created. – Barmar Sep 17 '15 at 04:22
  • The constraint that no two data blocks can have the same combination of tags can't be implemented directly in the DDL. The best you can do is add triggers to check for duplicates. – Barmar Sep 17 '15 at 04:23
  • I can make sure of that in php, before the data_tag is added. I just need to make sure I can grab one data_block with a unique combination or many if a data_block has a specific tag – Jason Gallavin Sep 17 '15 at 04:26
  • wouldn't UNIQUE KEY (tag, data_block_id) make them all have to be different Id's that would defeat the purpose of a many to many relationship wouldn't it? – Jason Gallavin Sep 17 '15 at 04:34
  • No. When you put two columns in the unique key, it means that the combination has to be unique, not each individual column. – Barmar Sep 17 '15 at 04:35
  • oooh that is awesome! I just tested this in phpmyadmin and it is working!!!!!!! before I didn't have the reference table setup with those exact parameters, it was throwing it off somehow and causing it to return 0 results. It will take a lot of studying of this code to fully understand it but It is working. Thank you sir for working day after day with me on this. You are awesome! – Jason Gallavin Sep 17 '15 at 04:50
  • " The constraint that no two data blocks can have the same combination of tags can't be implemented directly in the DDL. The best you can do is add triggers to check for duplicates. " are you talking about when trying to select or just when a person adds a reference? I forgot to ask how to get a unique data_block that has a combination of tags (no specific order). Just like we did for grabbing a data_block with just one tag. is this what you were referring to earlier that can't be done? – Jason Gallavin Sep 17 '15 at 11:07
  • I think I was able to get the desired result with SELECT d.* FROM data_blocks AS d JOIN tags_reference AS t ON d.id = t.data_block_id GROUP by d.id HAVING SUM(t.tag_id = 1) = 1 AND SUM(t.tag_id = 2) = 1 AND COUNT(*) = 2 – Jason Gallavin Sep 17 '15 at 12:40
  • My earlier comment was just talking about preventing the duplicate combinations. You can certainly select for it, and that's one of the ways to do it. – Barmar Sep 17 '15 at 16:07
  • I am implementing it into the system now :) – Jason Gallavin Sep 17 '15 at 17:14
  • After I got all the data imported it still took around 190ms to launch a query for one data_block. What could be slowing it down? I think it is because the table is at over 70k rows. should I try pulling all the rows that have a tag in it and then interpret it with php? Thanks – Jason Gallavin Sep 18 '15 at 04:01
  • Is performance so bad if you remove the `HAVING` clause that checks for a unique tag? – Barmar Sep 18 '15 at 14:02
  • It is the sum statement in the having clause. So I added "where or where" statements that reduced the data set before the "having" clause was activated and that seemed to do the trick. it reduced what the sum function had to handle. I'm not even sure if I need the having clause anymore besides the count(*) – Jason Gallavin Sep 18 '15 at 20:52
  • I suggest you post your solution as an answer so people can see how you solved it efficiently. – Barmar Sep 18 '15 at 20:53
0

The sum function takes quite a while the table gets lager. so To be able to get specific results and not take long would be to reduce the data set with a where statement.

edit from the last answer:

SELECT d.*
FROM data_blocks AS d
JOIN tags AS t ON d.id = t.data_block_id
WHERE t.tag = 16
GROUP by d.id
HAVING COUNT(*) = 1

this way the checking of the tags will reduce the data set before count or sum is called.

If you want to find a specific data_block with certain tags you can do

SELECT d.*
FROM data_blocks AS d
JOIN tags AS t ON d.id = t.data_block_id
WHERE t.tag = 16 OR t.tag = 17
GROUP by d.id
HAVING COUNT(*) = 2

this ensures the tag only has those two tags using the COUNT(*) = (the number of tags the data_block will have)

Jason Gallavin
  • 403
  • 4
  • 14