0

I have this table

text

| txt_id | txt_content                                  |
|--------+----------------------------------------------|
|      1 | A ton of text and <<this>>                   |
|      2 | More text <<and>> that                       |
|      3 | <<Very>> much <<text>> enough for<everyone>> |

And this table

tags

| tag_id | tag_name |
|--------+----------|
|      1 | THIS     |
|      2 | AND      |
|      3 | VERY     |
|      4 | TEXT     |
|      5 | EVERYONE |

And I need a query to yield this table.

| txt_id | tag_id |
|--------+--------|
|      1 |      1 |
|      2 |      2 |
|      3 |      3 |
|      3 |      4 |
|      3 |      5 |

It would be tivial to do with python code by getting each piece of text inidividually but the text table has many rows (>30M) and I think it would bee to much time on the database-backend communication. Is there a way to do this kind of thing with MySQL? I would even be content with

| txt_id | tag_id   |
|--------+----------|
|      1 | this     |
|      2 | and      |
|      3 | Very     |
|      3 | text     |
|      3 | everyone |

But that I expect the last part to be easy to do within MySQL.

Taryn
  • 242,637
  • 56
  • 362
  • 405
fakedrake
  • 6,528
  • 8
  • 41
  • 64

1 Answers1

2

This won't necessary be particularly fast, but it will do what you want:

select t.txt_id, ta.tag_id
from text t join
     tags ta
     on t.txt_content like concat('%<', ta.tag_name, '>%');
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • You'll need wildcards. – Clockwork-Muse Aug 10 '14 at 23:08
  • This has also a side-effect I like of only including unique tags of each text. Out of curosity, how would you go about doing finding duplicates as well? – fakedrake Aug 11 '14 at 14:43
  • @fakedrake . . . I'm not sure I know what you mean. If you want to add another column to your table that is a counter of the number of times that tags appear in a text, then that is another question. – Gordon Linoff Aug 11 '14 at 15:47
  • Consider the case where the first row of `text` would be `<> is a ton of text and <>`. Then that entry should produce the row `(1, THIS)` twice. With the solution presented it will produce that row only once. Am I wrong? – fakedrake Aug 11 '14 at 23:11
  • @fakedrake . . . This will only provide one row per tag/text combination. I could imagine a way to get a count. Getting multiple rows is a little bit harder. – Gordon Linoff Aug 11 '14 at 23:58