I used to think that SQL cannot process unstructured data (like text) unless we write some user-defined functions in C. However, InnoDB's FullText Search feature seems did much of such work already.
According to https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html, the index is saved in InnoDB tables named FTS_00000..._00000..._INDEX_?
.
I tried to run SELECT * FROM FTS_00000..._00000..._INDEX_1, in the hope to see tokens in each document (perhaps with stopwords removed already). However, I got an error message
ERROR 1146 (42S02): Table 'tf.FTS_0000000000000028_0000000000000030_INDEX_1' doesn't exist
even if select * from information_schema.INNODB_SYS_TABLES;
reveals that the table exists.
Does anyone know how I could get the tokens of each document I inserted into the full-text index? It would be great if I can get the information in the following data schema:
token_id document_id count
"apple" 103343 3
"orange" 9593 1
...