7

I know that when using a text type field within a MySql table, the data is not stored inline but only a 'pointer' is stored in the row. I only want to infrequently retrieve the text field, so is it better to keep it in the same table but omit it from query results or to keep it in a separate table and join on that table when i want to read it?

This table could potentially have billions of rows, be partitioned and have large (100k -> 1Mb) text field values.

Gary Willoughby
  • 50,926
  • 41
  • 133
  • 199
  • You want to join on the text fields? – Explosion Pills Jul 03 '13 at 13:55
  • Well, if it's only a pointer then I think it's better to be in the same table. as long as you don't select it I don't think it will effect your queries. and if you seperate it's just more "searches" for the db to do before giving you the result. – eric.itzhak Jul 03 '13 at 14:03
  • No joining on the text fields, i mean to join in a table containing just the text field. – Gary Willoughby Jul 03 '13 at 14:06
  • A thought - why store in a table at the first place? you can store a path to a text file and use server side code to display it when needed. – eric.itzhak Jul 03 '13 at 14:49

2 Answers2

2

A billion rows with a field that is 100k is, to say the least, big. That comes to 100 Tbytes of data (using the American definition of "terabyte"). According to the documentation:

The InnoDB storage engine maintains InnoDB tables within a tablespace that can be created from several files. This enables a table to exceed the maximum individual file size. The tablespace can include raw disk partitions, which permits extremely large tables. The maximum tablespace size is 64TB.

In other words, you may have bigger problems than performance. You will probably be spreading the table across multiple partitions.

If you are only occasionally retrieving the text and never using it for searches, I would suggest that you store it in a separate table. That way, you can customize that table for access to these records. You'll have a primary key used for reference and all references will be through that id.

If you are using the text for searches, particularly searches combined with the "fixed" data, then my architectural preference would be to include it in the base table to facilitate the searching across fields.

However, even with this preference, it is probably safer to put it in a different table. For instance, MySQL instantiates subqueries. It is very typical to use * for a subquery. Consider a simple case: a query to get the 1000 most recent records ordered by userid:

select t.*
from (select t.*
      from t
      order by createddate
      limit 1000
     ) t
order by userid

The use of t.* means that the text column would also be retrieved. So a query that might take a fraction of a second (with an index) would have to read and write 1000*100k = 100 Mbytes of data (at least). This would probably take a bit longer.

In conclusion, I would advocate putting the text column in a table where it is often being searched with other columns -- for example, in a database of abstracts of scientific papers. For really large data, I would put it in a separate field, so I could better manage the storage in extreme cases.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

My take on it:

normally, I would say it is unnecessary complicated to have a reference to a pointer to the text, especially when dealing with multiple joins, potential partitioning etc.

on the other hand, that's quite a monster of a table. If you ever forget to exclude the text field or maybe have someone, that is not well informed about your data structure, working on the same database, who may issue a simple SELECT * FROM monstertable...well, depending on your server, it could kill/stall it for quite some time.

In short: for performance, one single table should be a bit better, for security/stability it may be better to separate.

a sidenode: I would ask myself if MySQL or even a relational database at all is the right tool for this task (and spend endless hours searching for alternatives, get yelled at and just use MySQL, as it is already installed everywhere and well integrated ;) )

cypherabe
  • 2,562
  • 1
  • 20
  • 35