1

If a SQL Server 2012 table has 2 varchar(max) columns that are rarely used and causing a table to grow rapidly, does it make sense to split into a separate table?

Data in the two varchar(max) columns is used to store raw JSON responses from an API.

If stored in a separate table, rows would be truncated to only include previous 6 months, reducing table size of first table.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user2966445
  • 1,267
  • 16
  • 39
  • Yes that would make a lot of sense for a number of reasons. If you have transient data like that it does not belong in permanent/historical tables, – TheMadDBA Jun 18 '15 at 14:39
  • "Makes sense" does not equal "should be done". We have no idea of your implementation, resource available, scale etc. Only you can determine that. – DavidG Jun 18 '15 at 14:40
  • DavidG, what specifically would you need to know to make a recommendation? – user2966445 Jun 18 '15 at 14:43

3 Answers3

4

If stored in a separate table, rows would be truncated to only include previous 6 months, reducing table size of first table.

The rows would have to be deleted, not truncated, and then the BLOB space would have to be reclaimed by running ALTER INDEX ... REORGANIZE WITH (LOB_COMPACTION = ON)

If you'd store instead the blobs in the original table, you would have to update the rows to SET blob = NULL and then reclaim the space with ALTER INDEX ... REORGANIZE WITH (LOB_COMPACTION = ON)

So when it boils down to the details, you aren't achieving much using a split table, imho. So I stick to my earlier advice from SQL Server varbinary(max) and varchar(max) data in a separate table: I see no benefits in split, but I see trouble from having to maintain the rows consistent 1:1 between the splits.

You may have a case if you split and partition the 'blobs' table. Then you could, indeed, deallocate very efficiently the old space by switching 'out' the old partition and replacing them with an empty one, and then dropping the switched out data. That is something to consider. Of course, you code would have to be smart enough when it joins the two 'splits' to consider that the blobs may be gone (ee. use OUTER JOIN).

Community
  • 1
  • 1
Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • Any concerns with partitioning the table in place considering current size is 250GB+? Currently uses the Primary filegroup along with all other tables in the DB. – user2966445 Jun 18 '15 at 14:52
  • You only need to partition the 'blobs' table. To build up this blob table you need some procedure to copy over the blobs, which is going to be slow and offline (require service downtime, unavailability). When you do this split you can easily split directly into a partitioned table, no need to partition afterwards. – Remus Rusanu Jun 18 '15 at 14:59
  • If we went the first approach (leave table alone), any concerns with in-place partitioning? – user2966445 Jun 18 '15 at 15:09
  • You mean rebuild the table on a new partitioning scheme? Is a rebuild, you need to account for the rebuild downtime, and need to provision the required disk space for data and log during the rebuild. But why would you partition the *entire* (all columns) table? – Remus Rusanu Jun 18 '15 at 15:11
  • Because it's huge and on the same filegroup as all other tables. Thought it would make sense to break into smaller files on a separate filegroup. – user2966445 Jun 18 '15 at 15:13
  • You won't gain much, but pay a pretty big price. Partitioning has a big impact, all the way up to your app model (it changes primary keys etc). Read http://www.brentozar.com/archive/2012/03/how-decide-if-should-use-table-partitioning/ – Remus Rusanu Jun 18 '15 at 15:14
  • Spreading on different filegroups doesn't buy anything unless the filegroups are on different physical disks.. which can be achieved just as easily by adding multiple files, one for each disk, to the filegroup the table is in now. – Remus Rusanu Jun 18 '15 at 15:16
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/80940/discussion-between-user2966445-and-remus-rusanu). – user2966445 Jun 18 '15 at 21:31
0

Just to give you a hint. You can use SPARSE on a rarely used column.

Example:

CREATE TABLE myTable (
    id int identity(1,1),
    name nvarchar(100),
    blob nvarchar(max) SPARSE,
    blob2 nvarchar(max) SPARSE
)

Sparse will just left a small marker inside the page. But it's mostly bad practice to have nvarchar(max). Is it really needed?

You can read more about it here.

Ionic
  • 3,884
  • 1
  • 12
  • 33
0

This question is an almost-duplicate, but not quite.

To a large extent, the varchar(max) columns are stored off-page. If you do not refer to them in a query, then they are not loaded. So, they provide little additional overhead.

The situation is different if the values are stored on the same page. Then removing them can shrink the size of your table and fewer pages should be faster for select queries. Here is an interesting blog on this subject.

You have an additional constraint, in terms of truncating the table. That would seem like a win for many queries, so I think that tips the argument toward putting the values in a separate table.

Note, however, that truncation is not the only possible solution. You could also partition the data (say by month or week). If the partitioning key is part of all queries, then you can use partitioning instead of truncation and also reduce the effective size of the table.

Community
  • 1
  • 1
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786