
I have a table with 124,387,133 rows. Each row has 59 columns, and of those 59, 18 are of the TinyInt data type with all values either 0 or 1. Some of the TinyInt columns are used in indexes and some are not.

My question: will it make a difference to query performance and table size if I change the TinyInt columns to Bit?
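For context, the change I have in mind is along these lines. The table and column names here are placeholders for the real ones:

    -- Placeholder names; this would be repeated for each of the 18 flag columns.
    -- The NULL / NOT NULL constraint has to be restated to match the existing definition,
    -- otherwise ALTER COLUMN may leave the column nullable.
    -- Columns that are part of an index would need those indexes dropped and recreated around the change.
    ALTER TABLE dbo.MyBigTable
        ALTER COLUMN Flag01 bit NOT NULL;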

4 Answers


In case you don't know, a Bit uses less space to store information than a TinyInt (1 bit against 8 bits), so you would save space by changing to Bit, and in theory performance should be better. Such an improvement is generally hard to notice, but with the amount of data you have it might actually make a difference. I would test it on a backup copy.
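A rough sketch of such a test (table and column names are placeholders): sp_spaceused shows the size before and after, and the rebuild rewrites the existing rows in the new layout.

    -- On the restored test copy: size before
    EXEC sp_spaceused N'dbo.MyBigTable';

    -- Convert the flag columns (repeat for each of the 18; names are placeholders)
    ALTER TABLE dbo.MyBigTable ALTER COLUMN Flag01 bit NOT NULL;

    -- Rebuild so existing rows are rewritten; otherwise the old space may not be reclaimed
    ALTER TABLE dbo.MyBigTable REBUILD;

    -- Size after
    EXEC sp_spaceused N'dbo.MyBigTable';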

Serr
  • This isn't necessarily true at all. By default a tinyint and a bit are precisely the same size: one byte (8 bits). You can't store a single bit in memory; it can't be uniquely addressed. However, SQL Server will bitfield-compress multiple adjacent bit columns, so 1-8 adjacent bit columns will be compressed into 1 byte. – Rab Jan 20 '21 at 14:28

Actually, it's good to use the right data type. Below are the benefits I can see from using the Bit data type:

1. Buffer pool savings: pages are read into memory from storage, and less memory needs to be allocated.

2. The index key size will be smaller, so more rows can fit on one page and less traversing is needed.

You will also see storage space savings as an immediate benefit.
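As a rough way to see where the index space currently goes (and so where any key-size saving would land), something like this works; the table name is a placeholder:

    -- Pages and rows per index for the table (name is a placeholder)
    SELECT  i.name        AS index_name,
            i.type_desc,
            ps.used_page_count,
            ps.row_count
    FROM    sys.dm_db_partition_stats AS ps
    JOIN    sys.indexes AS i
            ON  i.object_id = ps.object_id
            AND i.index_id  = ps.index_id
    WHERE   ps.object_id = OBJECT_ID(N'dbo.MyBigTable');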

TheGameiswar
  • A page is going to occupy the same space in memory regardless; it's an 8k page. I think you mean fewer pages might be read into memory due to higher row density per page. Index size applies only if the field is within the index, of course. – Andrew Oct 24 '17 at 15:40

In theory yes; in practice the difference is going to be subtle. The 18 bit fields get byte-packed and rounded up, so they shrink to 3 bytes. Depending on nullability / any change in nullability, the storage cost will change again. Both types are held within the fixed-width portion of the row. So you will drop from 18 bytes to 3 bytes for those fields; depending on the overall size of the row versus the page size, you may squeeze an extra row onto the page. (The rows-per-page density is primarily where the performance gain will show up, if you are to gain at all.)
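If you want to see the packing on a small scale, a throwaway comparison like the one below (invented table names, 8 flag columns rather than 18 to keep it short) shows the per-row difference; sys.dm_db_index_physical_stats in DETAILED mode reports the record size, and the 8 bit columns should account for 1 byte rather than 8:

    -- Throwaway tables: 8 tinyint flags vs 8 bit flags (the same idea scales to 18)
    CREATE TABLE dbo.PackTinyint (
        c1 tinyint NOT NULL, c2 tinyint NOT NULL, c3 tinyint NOT NULL, c4 tinyint NOT NULL,
        c5 tinyint NOT NULL, c6 tinyint NOT NULL, c7 tinyint NOT NULL, c8 tinyint NOT NULL
    );
    CREATE TABLE dbo.PackBit (
        c1 bit NOT NULL, c2 bit NOT NULL, c3 bit NOT NULL, c4 bit NOT NULL,
        c5 bit NOT NULL, c6 bit NOT NULL, c7 bit NOT NULL, c8 bit NOT NULL
    );

    INSERT dbo.PackTinyint VALUES (0, 1, 0, 1, 0, 1, 0, 1);
    INSERT dbo.PackBit     VALUES (0, 1, 0, 1, 0, 1, 0, 1);

    -- Expect roughly a 7-byte difference per record: the 8 bit columns pack into a single byte
    SELECT OBJECT_NAME(object_id) AS table_name, avg_record_size_in_bytes
    FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.PackTinyint'), NULL, NULL, 'DETAILED')
    UNION ALL
    SELECT OBJECT_NAME(object_id), avg_record_size_in_bytes
    FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.PackBit'), NULL, NULL, 'DETAILED');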

This seems like a premature micro-optimization, however. If you are suffering from bad performance, investigate that and gather the evidence supporting any changes. Making type changes on existing systems should be carefully considered: if you cause the need for a code change, which prompts a full regression test and so on, the cost of the change rises dramatically for very little end result. (The production change on a large dataset will also not be rapid, so you can factor some downtime into the cost of making this change as well.)

Andrew
  • Nullability doesn't affect the storage. SQL Server stores an array of null bits for all fields, regardless of the metadata information on nullability. – Gordon Linoff Oct 24 '17 at 16:16
  • You are right, as long as at least 1 field is nullable it stores the bitmap. – Andrew Oct 24 '17 at 21:20

You would be saving about 15 bytes per record, for a total of 1.8 Gbytes.

You have 41 remaining fields. If I assume that those are 4-byte integers, then your current overall size is about 22 Gbytes. The overall savings is less than 10% -- and could be much less if the other fields are larger.
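Spelling that arithmetic out (the row count is from the question; the 4-byte assumption is mine):

    -- Rough numbers only: the 18 tinyint bytes pack down to 3 bit bytes (15 bytes saved per row),
    -- and the remaining 41 columns are assumed to be 4-byte integers.
    DECLARE @rows    bigint = 124387133;
    DECLARE @saved   bigint = @rows * 15;              -- bytes saved, about 1.87e9
    DECLARE @current bigint = @rows * (18 + 41 * 4);   -- current column data, about 22.6e9 bytes

    SELECT  @saved   / 1e9            AS saved_gb,     -- ~1.9
            @current / 1e9            AS current_gb,   -- ~22.6
            100.0 * @saved / @current AS pct_saved;    -- ~8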

This does mean that a full table scan would be up to about 10% faster, which gives you a sense of the magnitude of the performance gain.

I believe that bit fields require an extra operation or two to mask the bits on read (trivial overhead, measured in nanoseconds these days), but it is something to keep in mind.

The benefit of a smaller row size is that more records fit on a single page, so the table occupies less space in memory (assuming it is all read in at once) and less space on disk. Smaller data does not always mean improved query performance, however. Here are two caveats:

  1. If you are reading a single record, then the entire page needs to be read into cache. It is true that you are a bit less likely to get a cache miss with a warm cache, but overall reading a single record from a cold cache would be the same.
  2. If you are reading the entire table, SQL Server actually reads pages in blocks and implements some look-ahead (also called read-ahead or prefetching). If you are doing complex processing, you might not even notice the additional I/O time, because I/O operations can run in parallel with computing.
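On the "less space in memory" point, if you want to measure rather than guess, a query along these lines (in-row pages only; the table name is a placeholder) shows how many of the table's pages sit in the buffer pool, which you could compare before and after the change:

    -- Pages of the table currently cached in the buffer pool (in-row data only)
    SELECT  COUNT(*)              AS cached_pages,
            COUNT(*) * 8 / 1024.0 AS cached_mb
    FROM    sys.dm_os_buffer_descriptors AS bd
    JOIN    sys.allocation_units AS au
            ON au.allocation_unit_id = bd.allocation_unit_id
    JOIN    sys.partitions AS p
            ON p.hobt_id = au.container_id
    WHERE   bd.database_id = DB_ID()
      AND   p.object_id = OBJECT_ID(N'dbo.MyBigTable');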

For other operations such as deletes and updates, locking is sometimes done at the page level. In these cases, sparser pages can be associated with better performance.

Gordon Linoff