
I have some business logic that needs to run on a daily basis and will affect all the rows in the table. Once a record is fixed and can no longer be changed by the logic, it gets moved to an on-disk table. At its peak the table will hold approximately 30 million rows. It's very skinny: just the linkage columns to a main table and a key to a flag table. The flag key is what gets updated.

My question is: when I'm creating a table of this size, what bucket count should I be looking to use on the hash index?

The table will start off small, likely with only a few hundred thousand rows in April, but by the end of the financial year it will have grown to the maximum mentioned above, as previous years have indicated. I'm not sure whether having a mostly empty hash index at the start will cause any issues, or whether it is fine to set the count at the 30 million mark from the beginning.

Thanks in advance for your comments, suggestions and help.

I've provided the code below. I've tried googling what happens when the bucket count is high but the initial number of rows is low and the table only grows to that size over time, but I found nothing to help me understand whether this will cause a performance issue.

CREATE TABLE [PRD].[CTRL_IN_MEM]
(
    [FILE_LOAD_ID] INT NOT NULL,
    [RECORD_IDENTIFIER] BIGINT NOT NULL,
    [FLAG_KEY] SMALLINT NOT NULL,
    [APP_LEVEL_PART] BIT NOT NULL,

--Line I'm not sure about
    CONSTRAINT [pk_CTRL_IN_MEM] PRIMARY KEY NONCLUSTERED HASH ([FILE_LOAD_ID], [RECORD_IDENTIFIER]) WITH (BUCKET_COUNT = 30000000),
    INDEX cci_CTRL_IN_MEM CLUSTERED COLUMNSTORE

) WITH (MEMORY_OPTIMIZED = ON, DURABILITY=SCHEMA_AND_DATA)
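
As an aside, this is roughly how I was thinking of keeping an eye on how full the hash index actually is as the table grows, using the sys.dm_db_xtp_hash_index_stats DMV. It's only a sketch against the table above and I haven't run it in anger yet:

-- Sketch: report bucket usage and chain lengths for the hash index above,
-- so the BUCKET_COUNT can be sanity-checked as the row count grows.
SELECT
    OBJECT_NAME(hs.[object_id])                                   AS table_name,
    i.[name]                                                      AS index_name,
    hs.total_bucket_count,
    hs.empty_bucket_count,
    FLOOR(100.0 * hs.empty_bucket_count / hs.total_bucket_count)  AS empty_bucket_pct,
    hs.avg_chain_length,
    hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
    ON i.[object_id] = hs.[object_id]
   AND i.index_id    = hs.index_id
WHERE hs.[object_id] = OBJECT_ID(N'PRD.CTRL_IN_MEM');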
  • Why are you using an in-memory table in the first place? They are meant to handle high-traffic scenarios, not "accelerate" data access. Table data access is already buffered. And 30M rows/year is a *small* amount of data. – Panagiotis Kanavos Apr 10 '19 at 08:06
  • If you want to create a fact table or any other kind of reporting table, consider using columnstore indexes on physical tables. You get the compression, in-memory processing advantages, 100x perf *and* fast insertion through the deltastore. The deltastore limit is 1M rows though. With so little data you'd have to wait for months before the server automatically moved any rows from the deltastore to the columnstore. You'll have to schedule an `ALTER INDEX ... REBUILD` eg every night to avoid this – Panagiotis Kanavos Apr 10 '19 at 08:13
  • Hi and thanks. This isn't a fact table. This table is being used to update the flag key. Within that flag reference table is a large number of flags that support different views of the data. By updating the flag key to the relevant new key, I don't need to hold all the flags for every row, just one column which I can join to in the views the customers see and present them with what they ask for. – user1663003 Apr 10 '19 at 08:15
  • It's still a small table with infrequent updates. 30M rows/year is just 2 rows per second. Nothing in the comment justifies the use of an in-memory table. You should probably explain what the *actual* problem is, not how you tried to solve it. Even talking about a flag table is an attempt to solve the actual problem, not the problem itself. SQL Server creates [bitmap filters](https://sqlserverfast.com/blog/hugo/2018/05/plansplaining-part-5-bitmaps/) to optimize star joins, so a "flag table" may actually *harm* performance instead of improving it. It definitely makes reporting more difficult. – Panagiotis Kanavos Apr 10 '19 at 08:22
  • As for fact tables, the question mentions rows that become immutable, financial periods. The comment talks about multiple views. That's the problem that proper reporting schemas, like a star schema, solve. Instead of trying to query the live data each time, copy the data into a specialized reporting schema that makes reporting *easier*. Again, 30M rows is little data. You shouldn't have perf issues with physical tables unless the queries and schema aren't suitable for the job. The solution is *NOT* to use an in-memory table, it won't make reporting queries go faster. – Panagiotis Kanavos Apr 10 '19 at 08:28
  • OK thanks and it probably is as you say. The goal here though is not to improve reporting. The goal is to speed up the business logic processing that updates the flag key to its new value based on criteria provided, and to do so we're looking at a natively compiled SP. However, it would still be nice to know: do you go for the large bucket count even though it will be mostly empty, or is there criteria for it I'm unaware of (apart from twice the expected number of rows)? – user1663003 Apr 10 '19 at 08:51
  • Which I've already explained in each comment isn't going to happen. Proper schemas will speed up business logic processing. Typically known as *reporting*. Business logic doesn't have flags, programmers implementing the business logic use flags. There are plenty of articles that explain how to control bucket sizes too. They don't help you though because what you want doesn't fit an in-memory table's purpose – Panagiotis Kanavos Apr 10 '19 at 08:52
  • Like a Dictionary<> or HashSet<>, data is hashed and stored in *buckets* that contain all items with the same hash value. Those buckets typically store items in linked lists, which are fast for insertions, slow for reading. That's why a large number of buckets is desirable. Too many, though, are just wasteful because they'll be mostly empty. The best bucket count/size will depend on the expected data size and values. If that changes, the original sizes will become a problem. That's not an issue typically, because in-memory tables are **not** meant for long-term storage – Panagiotis Kanavos Apr 10 '19 at 08:58
  • And that's why you can't find anything for your case - it's the worst-case scenario for in-mem tables. You don't have a lot of traffic, most of the data is immutable. The sizes will keep changing, with the most significant changes in the first few months - 100% in May, 33% in June. You won't gain anything for that complexity though. Your queries *won't* run faster just because you used an in-mem table. They *may* run faster due to the columnstore index but even that is hampered by the narrowness of the table. – Panagiotis Kanavos Apr 10 '19 at 09:00
  • Thanks, but I feel it's moved a bit off topic. This wasn't about querying the data. As mentioned, it is about updating the flag key in the table. Currently the SP that contains all the business logic and performs that update takes a while to complete. The idea was to move the ctrl table to in-mem and change the SP to a natively compiled SP with the business logic to improve the update speed. The plan was to insert new records each day and, as they reached the immutable stage where the flag key can't change, write them to an on-disk table and delete them from the in-mem table. If I explained it poorly, I apologise. – user1663003 Apr 10 '19 at 10:59
  • I understood what you ask from the start. I'm saying that you're using the wrong tool and no tweaking is going to make it the right tool. You're asking why hammering a Torx screw is tearing up your plank. `Currently an SP to update that which contains all the business logic to run takes a while to complete.` why? What does it do? Are there any missing indexes? Data access is *already* buffered, an in-memory table won't make it go faster. The sizes you posted show only a few updates per minute, why is it slow? – Panagiotis Kanavos Apr 10 '19 at 11:03
  • As for setting the bucket sizes, I explained what these do too, and why you won't find an answer - your data changes a *lot* so the correct size also changes a lot. The bucket count depends on the number of unique keys. You can't get *one* good number when the data goes from 100K to 30M. You'll have to change the bucket count as time goes by. That's [possible in SQL Server 2016](https://www.mssqltips.com/sqlservertip/4045/in-memory-oltp-changes-in-sql-server-2016/) with `Alter Index ... rebuild with (Bucket_count=...)` – Panagiotis Kanavos Apr 10 '19 at 11:08
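
For reference, a rough sketch of the resize mentioned in the last comment, based on the SQL Server 2016 feature from the linked article. The new bucket count here is only an illustrative value, not a recommendation:

-- Sketch: change the hash index bucket count on a memory-optimized table
-- (SQL Server 2016+). Pick the value from the current number of distinct keys;
-- SQL Server rounds it up to the next power of two.
ALTER TABLE [PRD].[CTRL_IN_MEM]
    ALTER INDEX [pk_CTRL_IN_MEM]
    REBUILD WITH (BUCKET_COUNT = 8000000);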
