The problem that I'm trying to solve is:
1 - In our DB we have all the tables (also tables with millions of records) with a PK id column declared as VARCHAR(36). There is also a clustered index on it This of course as I read online it's an awful thing for performance, also because the db has a lot of reads, inserts, updates and deletes.
2 - We use Hibernate for our java web application as an ORM to this db
After extensive reading online I started changing the data type of those columns to UNIQUEIDENTIFIER with the default option newsequentialid() as this option should mitigate the fragmentation issues of our indexes.
What I noted is that the fragmentation issue persisted, the tables became heavily fragmented short after rebuilds (we do a full index rebuild every night).
Then I saw that all of our Hibernate mappings for the id columns consisted of this:
<id name="id" column="id" type="string">
<generator class="guid"/>
</id>
When an insert occurred in our system the log showed that the insert was done after calling select newid()
, so since this returns a random guid the insert would be placed in a random point in the index, thus causing fragmentation (this totally defeat the column data type change I did also).
So after another online search I tried to implement a guid generator in Hibernate by myself, implementing the interface IdentifierGenerator
and using a time based generator with JUG (http://wiki.fasterxml.com/JugHome).
The code that generates the (I thought sequential) id is this:
String uuid = null;
EthernetAddress nic = EthernetAddress.fromInterface();
TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator(nic);
uuid = uuidGenerator.generate().toString();
And I changed accordingly the mapping to this:
<id name="id" column="id" type="string">
<generator class="my_package.hibernate.CustomSequentialGuidGenerator">
</generator>
</id>
Then I tried to generate some test uuids to test their sequentiality (sequential in the uniqueidentifier way, so binary), this is a short list (every element is generated before the successive) :
314a9a1b-6295-11e5-8d2c-2c27d7e1614f
3d867801-6295-11e5-ae09-2c27d7e1614f
4434ac7d-6295-11e5-9ed1-2c27d7e1614f
491462c4-6295-11e5-af81-2c27d7e1614f
5389ff4c-6295-11e5-84cf-2c27d7e1614f
57098959-6295-11e5-b203-2c27d7e1614f
5b62d144-6295-11e5-9883-2c27d7e1614f
This look to me as alphabetically sequential, but not binary sequential.
The above test was done executing seven times the test application, it wasn't a loop.
I tried to insert those values in a column declared as unique identifier and after issuing a select on this column this is the list sql server outputs:
5389FF4C-6295-11E5-84CF-2C27D7E1614F
314A9A1B-6295-11E5-8D2C-2C27D7E1614F
5B62D144-6295-11E5-9883-2C27D7E1614F
4434AC7D-6295-11E5-9ED1-2C27D7E1614F
3D867801-6295-11E5-AE09-2C27D7E1614F
491462C4-6295-11E5-AF81-2C27D7E1614F
57098959-6295-11E5-B203-2C27D7E1614F
So I really don't understand what I should do and if I can use JUG as a sequential guid generator to avoid my fragmentation issues.
This is another JUG test, I tried 3 runs each time generating 10 guids with a loop:
Run 1
54bd156e-62a2-11e5-a1a7-2c27d7e1614f
54c3cc2f-62a2-11e5-a1a7-2c27d7e1614f
54caf820-62a2-11e5-a1a7-2c27d7e1614f
54d1aee1-62a2-11e5-a1a7-2c27d7e1614f
54d901e2-62a2-11e5-a1a7-2c27d7e1614f
54df9193-62a2-11e5-a1a7-2c27d7e1614f
54e64854-62a2-11e5-a1a7-2c27d7e1614f
54ecff15-62a2-11e5-a1a7-2c27d7e1614f
54f3b5d6-62a2-11e5-a1a7-2c27d7e1614f
54fa4587-62a2-11e5-a1a7-2c27d7e1614f
Run 2
87c66bcc-62a2-11e5-8e7c-2c27d7e1614f
87ccd46d-62a2-11e5-8e7c-2c27d7e1614f
87d3641e-62a2-11e5-8e7c-2c27d7e1614f
87d97e9f-62a2-11e5-8e7c-2c27d7e1614f
87e05c70-62a2-11e5-8e7c-2c27d7e1614f
87e6ec21-62a2-11e5-8e7c-2c27d7e1614f
87ed7bd2-62a2-11e5-8e7c-2c27d7e1614f
87f40b83-62a2-11e5-8e7c-2c27d7e1614f
87fac244-62a2-11e5-8e7c-2c27d7e1614f
880103d5-62a2-11e5-8e7c-2c27d7e1614f
Run 3
a4b690db-62a2-11e5-b667-2c27d7e1614f
a4bcd26c-62a2-11e5-b667-2c27d7e1614f
a4c2eced-62a2-11e5-b667-2c27d7e1614f
a4c92e7e-62a2-11e5-b667-2c27d7e1614f
a4cf48ff-62a2-11e5-b667-2c27d7e1614f
a4d5d8b0-62a2-11e5-b667-2c27d7e1614f
a4dc6861-62a2-11e5-b667-2c27d7e1614f
a4e34632-62a2-11e5-b667-2c27d7e1614f
a4e9d5e3-62a2-11e5-b667-2c27d7e1614f
a4f101d4-62a2-11e5-b667-2c27d7e1614f
Run 4
c2b872b2-62a2-11e5-b855-2c27d7e1614f
c2c17363-62a2-11e5-b855-2c27d7e1614f
c2c82a24-62a2-11e5-b855-2c27d7e1614f
c2ce92c5-62a2-11e5-b855-2c27d7e1614f
c2d57096-62a2-11e5-b855-2c27d7e1614f
c2dc2757-62a2-11e5-b855-2c27d7e1614f
c2e32c38-62a2-11e5-b855-2c27d7e1614f
c2e9bbe9-62a2-11e5-b855-2c27d7e1614f
c2f099ba-62a2-11e5-b855-2c27d7e1614f
c2f7507b-62a2-11e5-b855-2c27d7e1614f
Run 5
f0263d1b-62a2-11e5-8529-2c27d7e1614f
f02d1aec-62a2-11e5-8529-2c27d7e1614f
f033d1ad-62a2-11e5-8529-2c27d7e1614f
f03a615e-62a2-11e5-8529-2c27d7e1614f
f041181f-62a2-11e5-8529-2c27d7e1614f
f047a7d0-62a2-11e5-8529-2c27d7e1614f
f04dc251-62a2-11e5-8529-2c27d7e1614f
f05403e2-62a2-11e5-8529-2c27d7e1614f
f05a6c83-62a2-11e5-8529-2c27d7e1614f
f0608704-62a2-11e5-8529-2c27d7e1614f
Run 6 (Started from 0 again)
00fd4ec3-62a3-11e5-8ab8-2c27d7e1614f
01042c94-62a3-11e5-8ab8-2c27d7e1614f
010b3175-62a3-11e5-8ab8-2c27d7e1614f
0111e836-62a3-11e5-8ab8-2c27d7e1614f
0118ed17-62a3-11e5-8ab8-2c27d7e1614f
011fcae8-62a3-11e5-8ab8-2c27d7e1614f
0126a8b9-62a3-11e5-8ab8-2c27d7e1614f
012d115a-62a3-11e5-8ab8-2c27d7e1614f
0133c81b-62a3-11e5-8ab8-2c27d7e1614f
013a30bc-62a3-11e5-8ab8-2c27d7e1614f
The single groups are alphabetically (but not binary) ordered and taken the different runs as a whole they aren't event alphabetically ordered (sigh).
What am I missing?
************************* EDIT - Description of my implementation ******************
After the various comments and answers I implemented the following strategy:
I generated my own sequential (based on current timestamp) guids and this is the generator class:
package it.hibernate;
import java.io.Serializable;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.commons.lang.RandomStringUtils;
import org.hibernate.HibernateException;
import org.hibernate.engine.SessionImplementor;
import org.hibernate.id.IdentifierGenerator;
public class CustomSequentialGuidGenerator implements IdentifierGenerator{
@Override
public Serializable generate(SessionImplementor session, Object object)
throws HibernateException
{
String uuid = null;
try {
Date data = new Date();
SimpleDateFormat sdf = new SimpleDateFormat();
String rand = RandomStringUtils.randomAlphanumeric(12);
sdf.applyPattern("yyyy");
String year = sdf.format(data);
sdf.applyPattern("MM");
String month = sdf.format(data);
sdf.applyPattern("dd");
String day = sdf.format(data);
sdf.applyPattern("HH");
String hour = sdf.format(data);
sdf.applyPattern("mm");
String mins = sdf.format(data);
sdf.applyPattern("ss");
String secs = sdf.format(data);
sdf.applyPattern("SSS");
String millis = sdf.format(data);
//G carachter is used to insert the rows after
uuid = "GG" + year + month + "-" + day + hour + "-" + mins + secs + "-" + "0" + millis + "-" + rand;
}
catch (Exception exception)
{
exception.printStackTrace();
}
return uuid;
}
}
You can note that all rows start with the string 'GG'
because I had to make sure that all new rows would be inserted AFTER all the old rows generated via select newid()
. After that there's the current timestamp and 12 random chars to avoid collision in case of a multiple row insert in the same millisecond.
After a test of 2000 insert the primary key index fragmentation dropped from 17,92% to 0,15%.
N.B. The data type I reintroduced is obviously a varchar(36) again and not a uniqueidentifier, so the rows are sorted alphabetically.