2

We are migrating our MS-Access database to SQL Server Compact 4.0 using the Entity Framework 5 Code First approach. We found that using Database generated Integer ID's is very slow and to make it worse, the delay increases exponentially with the size of the database. This makes using an Identity column impossible and it seems there is a bad implementation of this feature in SQL Server Compact 4.0 paired with the Entity Framework.

So, we ran some tests and found that using a client side generated key speeds op insertion by at least 20 times, the exponential increase in insertion disappears.

Now we are looking at a the best way to generate client side ID's. Using GUID's seems the most secure option, but I read that this negatively impacts read actions. Is there a strategy in using auto-incremented Integers that are client side generated?

EDIT: I will investigate the underlying problem that lead to the question further. In the mean time can my real question be answered please? :-)

EDIT2: It is pretty exasperating that nobody seems to believe the assertion that using auto-id's with EF and SQL Server compact 4.0 is so slow. I posted a separate question about this with a proof of concept that should be easily reproducible.

Community
  • 1
  • 1
Dabblernl
  • 15,831
  • 18
  • 96
  • 148
  • 4
    I suspect there is some other problem either in your application logic or your testing methodology. Generating integer IDs using IDENTITY columns should not be slow nor should it make using identity columns impossible. – Aaron Bertrand Feb 06 '13 at 23:28
  • 1
    Yet again I agree with @AaronBertrand, it seems very unlikely that they got something this basic that wrong – Tony Hopkinson Feb 06 '13 at 23:33
  • @AaronBertrand Likewise, it seems difficult to get a test that confirms this wrong :-). See also: http://social.msdn.microsoft.com/Forums/en/adodotnetentityframework/thread/c01ad513-ab1c-4a78-94d7-52b0750d4866 – Dabblernl Feb 07 '13 at 06:07
  • This was fixed in version 4.0 of SQL Server Compact – ErikEJ Feb 08 '13 at 06:52
  • @ErikEJ No it is not. If you take some time to look for the problem on this very site you will find that questions about this topic actually abound. You participated even in some of them. – Dabblernl Feb 08 '13 at 08:24
  • What I meantvwas that in version 3,5 no engine generated keys were supported – ErikEJ Feb 08 '13 at 09:15
  • 1
    Why can't you turn the identity column off in SQL Server, copy the data with the auto incremented integer IDs and then turn the identity column on in SQL? You can tell SQL where to pick up again with the auto incrementing integer. – ryan1234 Feb 10 '13 at 16:44
  • Have you tested both 32-bit and 64-bit SQLCE runtimes, or are you focused on one bitness particularly? – Simon Mourier Feb 11 '13 at 08:17
  • @Ryan, didnt know you could do that with Squirrel Server. Thanks +1 – phil soady Feb 14 '13 at 23:31

3 Answers3

1

If you are moving large amounts of data with EF, you are doing it wrong. Use ADO.NET, and for example a BULK COPY approach instead (with SQL CE use SqlCeUpdateableRecord). You could use my SqlCeBulkCopy library to save some coding effort.

ErikEJ
  • 40,951
  • 5
  • 75
  • 115
  • Actually I use your BulkCopy approach now. It does not update the generated ID's in the Entities though. And what is "a large amount"? EF5 + SQL Server Compact need 30 seconds to write 250 records in our databasetable with an Identity column, this drops to below a second when I generate the ID on the client. That *cannot* be good. Using your bulkkcopy the time needed is neglegible. – Dabblernl Feb 07 '13 at 09:34
  • Have you tried the KeepIdentity option with the BulkCopy API? – ErikEJ Feb 07 '13 at 12:22
  • That fails: as the ID's are zero for all entities a duplicate primary key Exception is thrown. – Dabblernl Feb 07 '13 at 21:23
1

I dont think the way of identity generation is the source of performance problem.
I think if you want to get a better performance during migration process, before conversion process, you can disable Primary keys and foreign keys and other constraint on your main tables. (this could be done by scripting or manualy)
However data integrity will be your new concern and you conversion code must be strong so after conversion process, enabling the constraints could be done.
hope this helps.

Mohsen Heydari
  • 7,256
  • 4
  • 31
  • 46
  • Thanks for your input, but your assumption about the constraints is wrong. Follow the link to my other question which proves that insertion using store generated ID's is very slow when using EF5 + SQLServer Compact 4.0 – Dabblernl Feb 10 '13 at 20:14
0

Solution as I see it.

a)Try Fix Performance problem. My suggestions (dont use large numbers of entities inside context.) try as few as the business problem will allow. Dont use merge tracking etc... See EF performance tips. http://blogs.msdn.com/b/wriju/archive/2011/03/15/ado-net-entity-framework-performance-tips.aspx and http://msdn.microsoft.com/en-au/library/cc853327.aspx

b)Use a Guid. Allocate externally (its not sequential, but fast)

c)Use a customer Integer generator, that runs in memory, can allocate many keys at once and can persist the current state. This technique is used by SAP. They call it "number range". Can be very fast but not as fast as b).

btw I use GUIDs and not DB generated IDs to make partial DB copies and migrations Easy/easier. :-)

phil soady
  • 11,043
  • 5
  • 50
  • 95
  • Using non sequential guids within an indexed table can cause re-balance the index tree every time a row is written! – coding Bott Feb 14 '13 at 15:19
  • @Bernd I strongly prefer evenly distributed Guids. Especially in Large volume scenarios . By large I mean > 100 million rows. It is very important when you get > 1 billion rows. for more on why in case interested http://stackoverflow.com/questions/13149139/how-evenly-spread-are-the-first-four-bytes-of-a-guid-created-in-net . Btw if a table is partitioned for growth and 50% of the table use is Insert, then evenly spread Guids wins hands down. Having been through this exercise with DB2 experts at a bank where more than 100 million rows are to one table alone per month. – phil soady Feb 14 '13 at 23:29