
I need a worker role that takes a txt file from a Blob and bulk-inserts it into an Azure SQL table.

I'm using SqlBulkCopy together with the CsvReader provided by LumenWorks.Framework.IO, and I've created two versions of the worker role:

1) read the entire file, load it into a DataTable and execute the SqlBulkCopy;

2) stream-read the file and pass the stream to the SqlBulkCopy.
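
For reference, a minimal sketch of what the first (DataTable) version might look like, assuming DataTable.Load is used for the conversion and reusing the same blockBlob, connection-string key, options and destination table as the second version shown further down:

using (var stream = await blockBlob.OpenReadAsync(cancellationToken))
using (var reader = new StreamReader(stream))
using (var csv = new CsvReader(reader, true, ';'))
using (var conn = new SqlConnection(CloudConfigurationManager.GetSetting("TestDbConn")))
{
    // load the whole file into memory first (the LumenWorks CsvReader implements IDataReader)
    var table = new DataTable();
    table.Load(csv);

    await conn.OpenAsync(cancellationToken);
    using (var transaction = conn.BeginTransaction())
    using (var bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.TableLock, transaction))
    {
        bulkCopy.BulkCopyTimeout = 300;
        bulkCopy.DestinationTableName = "[3sc_base_1]";
        await bulkCopy.WriteToServerAsync(table, cancellationToken);
        transaction.Commit();
    }
}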

The problem is that the second version has roughly half the performance of the first one.

As an example, with a 10 MB txt file containing 90,000 records:

- first version: half a second to load the file, 2 seconds to convert it to a DataTable, 20 seconds for the SqlBulkCopy

- second version: 50 seconds in total (more than double!)

I've tried changing the BatchSize but it doesn't seem to make much difference, and I don't know what I'm doing wrong. Here is the code of the second version:

using (var stream = await blockBlob.OpenReadAsync(cancellationToken))
using (var reader = new StreamReader(stream))
using (var csv = new CsvReader(reader, true, ';'))
using (var conn = new SqlConnection(CloudConfigurationManager.GetSetting("TestDbConn")))
{
    await conn.OpenAsync(cancellationToken);
    connAperta = true; // flag declared elsewhere, records that the connection is open

    using (var transaction = conn.BeginTransaction())
    using (var bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.TableLock, transaction))
    {
        bulkCopy.BulkCopyTimeout = 300;
        bulkCopy.DestinationTableName = "[3sc_base_1]";
        await bulkCopy.WriteToServerAsync(csv, cancellationToken);
        transaction.Commit();
    }
}
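
For reference, the BatchSize experiment mentioned above (and suggested again in the comments below) is just an extra property assignment on the same SqlBulkCopy instance before the write; the values here are purely illustrative, and NotifyAfter is only an optional way to watch where the time goes:

bulkCopy.BatchSize = 1000;      // 0 (the default) sends all rows in a single batch
bulkCopy.NotifyAfter = 10000;   // optional: raise SqlRowsCopied every 10'000 rows
bulkCopy.SqlRowsCopied += (sender, e) => Trace.TraceInformation("{0} rows copied", e.RowsCopied);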

What am I doing wrong?

Simone
  • What exactly is the question here? With the DataTable version you have the entire file in local memory, then send it to the server. With streaming, you first wait for the HTTP REST stream to reach your worker, then redirect it to the Azure SQL Database. And once you have found the faster one, just use it. – astaykov Jan 25 '15 at 16:50
  • The problem, of course, is that I want to use the stream version, otherwise I could have problems with bigger files if I load them into memory all at once. The two methods do the same thing: they read a file and then execute a query on the database. Logically speaking, if I execute those two operations as independent actions one after the other, they should not be faster than a single action using streams (not that much faster, at least). So my results make no sense; of course I made a mistake, but what is it? I think it has to do with the SqlBulkCopy options – Simone Jan 26 '15 at 09:25
  • So in both cases you are reading the file from the Blob? Just, in case 1 you download the file first, load it into memory, then push it to SqlBulkCopy, while in the second case you want to stream from the Blob stream directly into SqlBulkCopy? And in both cases you measure the time starting with the Blob operation (including the blob download time in your case 1)? – astaykov Jan 26 '15 at 11:32
  • Exactly, in both cases I took the entire process time into consideration. I understand that SqlBulkCopy could perform better if it has the entire file in memory, but I think that with a BatchSize specified it shouldn't make much difference – Simone Jan 26 '15 at 15:24
  • Try tweaking `bulkCopy.BatchSize = 1000;` – minghan Jan 18 '16 at 20:30

1 Answer


Have a look at the new Azure SQL Database capability to bulk upload directly from an Azure Storage account.

This should be the fastest and easiest way to achieve what you want, unless you are not simply streaming the data but transforming it as well.
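
For illustration, a hedged sketch of how that capability could be driven from the worker role, assuming a one-time T-SQL setup (a database-scoped credential with a SAS token plus an external data source of TYPE = BLOB_STORAGE) has already been done; the data source name 'MyBlobStorage' and the file name are placeholders, and the exact syntax and prerequisites should be checked against the Azure SQL Database documentation:

// The server pulls the file straight from Blob storage, so the worker role
// never has to move the data itself.
using (var conn = new SqlConnection(CloudConfigurationManager.GetSetting("TestDbConn")))
using (var cmd = new SqlCommand(
    @"BULK INSERT [3sc_base_1]
      FROM 'myfile.txt'
      WITH (DATA_SOURCE = 'MyBlobStorage', FIELDTERMINATOR = ';', TABLOCK);", conn))
{
    await conn.OpenAsync(cancellationToken);
    cmd.CommandTimeout = 300; // same generous timeout the question uses for the bulk copy
    await cmd.ExecuteNonQueryAsync(cancellationToken);
}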

aboersch