
[note: this needs to be done in code, as I can't use SSIS or similar]

I need to bulk copy data from one database to another using C#, probably with EF, though this isn't cast in stone.

The problem is that the source data is all varchar(max) and I want the destination columns in the correct data types. The source is historical data from an old ETL job that works very well and that I can't get replaced. The most common issue I have seen is alphas in numeric fields, e.g. "none" in a money field. These are fine in the source since it's all varchar.

I'd like to copy the data and validate it:
source -> validate -> destination
in the simplest way possible. If validation fails then I need to know the exact row that failed (and ideally WHAT failed) so that it can be manually fixed in the source, and the data re-copied.

There are around 50 tables, ranging from 10 to 1.7M rows, so speed is important as well.

What would be a sensible way to approach this? Create DTOs with validation attributes and AutoMapper? Two EF entities, mapping across row by row and validating each? A SPROC and manual inserts?

BlueChippy
  • What's wrong or not clear with the question? – BlueChippy Feb 27 '13 at 05:15
  • What exactly is the question here? Are you wanting somebody to just write the code for you? – Mike Perrenoud Feb 27 '13 at 05:16
  • Not at all - I'm asking what would be a sensible way to approach this? Create DTO's, validation attributes and automap? Two EF entities and map across row by row and validate each? SPROC and manual insert? – BlueChippy Feb 27 '13 at 05:18
  • Okay well that's not really all that clear in the question. Please edit your question and put that text at the bottom in bold. – Mike Perrenoud Feb 27 '13 at 05:24

1 Answer


Do it in T-SQL with a linked server.

For example:

--begin a transaction to wrap validation and load
BEGIN TRAN

--validate that no tickets are set to closed without a completion date
SELECT *
FROM bigTableOnLocalServer WITH (TABLOCKX) -- lock the table so no new rows arrive mid-copy
WHERE ticketState = '1' /* ticket closed */ AND CompletionDate = 'open'

--if the SELECT above returns rows, validation failed: run ROLLBACK TRAN here to
--release the lock, fix the offending rows in the source, and start again

--if no rows were returned, execute the load
INSERT INTO RemoteServerName.RemoteServerDBName.RemoteSchema.RemoteTable (field1Int, Field2Money, field3text)
SELECT CAST(Field1 AS int),
    CASE Field2Money WHEN 'none' THEN NULL ELSE CAST(Field2Money AS money) END,
    Field3Text
FROM bigTableOnLocalServer
WHERE recordID BETWEEN 1 AND 1000000

--after the load completes, commit the transaction to release the lock
COMMIT TRAN

If the servers cannot communicate directly, still do the validation in SQL, but use a C# client to write the data to disk and run BULK INSERT on the destination server. Since the C# component would do nothing more than transport the data, I would go straight to a format usable by BULK INSERT, à la CSV.
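
For reference, a minimal sketch of that destination-side load, assuming the C# client has written a comma-delimited file with a header row (the path and table name below are placeholders):

-- hypothetical load of the CSV the C# client wrote to disk
BULK INSERT DestinationDB.dbo.DestinationTable
FROM 'D:\staging\bigTableExport.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,   -- skip the header row
    TABLOCK         -- table lock for a faster, minimally logged load
);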

Mitch
  • The tables are on the same server, so this should be fairly easy. How do I know which row fails? – BlueChippy Feb 27 '13 at 05:45
  • You would do the validation as a set of queries run proactively, before the insert itself. E.g.: if you are looking for fields which should contain only integers, you would have a condition that looks for non-digit characters (`IntegerTextColumn LIKE '%[^0-9]%'`). The rows returned are the rows which failed validation. – Mitch Feb 27 '13 at 05:50
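
A sketch of that idea, extended to report which check each failing row tripped (the column names follow the example in the answer, and the checks themselves are illustrative):

-- list each failing row together with a reason, so it can be fixed in the source
SELECT recordID, 'Field1 contains non-numeric characters' AS failureReason
FROM bigTableOnLocalServer
WHERE Field1 LIKE '%[^0-9]%'

UNION ALL

SELECT recordID, 'Field2Money is not a valid amount'
FROM bigTableOnLocalServer
WHERE Field2Money LIKE '%[^0-9.]%' AND Field2Money <> 'none'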