
I have a source flat file with about 20 columns of data and roughly 11K records. Each record (row) contains fields such as

patientID, PatientSSN, PatientDOB, PatientSex, PatientName, PatientAddress, PatientPhone, PatientWorkPhone, PatientProvider, PatientReferrer, PatientPrimaryInsurance, PatientInsurancePolicyID.

My goal is to move this data to a SQL database.

I have created a database with the data model below.

I now want to do a bulk insert to move all the records, but I am unsure how to do that because, as you can see, there are (and have to be) constraints to ensure referential integrity. What should my approach be? Am I going about this all wrong? So far I have used SSIS to import the data into a single staging table, and now I must figure out how to write the 11K-plus records to the individual tables where they belong. Record 1 of the staging table will create one record in almost all of the tables, except perhaps those on the "one" side of one-to-many relationships, like "Provider" and "Referrer": one provider will be linked to many patients, but one patient can only have one provider.

I hope I have explained this well enough. Please help!

  • Consider looking into the stored procedure / function support in SQL Server. You might be able to compose a few procedures that can e.g. insert a main record, return the ID, then insert related records with references to that returned ID. – Ed Orsi Nov 10 '17 at 22:34
  • Unless you really need it, I'd remove the appt/providers reference - the circular dependency will cause you extra work. Also, what tool will you use to update the DB? Raw SQL or a programming language? If modeled correctly, many ORMs will take care of this for you. Were this me being asked to do it, I'd script it up in Python, or use an ETL tool such as expressor. – SteveJ Nov 10 '17 at 22:36
  • @Ed Orsi wouldn't this violate constraints much like the SSIS package tends to, since it won't do inserts in any particular order, so I have no way of controlling which records get written first, second, etc.? – Jose Freyre Nov 10 '17 at 23:41
  • @SteveJ I am looking into your suggestion of ORMs as we speak... I'm all about tools that do the work for me! :-) – Jose Freyre Nov 10 '17 at 23:44
  • @JoseFreyre; If you don't have a favorite language, Python with Pony-Orm I think might do it. C# with Entity Framework would as well -- but with a big learning curve. – SteveJ Nov 11 '17 at 00:17
  • @SteveJ regarding your suggestion, how about Microsoft's Entity Framework tools? I'm more comfortable with MS tools. I'm not a developer and I'm just starting out with database modeling, so please forgive my silly questions. In terms of how the data is being updated: the data lives in a terminal-based application with its own "UI"; our only way of getting at the data to perform any sort of analysis or tool development is a flat-file export and subsequent SQL import. We are concentrating on Office 365 solutions (PowerApps etc.), but for now we just have a simple workflow in SharePoint in mind – Jose Freyre Nov 11 '17 at 00:18
  • Honestly, my primary objective is just to get the data into the database first, then figure out what we will do with it beyond our current plans, which are to integrate with PowerApps and SharePoint to create a simple workflow that produces a paper document at the end. Nothing fancy. I suppose one could say that the application model we are writing for is "SharePoint"? Again, please excuse my lack of knowledge. – Jose Freyre Nov 11 '17 at 00:48
  • @JoseFreyre The idea is you might write a function to insert one record (handling all the descendant join tables) and potentially either call that function multiple times OR write a higher-order function that takes the list of records you wish to insert and calls the lower-level function for each item. – Ed Orsi Nov 13 '17 at 17:21
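
A minimal sketch of the stored-procedure pattern described in the comments above. The table, column, and procedure names here (dbo.Provider, dbo.Patient, usp_InsertPatient, ProviderName) are made up and would need to be adjusted to the actual data model; the point is only the insert-parent-then-child pattern using SCOPE_IDENTITY():

```sql
-- Assumed schema: dbo.Provider(ProviderID identity, ProviderName)
-- and dbo.Patient(PatientID identity, PatientName, PatientDOB, ProviderID FK).
CREATE PROCEDURE dbo.usp_InsertPatient
    @PatientName  varchar(100),
    @PatientDOB   date,
    @ProviderName varchar(100)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ProviderID int;

    -- Re-use the provider if it already exists, otherwise create it.
    SELECT @ProviderID = ProviderID
    FROM   dbo.Provider
    WHERE  ProviderName = @ProviderName;

    IF @ProviderID IS NULL
    BEGIN
        INSERT INTO dbo.Provider (ProviderName) VALUES (@ProviderName);
        SET @ProviderID = SCOPE_IDENTITY();  -- ID of the row just inserted
    END;

    -- The child row can now reference the parent ID without violating the FK.
    INSERT INTO dbo.Patient (PatientName, PatientDOB, ProviderID)
    VALUES (@PatientName, @PatientDOB, @ProviderID);
END;
```

A higher-level routine (or a simple loop over the staging table) would then call this once per source record.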

2 Answers


As the question is generic, I'll approach the answer in a generic way as well - in an attempt to at least get you asking the right questions.

Your goal is to get flat-file data into a relational database. This is a very common operation and is at least a subset of the ETL process. So you might want to start your search by reading more on ETL.

Your fundamental problem, as I see it, is two-fold. First, you have a large amount of data to insert. Second, you are inserting into a relational database.

Starting with the second problem first: not all of your data can be inserted each time. For example, you have a provider table that holds a 1:many relationship with patients. That means that, for each patient row in your flat table, you will have to ask whether the provider already exists or needs creating. Also, you have seeded IDs, meaning that in some instances you have to maintain your order of creation so that you can reference the ID of a created entry in the next created entry. What this means to you is that your effort will be more complex than a simple set of SQL inserts. You need logic associated with the effort. There are several ways to approach this.

  • Pure SQL/T-SQL: it can be accomplished, but it would be a lot of work and hard to debug/troubleshoot (a set-based sketch follows this list)
  • Write a program: This gives you a lot of flexibility, but means you will have to know how to program and use programming tools for working with a database (such as an ORM)
  • Use an automated ETL tool
  • Use SQL Server's flat-file import abilities
  • Use an IDE with import capabilities - such as Toad, Datagrip, DBeaver, etc.
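
For the pure SQL/T-SQL route, the usual pattern is set-based: insert the parent rows first, then join the staging table back to them to populate the child rows. A sketch, assuming hypothetical table names (dbo.Staging, dbo.Provider, dbo.Patient) and a ProviderName column, none of which are taken from your actual model:

```sql
-- 1. Parents first: one row per distinct provider found in the staging table.
INSERT INTO dbo.Provider (ProviderName)
SELECT DISTINCT s.PatientProvider
FROM   dbo.Staging AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.Provider AS p
                   WHERE  p.ProviderName = s.PatientProvider);

-- 2. Children next: join staging back to the parents to pick up their new IDs.
INSERT INTO dbo.Patient (PatientName, PatientDOB, PatientSSN, ProviderID)
SELECT s.PatientName, s.PatientDOB, s.PatientSSN, p.ProviderID
FROM   dbo.Staging  AS s
JOIN   dbo.Provider AS p
  ON   p.ProviderName = s.PatientProvider;
```

The same parent-then-child pattern would repeat for Referrer, Insurance, and any other lookup tables before the Patient insert.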

Each of these approaches will take some research and learning on your part -- this forum cannot teach you how to use them. And the decision as to which one you want to use will somewhat depend on how automated the process should be.

Concerning your first issue -- large data inserts. SQL Server has a facility for bulk inserts (see the BULK INSERT docs), but you will have to condition your data first.
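
If you ever want to do the flat-file load without SSIS, a minimal BULK INSERT sketch might look like the following; the file path, delimiters, and staging table name are assumptions about your export file, not taken from your setup:

```sql
-- Load the raw flat file into the staging table in one operation.
BULK INSERT dbo.Staging
FROM 'C:\imports\patients.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,   -- skip the header row
    TABLOCK                -- allows bulk-load optimizations on larger files
);
```

From there, the staging-to-target inserts still have to deal with the referential-integrity logic discussed above.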

Personally (as per my comments), I am a .Net developer. But given this task, I would still script it up in Python. The learning curve is very kind in Python, and it has lots of great tools for working with files and databases. .Net and EF carry a lot of overhead with respect to what you need to know to get started that Python doesn't -- but that is just me.

Hope this helps get you started.

SteveJ

Steve, you are a boss, thank you. Ed, thanks to you as well!

I have taken everyone's guidance into consideration and concluded that I will not be able to get away with a simple solution for this.

There are bigger implications, so it makes sense to accomplish this groundwork task in a way that lets me reuse my efforts for future projects. I will be proceeding with a simple .NET web app using EF to take care of the data model, and I will write a simple import procedure to pull the data in.

I have a notion of how I will accomplish this, but with the help of this board I'm sure success will follow! Thanks all - Joey

For the record, the tools I plan on using (I agree with the complexity and learning-curve opinions, but I have an affinity for MS products):

  1. Azure SQL Database (data store)
  2. Visual Studio 2017 CE (IDE)
  3. C# (language)
  4. .NET MVC (project type)
  5. EF 6 (ORM)
  6. Grace (cause I'm only human :-)