Questions tagged [etl]

ETL is an acronym for Extract, Transform, and Load. It refers to a process of extracting data from source systems, transforming the data in some way (manipulating it, filtering it, combining it with other sources), and finally loading the transformed data to target system(s).

ETL is a process prominent especially in data warehousing. The term is an industry standard, representing data movement and transformation. It performs the following functions:

Extract data from homogeneous or heterogeneous data sources

Transform data - cleansing, reformatting, standardization, aggregate, join with other data, and apply business logic

Load the data in specified target tables, systems, or flat files.

The ETL processes are reusable components that can be scheduled to perform data movement on particular time intervals. Used as batch processes, ETL jobs support massive parallel processing of data. Generally they are easy to maintain and scalable.

It is used for movement across systems when a high volume of data and complex business rules are involved.

Big Data and the ELT approach (yes, ELT)

Traditional data integration was performed using batch processing (data on the rest), while big data integration can be done in real-time or with batch processing. Which make the ETL phases reordered to become ELT in some cases, so the data is extracted, loaded into distributed file systems, and then transformed before being used.

Commonly used ETL tools are, for example:

  • SQL Server Integration Services (SSIS)
  • Informatica PowerCenter
  • IBM Infosphere DataStage
  • Syncsort DMX / DMX-h
  • Oracle Data Integrator
  • Oracle Warehouse Builder
  • SAS Data Management
  • CloverETL

Open-source solutions are available as well, such as

  • Pentaho Data Integration (Kettle)
  • Talend Open Studio.

References:

5837 questions
2
votes
1 answer

how to load special character string in SSIS

Problem I am trying to load string from flat file into SSIS but it gives me error due to it contains specials character but I want to load it and store it as it is into the destination table. How can I do thi. Example…
Malik Kashmiri
  • 5,741
  • 11
  • 44
  • 80
2
votes
2 answers

SSIS package failure when executed in windows task schedule

I have three SSIS packages to sync data from SharePoint to SQL Server, I have put them in task schedule to run daily, but one or two of them failure sometimes. The error message from task schedule is action "C:\Program Files\Microsoft SQL…
Sven
  • 79
  • 10
2
votes
3 answers

SSIS Destination Insert Error

PACKAGE DESCRIPTION: I use a Source from One database do a lookup to get a surrogate geography key then another to check if the customer exists, if not insert the row if so update the row... PROBLEM: I am unable to insert approximately 700,000…
Will
  • 228
  • 1
  • 2
  • 15
2
votes
4 answers

Unable to close excel even after releasing resources

I have tried many solution but didn't work Please help me to fix this issue. Below is the code used in SSIS Script Task using Excel = Microsoft.Office.Interop.Excel; Excel.Application xlApp = null; …
Gireesh k
  • 31
  • 5
2
votes
3 answers

Workaround for exporting data to Excel with more than 255 columns

SSMS and SSRS to Excel enable for more than 255 columns when copy-pasting. SSIS does not allow for more than 255 columns to be exported to Excel 2007. Is there a way to override this?
dirtyw0lf
  • 1,899
  • 5
  • 35
  • 63
2
votes
2 answers

Can't change values in SSIS Transformation Script Editor

Using Visual Studio 2015 Enterprise I'm trying to change a few values inside the Script Transformation Editor but they are grayed out and I can't modify them. Here I'm trying to change ScriptLanguage to Microsoft Visual Basic: Here I would like to…
Jonathan Porter
  • 1,365
  • 7
  • 34
  • 62
2
votes
0 answers

SSIS: Import data from a variable number of sources in parallel

I am sure this is a fairly common problem but i can't manage to find anything about it. We have an unspecified number of DB Servers from which to get data: we started with two DB servers but they can become more. All the metadata are the same across…
Johannes Wentu
  • 931
  • 1
  • 14
  • 28
2
votes
1 answer

Best way to sort 200-300 million records in Pentaho?

I am working on this new task where my input csv file has about 200 to 300 million records my requirement is to sort the incoming data perform lookup's get the key value and insert into target table. One suggestion was to write a java plugin that…
DUnkn0wn1
  • 401
  • 1
  • 8
  • 23
2
votes
0 answers

Refreshing Non-Production Environments (Data Warehouse)

My question has to do with regard to refreshing non-prod environments (Dev, Test, Staging). My question: What is typically best practice in terms of how you do it and frequency? A couple options that I thought of: Every night or periodically run…
James
  • 117
  • 2
  • 11
2
votes
1 answer

Convert number format in SSIS

Need some help please .. I have excel data that is formatted. Example 33.257615111 shows as 33.258 (thirty three million two hundred and fifty eight thousand) I am trying to import data file into SQL using SSIS ETL. Is there a way I can convert this…
ANASI-Newbie
  • 111
  • 8
2
votes
1 answer

Modeling many-to-many relationship in data warehouse

I have to design data warehouse model and ETL process for class at my University. My data warehouse has to store opinions / comments about a product, each record should consist of: comment text (String) product score ({0, 0.5, … , 4.5, 5}) comment…
user5900426
2
votes
2 answers

SSIS - Fast way of detecting true deletes and updating data warehouse

I'm looking for an efficient way of detecting deleted records in production and updating the data warehouse to reflect those deletes because the table is > 12M rows and contains transactional data used for accounting purposes. Originally,…
Jonathan Porter
  • 1,365
  • 7
  • 34
  • 62
2
votes
2 answers

Flattening a Recursive Hierarchy Into a Dimension with SSIS

I have a recursive hierarchy in a relational database, this reflects teams and their position within a hierarchy. I wish to flatten this hierarchy into a dimension for data warehousing, it's a SQL Server database, using SSIS to SSAS. I have a table,…
THEMike
  • 1,701
  • 2
  • 17
  • 28
2
votes
2 answers

Incorrect External column length in SSIS OLE DB Destination

I am writing a set of data to a SQL Server 2008 table in SSIS (Visual Studio 2008). As I map my data set to a OLE DB destination I get a truncation error because the OLE DB destination columns have length 20. This happens for every column but I'll…
Alessandro Cifani
  • 1,419
  • 2
  • 12
  • 18
2
votes
1 answer

Error encountered when executing SSIS package

I am encountering the following error when executing an SSIS package: Retrieving the COM class factory for component with CLSID {4555B0A5-E8D7-49BD-B299-3140C34D3769} failed due to the following error: e06d7363 Exception from HRESULT:…
Tola Odejayi
  • 3,019
  • 9
  • 31
  • 46