Questions tagged [etl]

ETL is an acronym for Extract, Transform, and Load. It refers to a process of extracting data from source systems, transforming the data in some way (manipulating it, filtering it, combining it with other sources), and finally loading the transformed data to target system(s).

ETL is a process prominent especially in data warehousing. The term is an industry standard, representing data movement and transformation. It performs the following functions:

Extract data from homogeneous or heterogeneous data sources

Transform data - cleansing, reformatting, standardization, aggregate, join with other data, and apply business logic

Load the data in specified target tables, systems, or flat files.

The ETL processes are reusable components that can be scheduled to perform data movement on particular time intervals. Used as batch processes, ETL jobs support massive parallel processing of data. Generally they are easy to maintain and scalable.

It is used for movement across systems when a high volume of data and complex business rules are involved.

Big Data and the ELT approach (yes, ELT)

Traditional data integration was performed using batch processing (data on the rest), while big data integration can be done in real-time or with batch processing. Which make the ETL phases reordered to become ELT in some cases, so the data is extracted, loaded into distributed file systems, and then transformed before being used.

Commonly used ETL tools are, for example:

  • SQL Server Integration Services (SSIS)
  • Informatica PowerCenter
  • IBM Infosphere DataStage
  • Syncsort DMX / DMX-h
  • Oracle Data Integrator
  • Oracle Warehouse Builder
  • SAS Data Management
  • CloverETL

Open-source solutions are available as well, such as

  • Pentaho Data Integration (Kettle)
  • Talend Open Studio.

References:

5837 questions
2
votes
1 answer

SSIS expression

Anybody have any idea why this SSIS expression is not working The derived column name is from_service_date_yyyymmdd The expression is (DT_STR,4,1252)from_service_date_yyyy + RIGHT("0" + (DT_STR,2,1252)from_service_date_mm,2) + RIGHT("0" +…
Dick Rosenberg
  • 113
  • 2
  • 10
2
votes
2 answers

SSIS Web Service Task returns a file file with "<", </Value> characters

I have a simple ssis package what saves result from web service method to XML file. Connection OK, file creating, but contains < and > tags instead of < and > . How I can replace this tags to correct?
Key
  • 71
  • 1
  • 9
2
votes
1 answer

What are advantages and disadvantages of raw files in SSIS and when do we use them?

Could anyone please describe the advantages and disadvantages of Raw file transformations in ssis packages and in which instances do we use them?
Lakshmi
  • 93
  • 1
  • 6
2
votes
1 answer

Is there anyway to perform outer join in SSIS without using Merge Join transformation?

I need to do many left joins to create my fact table which has more than 150 M Records. When i do outer join using merge join and sort transformation, it takes many hours to load data. So need a help to do this without merge join transformation.
2
votes
3 answers

SSIS - specify only one parameter in OLE DB Source with multiple '?'

when writing my SQL command I specify ? to indicate that I want to use a passed in Parameter. However, for every ? I use, the package thinks it's a new parameter even though I want to use the same one. 6 ? = 6 new parameters though each parameter…
user3428422
  • 4,300
  • 12
  • 55
  • 119
2
votes
2 answers

SSIS Parameters With Functions

I've done quite a bit of searching on this but haven't been able to come up with a solid solution. I have a use case where for example I want to include the date in the file name, let's say I use the following…
sc305495
  • 249
  • 3
  • 11
2
votes
2 answers

Using Dynamic Table name in ADO.NET Source Editor

I have an SSIS data flow task where I am trying to load data from a source table using ADO.NET Source Editor into a SQL Server table. The problem: The source table name is dynamic and depends on the current date. Example: If I want to load today's…
2
votes
1 answer

SSIS Flat File - CSV formatting not working for multi-line fileds

I want to import *.csv file. There is head row with the column names, and data rows below. The problem is in column description.This column has multi-line text and each line is recognised as the record. Document has {CR}{LF} for the end of row, and…
Milos Nikolic
  • 373
  • 3
  • 14
2
votes
1 answer

SSIS - Calculate Opening and Closing Balance

I need to calculate the Opening Balance and the Closing Balance in SSIS. I have the below data as input. invoice_date amount 12/4/2016 4000 12/5/2016 5000 12/6/2016 7500 12/7/2016 5000 12/8/2016 8000 I want the…
Yousuf Sultan
  • 3,055
  • 8
  • 35
  • 63
2
votes
0 answers

Can Pentaho be hosted in microsoft azure

Can we host Pentaho data integration in Microsoft azure.If yes what may be the estimated cost for this.
vkc
  • 83
  • 1
  • 1
  • 9
2
votes
1 answer

Cannot open existing package and msdb from integration system?

I have an sql server and its been installed as new instance called instance2 . Now when i tries to open MSDB folder under stored packages this gives me below error TITLE: Microsoft SQL Server Management Studio ------------------------------ Failed…
Bodhi
  • 548
  • 4
  • 14
  • 26
2
votes
1 answer

SSIS Script Task - how accept wildcard files in VB.net

I have this SSIS script task code copied from a site. I'm trying to avoid FTP failed error when the folder is empty. Does not need to a specific name file. The code below is for specific filename. How to make the filename a wildcard? Public Sub…
Jason312
  • 197
  • 1
  • 1
  • 10
2
votes
2 answers

ETL vs ELT in Amazon Redshift

We are researching on creating a Data Lake solution on AWS - similar to what's outlined here - https://aws.amazon.com/blogs/big-data/introducing-the-data-lake-solution-on-aws/ We will be storing all the "raw" data in S3 and load it into EMR or…
Raj
  • 549
  • 2
  • 11
  • 25
2
votes
2 answers

SSIS: convert ISO 8601 string to datetime column

I have a CSV falt file I'm trying to import using SSIS. The date string is in the format 20170215T000002 - this looks like ISO 8601, so I'm pretty sure I should just be able to map it to a datetime column type in SQL, but I lose all of the time…
Gavin Hill
  • 347
  • 1
  • 14
2
votes
1 answer

Which approach is better for saving the error in SSIS

I use SSIS for transferring the data and I use SQL server 2016 I need to use some lookup component in my DataFlow task and if I can't find the matches, I should save the record as information in one table. That means if for each Lookup component I…
Ardalan Shahgholi
  • 11,967
  • 21
  • 108
  • 144