Questions tagged [etl]

ETL is an acronym for Extract, Transform, and Load. It refers to a process of extracting data from source systems, transforming the data in some way (manipulating it, filtering it, combining it with other sources), and finally loading the transformed data to target system(s).

ETL is a process prominent especially in data warehousing. The term is an industry standard, representing data movement and transformation. It performs the following functions:

Extract data from homogeneous or heterogeneous data sources

Transform data - cleansing, reformatting, standardization, aggregate, join with other data, and apply business logic

Load the data in specified target tables, systems, or flat files.

The ETL processes are reusable components that can be scheduled to perform data movement on particular time intervals. Used as batch processes, ETL jobs support massive parallel processing of data. Generally they are easy to maintain and scalable.

It is used for movement across systems when a high volume of data and complex business rules are involved.

Big Data and the ELT approach (yes, ELT)

Traditional data integration was performed using batch processing (data on the rest), while big data integration can be done in real-time or with batch processing. Which make the ETL phases reordered to become ELT in some cases, so the data is extracted, loaded into distributed file systems, and then transformed before being used.

Commonly used ETL tools are, for example:

  • SQL Server Integration Services (SSIS)
  • Informatica PowerCenter
  • IBM Infosphere DataStage
  • Syncsort DMX / DMX-h
  • Oracle Data Integrator
  • Oracle Warehouse Builder
  • SAS Data Management
  • CloverETL

Open-source solutions are available as well, such as

  • Pentaho Data Integration (Kettle)
  • Talend Open Studio.

References:

5837 questions
2
votes
1 answer

OrientDB import edges only using ETL tool

I already used the OETL to insert all my Vertex to the graph. Now I have a file that outlines the edges at the following way: node_1,rel_type,node_2 11000001,relation_A,10208879 11000001,relation_A,10198662 11000001,relation_B,10159927 …
Dor Cohen
  • 16,769
  • 23
  • 93
  • 161
2
votes
1 answer

Is this a proper idea of BI workflow?

I am new to Business Intelligence. I just got hired by a company in order to complete their websolution, implementing a BI Module. After lot of reading, I think I could get an idea of what a BI Process looks like, you'll find enclose my idea of a BI…
Spredzy
  • 4,982
  • 13
  • 53
  • 69
2
votes
1 answer

Kettle - Load Balancing

Is there any way to handle load balancing in Pentaho ETL Servers? PDI should be able to handle switching of the servers once the memory increases more than the set target to a new server without having to fail or restart the job. A feature quite…
Rishu Shrivastava
  • 3,745
  • 1
  • 20
  • 41
2
votes
1 answer

How to perform a FULL OUTER JOIN in TALEND tool?

I am using TALEND Bigdata studio for ETL. I have a case where I need to do a full outer join between two datasets but in talend studio I could not find a full outer join option. How do i do it ?
Ankush Rathi
  • 622
  • 1
  • 6
  • 26
2
votes
2 answers

How to trim the new line character of column data in ctl file of SQL Loader

My table data has contains new line character it is loading from sql loader ctl file, one column called 'IPADDRESS'is loading with new line character: My ctl file : load data INFILE 'abc.txt' INTO TABLE TABLENAME APPEND FIELDS TERMINATED BY…
Chaya
  • 95
  • 3
  • 12
2
votes
2 answers

What is Snaplogic?

As per Wikipedia: SnapLogic is a commercial software company that provides Integration Platform as a Service (iPaaS) tools for connecting Cloud data sources, SaaS applications and on-premises business software applications. It is surely a…
Bilesh Ganguly
  • 3,792
  • 3
  • 36
  • 58
2
votes
0 answers

Elastic Search Query based in subdocuments filters

I'm working in a small personal project that requires to make some queries in ElasticSearch, and I wanted to ask you for some advise because I'm kind of stuck. This is the issue: So far I have a index that contains a type called users. It stores an…
andr3s2
  • 238
  • 3
  • 10
2
votes
0 answers

How to log errors in Informatica?

Due to security issues the user or Informatica has grants to create neither tables nor synonyms. So I have come up with this: The PMERR_DATA, PMERR_MSG, etc. tables are created under another user The user of Informatica is granted to select, inset,…
neshkeev
  • 6,280
  • 3
  • 26
  • 47
2
votes
1 answer

OrientDB etl loads csv but cannot see records in the class

I am using orientdb 2.1.13 I can load the csv file >> please see oetl class Names get created, but cannot see the records in the classes Thanks for your support. RayN. test.json { "config": {"log":"debug"}, "source": {"file": {"path":…
Niloy Ray
  • 21
  • 1
2
votes
1 answer

AS400 to Oracle 10g via xml with Informatica Powercenter

Is the following workflow possible with Informatica Powercenter? AS400 -> Xml(in memory) -> Oracle 10g stored procedure (pass xml as param) Specifically, I need to take a result set eg. 100 rows. Convert those rows into a single xml document as a…
Peter Goras
  • 602
  • 3
  • 14
  • 24
2
votes
1 answer

ETL Oracle to Cassandra

We had some users which their info. was stored in Oracle db. the tables include some Joins, too. Now, I want to build a new project, using Cassandra which will use the information of previews users, too. The data modeling is using Cassandra…
Elnaz
  • 2,854
  • 3
  • 29
  • 41
2
votes
1 answer

Build ETL pipeline from MongoDB to ElasticSearch

I am building a big data analytics solution on top of my web application data. What logical architecture solution of ETL I have is:- 1. Extract - Data is first ingested from MongoDB 2. Transform - Data is transformed where multiple transformations…
Puneet Jindal
  • 147
  • 2
  • 10
2
votes
2 answers

ETL from RavenDB to SQL

Are there any ETL tools available to export data from RavenDB? Or will I be forced to write c# code to do the data migration? The RavenDB is version 1 (build 888).
CamHart
  • 3,825
  • 7
  • 33
  • 69
2
votes
2 answers

Data integration ETL with java web application?

I'm newbie with Business intelligence , and I'm going to develop a java web application. I want to integrate data from different sources so then I can store them in a database . Is there an API or jars of pentaho or talend or other ETLs that I can…
pietà
  • 760
  • 1
  • 11
  • 37
2
votes
1 answer

Alternative to cursor pivot split function?

So I'm making a stored procedure with its end-goal being a dynamic ETL solution. My company deals with a lot of third-party data and often times I do not know the number of columns, data types, format of the data, etc... As such, I've put together a…
Alexus Wong
  • 347
  • 4
  • 9
1 2 3
99
100