Questions tagged [data-integration]

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution encompasses discovery, cleansing, monitoring, transforming and delivery of data from a variety of sources.

It is a huge topic in IT because it ultimately aims to make all systems work seamlessly together.

Example with data warehouse

The process must take place among the organization's primary transaction systems before data arrives at the data warehouse.
It is rarely complete unless the organization has a comprehensive, centralized master data management (MDM) system.

Data integration usually takes the form of conforming dimensions and facts in the data warehouse. Conforming dimensions means establishing common dimensional attributes across separate databases. Conforming facts means agreeing on common business metrics, such as key performance indicators (KPIs), across separate databases, so that these numbers can be compared mathematically.
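As a minimal sketch of what conforming a dimension looks like, the snippet below maps two hypothetical source schemas (all field names and records here are illustrative, not taken from any real system) onto one shared set of dimensional attributes:

```python
# Two hypothetical source systems describe the same customers
# with different attribute names and encodings.
source_a = [{"cust_id": 1, "cust_name": "Acme Corp", "region_cd": "EMEA"}]
source_b = [{"id": 1, "name": "ACME Corporation", "territory": "EMEA"}]

def conform_a(row):
    """Map source A's schema onto the shared dimension attributes."""
    return {"customer_key": row["cust_id"],
            "customer_name": row["cust_name"].upper(),
            "region": row["region_cd"]}

def conform_b(row):
    """Map source B's schema onto the same shared attributes."""
    return {"customer_key": row["id"],
            "customer_name": row["name"].upper(),
            "region": row["territory"]}

# After conforming, rows from both systems share attribute names and
# encodings, so facts keyed on them can be compared directly.
dim_customer = [conform_a(r) for r in source_a] + \
               [conform_b(r) for r in source_b]
print(dim_customer[0]["region"] == dim_customer[1]["region"])  # True
```

Note that conforming attributes is only part of the job: the two customer names above still differ ("ACME CORP" vs. "ACME CORPORATION"), which is exactly the kind of discrepancy the cleansing step of a data integration solution must resolve.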

332 questions
0
votes
3 answers

Pentaho Kettle Naming Reason

Why is Pentaho Kettle called the "Pentaho Data Integration" tool? And can we use multiple data sources in a single transformation?
lourdh
  • 449
  • 2
  • 12
  • 30
0
votes
1 answer

SAS DI Studio create index on Oracle table

Is it possible to create an index on a column of an Oracle table registered in the metadata of SAS Data Integration Studio? If so, which type of index is created? A simple one, or which one?
zuluk
  • 1,557
  • 8
  • 29
  • 49
0
votes
1 answer

How data flows from data source to GoodData platform during local execution

I would like to clarify how exactly data flows from an on-premises or internet data source to the GoodData platform during local execution of the graphs. The case with a local data source is, I think, obvious, but a data source outside the LAN is not, so when executed…
Yuya Kobayashi
  • 465
  • 1
  • 4
  • 15
0
votes
1 answer

Producing timestamp in correct format using Pentaho DI

I am using Data Integration to get data from our online API. Part of the data is a timestamp, and this is printed like so on the website: 1389227435641, but when it appears in a table it is printed as 1.389227435641E12. How do I get it to…
Dan
  • 2,020
  • 5
  • 32
  • 55
0
votes
1 answer

Data retrieval and search across multiple services

I'm building a system that comprises multiple heterogeneous services that talk to each other over a network, although in the standard deployment model they are all on the same machine. The UI client for managing the entities within that complex…
0
votes
1 answer

Kettle Spoon - variable in file name input

Does anyone know how to set a variable for the file name in 'Text File Input'? I want the file name to depend on when I execute the transformation, for example: D:\input_file_.txt today = D:\input_file_20131128.txt tomorrow =…
BimoS
  • 129
  • 4
  • 14
0
votes
1 answer

Approach to combine two data sources with different data about same entities

Consider a scenario where I have data about the same entity from two different sources. As an example, for the camera Nikon D3200, Nikon lists the dimensions as 5.0 in. (125 mm) x 3.8 in. (96 mm) x 3.1 in. (76.5 mm), whereas on the Amazon website it's 3.1…
trailblazer
  • 1,421
  • 5
  • 20
  • 43
0
votes
1 answer

How can I integrate data on a regular basis between 2 different MySQL servers?

I currently have 2 MySQL servers running on different machines. One of them is a staging environment (A) and the other is a production environment (B). What I need to do is take data from (A) and update/insert into (B) based on conditions. If…
Jaylen
  • 39,043
  • 40
  • 128
  • 221
0
votes
1 answer

How to configure Apache Flume to fetch data from Twitter for specific period?

I have a Hadoop cluster and Apache Flume for data integration from Twitter to HDFS. By default it fetches data in chronological order, i.e. the most recent tweet is fetched first, and so on. Now I have a use case to fetch specific data from…
Amol Fasale
  • 942
  • 1
  • 10
  • 32
0
votes
1 answer

PDI: Error occurred while trying to connect to the database

I got the following error while executing a PDI job. I do have the MySQL driver in place (libext/JDBC). Can someone tell me what the reason for the failure could be? Despite the error while connecting to the DB, my DB is up and I can access it by command…
Surya
  • 3,408
  • 5
  • 27
  • 35
0
votes
1 answer

Pentaho Data Integration: Error connecting to database: using class org.gjt.mm.mysql.Driver

I get this error, but I have my mysql-connector-java-5.1.23-bin.jar inside Pentaho\data-integration\libext\JDBC. It seems the connector is not loaded because it's using the default one. I have tried different versions of the JDBC driver, I've checked the MD5, and I…
0
votes
1 answer

How to copy a file from network drive using pentaho

I have accessed the FTP path by giving the credentials like below. In the case of common folders, I accessed it like this. This works fine when my Windows PC stores the network login passwords. I need to mention it in the step itself, for that I which…
0
votes
1 answer

How do you solve your "pre-ETL" source-to-target mapping problems?

Using spreadsheets is decidedly non-authoritative: source mappings change as you design and test your ETL jobs. A spreadsheet that once functioned as the single or authoritative catalog of all source mappings might not get updated -- or (just as…
0
votes
1 answer

Excel output in pentaho showing last month

I'm working with PDI 4.1. I've created transformations and jobs, and I have an Excel file with data from a database. The columns in my Excel file are name, date, and hour, and I need to bring in the data from last month. Can I do something like…
suely
  • 334
  • 1
  • 8
  • 19
0
votes
2 answers

Is there a way to read a Hibernate Session as RDF triples?

I need to query my local Hibernate-managed datastore for persisted objects based on criteria where the relevant data for the WHERE clause is in the Linked Open Data cloud. Is there a way to read a Hibernate Session as RDF? If so, I can at least use…
Simon Gibbs
  • 4,737
  • 6
  • 50
  • 80