Questions tagged [data-integration]

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution encompasses discovery, cleansing, monitoring, transforming and delivery of data from a variety of sources.

It is a huge topic for IT because it ultimately aims to make all systems work seamlessly together.

Example with data warehouse

The process must take place among the organization's primary transaction systems before data arrives at the data warehouse.
It is rarely complete unless the organization has a comprehensive and centralized master data management (MDM) system.

Data integration usually takes the form of conforming dimensions and facts in the data warehouse. Conforming dimensions means establishing common dimensional attributes across separate databases. Conforming facts means agreeing on common business metrics, such as key performance indicators (KPIs), across separate databases, so these numbers can be compared mathematically.
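The idea of a conformed dimension can be sketched in a few lines. This is a hypothetical example (the region names and field names are invented, not from any question below): two source systems label the same attribute differently, and a shared mapping conforms both onto one set of dimensional values so their facts become comparable.

```python
# Hypothetical conformed dimension: two systems name the same region
# differently; mapping both onto shared values lets their metrics be
# compared side by side.
CONFORMED_REGION = {
    "N.AMER": "North America",   # value used by the billing system
    "NA": "North America",       # value used by the CRM
    "EU-WEST": "Western Europe",
    "WEU": "Western Europe",
}

def conform(fact_rows):
    """Rewrite each fact's region to the conformed dimensional value."""
    return [
        {**row, "region": CONFORMED_REGION.get(row["region"], row["region"])}
        for row in fact_rows
    ]

billing = [{"region": "N.AMER", "revenue": 120}]
crm = [{"region": "NA", "leads": 7}]
```

After conforming, facts from both systems carry the attribute "North America" and can be joined on it.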

332 questions
0
votes
0 answers

How to get 0001 instead of 2001 using YYYY

I have a column with different timestamps, like: 5771.10.04 16:07:23.800913000 0967.06.17 06:20:28.800906000 3857.06.18 03:49:03.800906000 01.04.29 16:45:04.400909000 I need to convert these into decimals (which I use for a join of some million…
Cos
  • 1,649
  • 1
  • 27
  • 50
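One way to turn such timestamps into join-friendly decimals, sketched in Python rather than the asker's SQL dialect: parse the date part and handle the nanosecond fraction manually, since `%f` only accepts up to six digits. The format string and epoch choice are assumptions based on the sample values in the question.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def ts_to_decimal(ts: str) -> float:
    # The samples carry nine fractional digits (nanoseconds); %f parses at
    # most six, so split the fraction off and truncate to microseconds.
    date_part, frac = ts.rsplit(".", 1)
    dt = datetime.strptime(date_part, "%Y.%m.%d %H:%M:%S")
    # Seconds relative to the Unix epoch; years before 1970 come out negative,
    # which still sorts and joins correctly.
    return (dt - EPOCH).total_seconds() + int(frac[:6].ljust(6, "0")) / 1e6
```

Note that `%Y` here parses 1- to 4-digit years, which is why a year like `0001` round-trips instead of being coerced to `2001`.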
0
votes
1 answer

Blob fields in SAS get truncated

I have been working on a SAS job that extracts a table from SQL Server and then loads that table into an Oracle table. One of the fields in SQL Server is a blob, and they can be as big as 1G. I am getting length warnings when I run these blobs on…
PoX
  • 1,229
  • 19
  • 32
0
votes
1 answer

Pentaho: How to read SQL result row by row to execute one by one?

I built a simple transformation to select rows from a table in the database using Table Input. I know that Table Input returns all results at once, but what I need is to get the result row by row and continue the process, then return back to the…
Jason4Ever
  • 1,439
  • 4
  • 23
  • 43
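Outside Kettle, the row-at-a-time pattern the question describes is what a database cursor gives you. A minimal sketch with SQLite (table and column names invented): each call to `fetchone()` hands back one row, so per-row work finishes before the next row is read.

```python
import sqlite3

# Row-at-a-time processing via a cursor (hypothetical table "items").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])

processed = []
cur = conn.execute("SELECT id, name FROM items ORDER BY id")
while (row := cur.fetchone()) is not None:
    # ...per-row work happens here before the next row is fetched...
    processed.append(row[1].upper())
```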
0
votes
3 answers

Good conventions for embedding schema of a flat file

We receive lots of data as flat files: delimited or just fixed-length records. It's sometimes hard to find out what the files actually contain. Are there any well-established practices for embedding the schema of the file at the beginning or the…
Ville Koskinen
  • 1,266
  • 2
  • 15
  • 20
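One common convention (an assumption here, not an established standard) is to make the first line a header row that names each field, optionally with a type suffix, so the file describes its own schema. A sketch of writing and reading that form:

```python
import csv
import io

# First line carries the schema as name:type pairs; the rest is data.
raw = "id:int|name:str|amount:float\n1|widget|9.99\n"

reader = csv.reader(io.StringIO(raw), delimiter="|")
header = next(reader)
schema = [tuple(col.split(":")) for col in header]   # [(name, type), ...]
casts = {"int": int, "str": str, "float": float}
rows = [
    {name: casts[typ](value) for (name, typ), value in zip(schema, rec)}
    for rec in reader
]
```

The trade-off is that every consumer must agree to skip and interpret the header line; a sidecar schema file is the usual alternative.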
0
votes
1 answer

Copy Files Step in Pentaho

I have a job that uses the Copy Files step, which copies files from an authenticated server to another Windows server. When I run the job from my local machine it seems to run fine, but when I place the job on the server and run it, there is an error…
0
votes
3 answers

Need each daily report to show only documents scanned after the last report run

I want each daily report to show only documents that were scanned after the last report was run. I don't want the report to be a running total of all previous documents. PROC SQL; CREATE TABLE ARG_REPORT AS SELECT VID, LAST_NAME, …
user3489870
  • 95
  • 1
  • 2
  • 8
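The usual approach, independent of SAS, is a high-water-mark: persist the timestamp of the last run and select only rows newer than it. A small sketch of the pattern (field names invented, not from the asker's table):

```python
from datetime import datetime

# Hypothetical document table with a scan timestamp per row.
documents = [
    {"vid": 1, "scanned_at": datetime(2024, 1, 1, 9, 0)},
    {"vid": 2, "scanned_at": datetime(2024, 1, 2, 9, 0)},
]

def run_report(last_run):
    """Report only documents scanned after the previous run's watermark."""
    new_docs = [d for d in documents if d["scanned_at"] > last_run]
    # The new watermark is the latest timestamp just reported on.
    watermark = max((d["scanned_at"] for d in new_docs), default=last_run)
    return new_docs, watermark

first, mark = run_report(datetime(2023, 12, 31))
second, _ = run_report(mark)   # nothing new since the first run
```

In SAS the watermark would typically live in a small control dataset that the PROC SQL step reads and updates.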
0
votes
1 answer

Oracle data migration with lot of schema changes

I need to do an Oracle data migration from 11g to 12c where schema changes are abundant. I have an Excel sheet which describes all the schema changes. The Excel sheet has the columns 'old_table_name', 'old_column_name', 'old_value' and the same for the…
asthiwanka
  • 437
  • 2
  • 6
  • 17
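A mapping sheet like that can drive the migration mechanically: read each row and generate the corresponding rename DDL. A sketch under assumptions (column names taken from the question; a real sheet would be read with an Excel library, so CSV stands in here):

```python
import csv
import io

# Stand-in for the Excel mapping sheet; headers mirror the question.
mapping_csv = (
    "old_table_name,old_column_name,new_table_name,new_column_name\n"
    "EMP,EMP_NO,EMPLOYEE,EMPLOYEE_ID\n"
)

statements = []
for row in csv.DictReader(io.StringIO(mapping_csv)):
    # Generate one rename statement per mapping row; table renames,
    # data-value rewrites, etc. would be generated the same way.
    statements.append(
        f'ALTER TABLE {row["old_table_name"]} '
        f'RENAME COLUMN {row["old_column_name"]} TO {row["new_column_name"]}'
    )
```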
0
votes
2 answers

How to Design Pentaho Kettle (data integration) Job/Transformation Which is Run On Server?

I am new to Pentaho Kettle (Data Integration). The version we use here is the Community Edition, version 5.0. The case is I would like to design a job & transformations which require files (big ones) located on a remote server. This…
haper
  • 99
  • 1
  • 9
0
votes
1 answer

pentaho spoon/kettle merge row diff step

I want to update a new database's table based on an old one. This is the data in the old table: id,type 1,bla 2,bla bla The new table is empty. Currently I have the two Table Input steps connected to a Merge Rows (diff) step and then funnel that into a…
Killerpixler
  • 4,200
  • 11
  • 42
  • 82
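For intuition, what Merge Rows (diff) computes can be sketched in plain Python: compare the two row sets on a key and flag each row, which a downstream synchronize step would then act on. This is an illustration of the comparison logic, not the step's actual implementation.

```python
def merge_rows_diff(old_rows, new_rows, key="id"):
    """Flag each keyed row as identical, changed, new, or deleted."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    flags = {}
    for k in old.keys() | new.keys():
        if k not in new:
            flags[k] = "deleted"
        elif k not in old:
            flags[k] = "new"
        elif old[k] == new[k]:
            flags[k] = "identical"
        else:
            flags[k] = "changed"
    return flags
```

With the new (target) table empty, every source row would come out flagged for insertion.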
0
votes
0 answers

PDI a.k.a Kettle Client-Server Setup

I am trying to setup the Pentaho Data Integration (Kettle) into Client-server and i have been following the steps from here http://wiki.pentaho.com/display/EAI/A+guide+to+setting+up+PDI+in+a+Microsoft+client-server+style+environment The Server side…
Ritesh
  • 237
  • 1
  • 4
  • 13
0
votes
2 answers

PDI Kettle/Spoon Table to foreign key matching

I have a sources table that has ID and Source(varchar) 1 Facebook 2 Twitter 3 Google I have incoming data that has Source(varchar) and Views(Int) Facebook 10 Twitter 12 Reddit 14 I want the kettle job to do this: Check if the source exists in the…
Killerpixler
  • 4,200
  • 11
  • 42
  • 82
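The pattern being asked for is lookup-with-insert-if-missing: resolve each incoming source name to its ID, minting a new ID when the name is unknown. A sketch using the sample data from the question (the surrogate-key rule, max existing ID plus one, is an assumption):

```python
# Sources table from the question: name -> ID.
sources = {"Facebook": 1, "Twitter": 2, "Google": 3}

def resolve(incoming):
    """Map (source_name, views) rows to (source_id, views) rows,
    adding unknown sources with a fresh surrogate key."""
    out = []
    for name, views in incoming:
        if name not in sources:
            sources[name] = max(sources.values()) + 1  # next surrogate key
        out.append((sources[name], views))
    return out

rows = resolve([("Facebook", 10), ("Twitter", 12), ("Reddit", 14)])
```

In Kettle this maps onto a database-lookup step feeding an insert of unmatched names before the fact rows are written.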
0
votes
1 answer

SAS getting table primary key

I'm completely new to SAS 4GL… Is it possible to extract from a table which columns are the primary key, or parts of a compound primary key? I need their values merged into one column of an output dataset. The problem is that as an input I can get…
Charles Yaken
  • 21
  • 2
  • 5
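Outside SAS the same idea is: ask the catalog which columns form the key, then concatenate their values into one column. A sketch with SQLite's catalog (a hypothetical table `t`; in SAS the analogue would be the dictionary tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, v TEXT, PRIMARY KEY (a, b))")
conn.execute("INSERT INTO t VALUES ('x', 'y', 'z')")

# PRAGMA table_info: row[1] is the column name, row[5] is the column's
# 1-based position in the primary key (0 means not part of the key).
pk_info = sorted(
    (row[5], row[1])
    for row in conn.execute("PRAGMA table_info(t)")
    if row[5] > 0
)
pk_cols = [name for _, name in pk_info]

# Merge the key columns' values into one output column.
rows = conn.execute(f"SELECT {', '.join(pk_cols)} FROM t").fetchall()
merged = ["_".join(vals) for vals in rows]
```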
0
votes
1 answer

Replace target table from physical to a work table on Data Integration Studio

I know how to replace a work table with a physical one without losing mapping, logic, etc. The question is: is it possible to do the opposite? I want to replace a target table at the end of my job, so that it would be located in the work environment.
user2518751
  • 685
  • 1
  • 10
  • 20
0
votes
1 answer

Preprocessing and ingesting data in Hadoop

We have two types of logs: 1) SESSION LOG: SESSION_ID, USER_ID, START_DATE_TIME, END_DATE_TIME 2) EVENT LOG: SESSION_ID, DATE_TIME, X, Y, Z We only need to store the event log, but would like to replace the SESSION_ID with its corresponding USER_ID.…
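Since the session log fits a lookup table, this is the classic map-side join: build a SESSION_ID to USER_ID map from the session log, then rewrite each event record. A sketch of that preprocessing step (the literal values are invented; field order follows the question):

```python
# (SESSION_ID, USER_ID) pairs from the session log.
session_log = [("s1", "u42"), ("s2", "u7")]
# (SESSION_ID, DATE_TIME, X, Y, Z) records from the event log.
event_log = [("s1", "t1", 1, 2, 3), ("s2", "t2", 4, 5, 6)]

# Small side of the join goes into memory...
user_by_session = dict(session_log)
# ...and each event is rewritten to carry the user id instead.
enriched = [(user_by_session[sid], *rest) for sid, *rest in event_log]
```

In Hadoop terms the map would be distributed to every mapper (e.g. via the distributed cache) so the event log never needs a shuffle.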
0
votes
1 answer

Pentaho BI Server - Charting live data

I have a URL that produces JSON, { "status": "success", "totalRecords": 55, "records": [ { "timestamp": 1393418044341, "load": 40, "deviceId": 285 }, { "timestamp": 1393418104337, "load": 42, …
Dan
  • 2,020
  • 5
  • 32
  • 55
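Whatever charting layer sits on top, the first step is pulling (timestamp, load) pairs out of that payload. A sketch using a two-record stand-in shaped like the JSON quoted in the question:

```python
import json

# Stand-in for the URL's response; structure mirrors the question's JSON.
payload = json.loads("""
{"status": "success", "totalRecords": 2,
 "records": [
   {"timestamp": 1393418044341, "load": 40, "deviceId": 285},
   {"timestamp": 1393418104337, "load": 42, "deviceId": 285}]}
""")

# Millisecond epoch timestamps paired with the load metric to plot.
points = [(r["timestamp"], r["load"]) for r in payload["records"]]
```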