Questions tagged [data-integration]

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution encompasses discovery, cleansing, monitoring, transforming and delivery of data from a variety of sources.

It is a huge topic in IT because it ultimately aims to make all systems work together seamlessly.

Example with a data warehouse

The process must take place among the organization's primary transaction systems before data arrives at the data warehouse.
It is rarely complete unless the organization has a comprehensive, centralized master data management (MDM) system.

Data integration usually takes the form of conforming dimensions and facts in the data warehouse. Conforming dimensions means establishing common dimensional attributes across separate databases. Conforming facts means agreeing on common business metrics, such as key performance indicators (KPIs), across separate databases, so that these numbers can be compared mathematically.
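As a rough illustration of conforming a dimension, the sketch below maps customer records from two hypothetical source systems onto one agreed set of attributes. All system names, column names, and code mappings here are made up for the example; they are not from any particular product.

```python
# Two hypothetical source systems describe the same customers with
# different attribute names and codes.
crm_customers = [
    {"cust_no": "C-01", "region_cd": "NA", "name": "Acme"},
]
billing_customers = [
    {"customer_id": 1, "territory": "North America", "customer_name": "Acme"},
]

# Conforming the dimension = mapping both sources onto one shared schema.
REGION_MAP = {"NA": "North America", "EU": "Europe"}

def conform_crm(row):
    return {
        "customer_key": row["cust_no"],
        "customer_name": row["name"],
        "region": REGION_MAP[row["region_cd"]],
    }

def conform_billing(row):
    # Derive the same surrogate key format the CRM system uses.
    return {
        "customer_key": f"C-{row['customer_id']:02d}",
        "customer_name": row["customer_name"],
        "region": row["territory"],
    }

conformed = (
    [conform_crm(r) for r in crm_customers]
    + [conform_billing(r) for r in billing_customers]
)
# Both sources now share identical attributes, so facts keyed on
# customer_key can be joined and compared across systems.
```

Once both sources share the conformed attributes, a KPI computed per `customer_key` in one system can be compared directly with the same KPI from the other.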

332 questions
4
votes
1 answer

Changing a field from String to Int in Pentaho Data Integration

I am taking JSON information from our online API and producing it in a table with DI. I have 4 fields url, deviceId, displacement & timestamp. These are all recorded as Strings but I want them to be Int values, bar the url. In the Generate Rows…
Dan
  • 2,020
  • 5
  • 32
  • 55
4
votes
1 answer

Pentaho Data Integration: executing PDI on the BI Server or with Carte?

I'm working on a project with Pentaho. I'm going to use the whole community edition solution. I'll have jobs and transformations that will be launched by users for some (so with PDI on their computers), and automatically for others. I'm wondering if I can…
joris
  • 435
  • 1
  • 7
  • 18
4
votes
2 answers

Kitchen getting killed

I am using Pentaho Data Integration for ETL. I am running the job on an Ubuntu server as a shell script. It runs for some time, after which it gets killed without throwing any error. Please help me figure out what the problem is and tell me if I am…
Cynosure
  • 161
  • 2
  • 3
  • 9
4
votes
3 answers

Where are the transformations saved in Pentaho Data Integration

This may be a basic question, but I would like to know where the transformations are saved in Pentaho Data Integration. Currently, I am connecting to a repository and all my jobs and transformations are saved there. I would like to be able to, say, email…
Malavika
  • 87
  • 3
  • 14
3
votes
1 answer

SSIS designer Visual Studio foreign keys integration

I need to integrate two similar databases into third DB3. DB3 is almost the same as DB1. First database DB1: Addresses table with: primary key AddressId People table with: primary key PersonId , foreign key AddressId Second database DB2: It is…
clukz
  • 35
  • 1
  • 3
3
votes
2 answers

Pentaho Kettle conversion from String to Integer/Number error

I am new to Pentaho Kettle and I am trying to build a simple data transformation (filter, data conversion, etc). But I keep getting errors when reading my CSV data file (whether using CSV File Input or Text File Input). The error is: ... couldn't…
user2552108
  • 1,107
  • 3
  • 15
  • 30
3
votes
1 answer

Talend jobs deployment

I am new to Talend Open Studio and I'd like to develop a job on a Macbook or a Windows PC and then export the job and run it on a Linux server as a scheduled job (i.e. cron). The job will involve extracting data from 2 Oracle databases on different…
Roobie
  • 1,346
  • 4
  • 14
  • 24
3
votes
1 answer

Designing a component both producer and consumer in Kafka

I am using Kafka and Zookeeper as the main components of my data pipeline, which is processing thousands of requests each second. I am using Samza as the real time data processing tool for small transformations that I need to make on the data. My…
3
votes
1 answer

Working with large CSV files in Ruby

I want to parse two CSV files of the MaxMind GeoIP2 database, do some joining based on a column, and merge the result into one output file. I used the standard Ruby CSV library, but it is very slow. I think it tries to load the whole file into memory. block_file…
Okacha Ilias
  • 53
  • 1
  • 6
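The slowdown described in the question above typically comes from materializing entire files in memory. One common streaming pattern, sketched here in Python (since the original Ruby snippet is truncated), is to index only the smaller file in a dict and then stream the larger file row by row. The column names (`id`, `city`, `country`) are illustrative, not MaxMind's actual schema.

```python
import csv
import io

# Stand-ins for the two input files and the output file.
small_csv = io.StringIO("id,country\n1,FR\n2,DE\n")
large_csv = io.StringIO("id,city\n1,Paris\n2,Berlin\n1,Lyon\n")
out = io.StringIO()

def merge(small_file, large_file, out_file, key="id"):
    # Only the smaller file is held in memory, keyed on the join column.
    lookup = {row[key]: row for row in csv.DictReader(small_file)}
    reader = csv.DictReader(large_file)
    extra_fields = [f for f in next(iter(lookup.values())) if f != key]
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + extra_fields)
    writer.writeheader()
    for row in reader:  # the large file is streamed one row at a time
        match = lookup.get(row[key], {})
        row.update({k: v for k, v in match.items() if k != key})
        writer.writerow(row)

merge(small_csv, large_csv, out)
```

With real files you would pass open file handles instead of `StringIO` objects; memory use then stays proportional to the smaller file, regardless of how large the streamed file is.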
3
votes
2 answers

How to store a variable from one line for use in later lines in Pentaho kettle?

I have to process a spreadsheet that has multiple levels of aggregation within it. Mostly, this is just fine, but in one case, I need to use information from the highest aggregation level in conjunction with information from the next aggregation…
Brian
  • 31
  • 1
3
votes
3 answers

Can not override Talend job context parameters when launching from the command-line

I am currently trying to run a Talend job from the command line. Since my production environment parameters are different from what I have on my local workstation, I have to provide context parameters when launching the job on the target…
kaffein
  • 1,766
  • 2
  • 28
  • 54
3
votes
1 answer

Programmatic data conversion strategy

I have a product that imports certain data files from clients (i.e. user directories, etc.), and will export other types of data (i.e. reports, etc.). All imports and exports are currently in CSV format (RFC 4180), and files are passed back and forth…
rcourtna
  • 4,589
  • 5
  • 26
  • 27
3
votes
0 answers

Pentaho executing SQL scripts to insert data

I am working on a report that will give a list of missing sequences using imported data: CREATE TABLE `client_trans` ( `id` INT NOT NULL AUTO_INCREMENT, `client_id` INT NULL, `sequence` INT NULL, `other_data` INT NULL, PRIMARY…
MSwart
  • 31
  • 1
  • 3
3
votes
2 answers

Casting date in Talend Data Integration

In a data flow from one table to another, I would like to cast a date. The date leaves the source table as a string in this format: "2009-01-05 00:00:00:000 + 01:00". I tried to convert this to a date using a tConvertType, but that is not allowed…
Paul Maclean
  • 631
  • 4
  • 14
  • 31
3
votes
3 answers

SQL identity column insert using pentaho data integration

I am new to the Pentaho Data Integration tool. I am trying to move data from a source table into a target table ... both in SQL Server. The tables are identical and have an identity column. Tried many options but ... it gives an error every time saying…
UnlimitedMeals
  • 131
  • 1
  • 4
  • 10