Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool for integrating various data sources and targets in an enterprise environment.

Data sources and targets can be database tables, flat files, data sets, CSV files, and so on. The basic design paradigm is built around a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.


InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support the high-speed, high-reliability requirements of transactional processing and the large-volume bulk-data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
1
vote
3 answers

Identification of Datastage latest modified jobs

How can I identify the jobs that were modified today in DataStage 8.1? Thanks, Raghu
user455818
  • 31
  • 1
  • 3
1
vote
2 answers

compare current value with previous value in datastage

I have input like the following: empid salary 10 1000 20 2000 30 3000 40 4000. The output I require in a sequential file is shown below; that is, prevsal should hold the salary of the previous row: empid salary prevsal 10 1000 …
Sundararaman P
  • 351
  • 1
  • 7
  • 12
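
As a language-neutral illustration of the requirement in the question above (each row carries the salary of the previous row), here is a minimal Python sketch. It is not DataStage code. The function name `with_previous_salary` is hypothetical, and using `None` as the first row's `prevsal` is an assumption, since the excerpt is cut off before showing that value.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int]                       # (empid, salary)
RowWithPrev = Tuple[int, int, Optional[int]]  # (empid, salary, prevsal)


def with_previous_salary(rows: List[Row]) -> List[RowWithPrev]:
    """Append the previous row's salary to each row, in input order."""
    result: List[RowWithPrev] = []
    previous: Optional[int] = None  # assumption: no previous value for row 1
    for empid, salary in rows:
        result.append((empid, salary, previous))
        previous = salary  # remember this salary for the next row
    return result


if __name__ == "__main__":
    sample = [(10, 1000), (20, 2000), (30, 3000), (40, 4000)]
    for row in with_previous_salary(sample):
        print(row)
```

In a DataStage Transformer the same carry-forward idea is usually expressed with stage variables that are evaluated in order for every row; treat that mapping as a pointer rather than a tested solution.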
1
vote
1 answer

What Programming Language is this? I want to pass parameters to this Datastage Job. How to do it?

I am a Java developer and I suddenly had to move to a DataStage support role for a short time. I need to run batch jobs and modify a few scripts. Can anyone tell me what language this is? FILE_TRIGGER trigger.name AGENT aname RESOURCE ADD…
JavaBits
  • 2,005
  • 11
  • 36
  • 40
1
vote
1 answer

How to call UDF in DataStage?

I want to call an Oracle user-defined function from a DataStage job. The function returns a value that will update the target table. I have tried using the Stored Procedure stage, but I was not able to map the function parameters through this…
1
vote
1 answer

Job parameter from file on server in datastage

Hello DataStage developers, I'm pretty new to the tool. I'm trying to develop a parallel job with an Oracle stage. I need the database parameters to be populated at run time. I see there are jobs designed for our project which take these parameters(DB…
Mohammed Rafi
  • 11
  • 1
  • 2
1
vote
1 answer

Can a Modify stage bulk-rename columns based on wildcards?

I need to modify columns based on business rules using RCP. For example, all source columns that end with '_ID' must be changed to '_KEY' to match the target. An example: Test_ID in the source becomes Test_KEY in the target. I have multiple tables, some…
arcee123
  • 101
  • 9
  • 41
  • 118
1
vote
2 answers

Datastage, Remove only last two characters of string

This function: Trim(In.Col, Right(In.Col, 2), 'T') works unless more than the last two characters are the same. What I want: abczzzz -> abczz. What I get: abczzzz -> abc. How do I solve this?
paulstomb
  • 21
  • 1
  • 4
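
The behaviour described in the question above can be reproduced outside DataStage. The Python sketch below is only an analogy, not DataStage code: `str.rstrip` stands in for a trim that removes every trailing occurrence of the given characters, and slicing stands in for a length-based cut. Both function names are hypothetical. The DataStage counterpart of the length-based version would presumably combine `Left` and `Len`, but that mapping is an assumption and is not taken from the question.

```python
def drop_last_two_by_trim(value: str) -> str:
    """Mimic the problematic approach: trim trailing occurrences of the
    last two characters. Repeated trailing characters are all removed."""
    return value.rstrip(value[-2:])


def drop_last_two_by_length(value: str) -> str:
    """Length-based approach: always cut exactly two characters."""
    return value[:-2]


if __name__ == "__main__":
    for text in ("abcdef", "abczzzz"):
        print(text, drop_last_two_by_trim(text), drop_last_two_by_length(text))
```

The two functions agree on `abcdef` but differ on `abczzzz`, which is the case the question describes: trimming by character keeps removing matching characters, while cutting by length stops after two.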
1
vote
1 answer

In DataStage, how do you extract an element together with a list of elements from an XML file

So I've spent hours trying to figure this out. I'm basically trying to read an XML document (using the Hierarchical Data stage). Then I need to output the contents of that document into a dataset with two columns. The difficulty is that in the xml…
LatinCanuck
  • 454
  • 2
  • 10
  • 29
1
vote
1 answer

Function statement error in DataStage server routine (BASIC): very simple but not working

I have a problem using the FUNCTION statement in a DataStage server routine. Could you please tell me what is wrong with my code? It is simple, but I don't know why it doesn't work. The code is: $INCLUDE DSINCLUDE JOBCONTROL.H FUNCTION GetTS(A,B) …
gadiz
  • 31
  • 1
  • 2
  • 8
1
vote
2 answers

Removing duplicates using PigLatin and retaining the last element

I am using PigLatin. I want to remove the duplicates from the bags and retain the last element for each key. Input: User1 7 LA User1 8 NYC User1 9 NYC User2 3 NYC User2 4 DC Output: User1 9 NYC User2 4 DC Here…
Anil Savaliya
  • 129
  • 1
  • 1
  • 6
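
To make the expected result in the question above concrete, here is a minimal Python sketch of "keep the last row per key". It is not Pig Latin. The function name `keep_last_per_key` is hypothetical, and "last" is interpreted as last in input order, which is an assumption because the excerpt is truncated before the asker explains it.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int, str]  # (user, value, city)


def keep_last_per_key(rows: List[Record]) -> List[Record]:
    """Return one record per user: the last one seen in input order."""
    last_seen: Dict[str, Record] = {}
    for record in rows:
        user = record[0]
        last_seen[user] = record  # later records overwrite earlier ones
    # dicts preserve insertion order, so users appear in first-seen order
    return list(last_seen.values())


if __name__ == "__main__":
    sample = [
        ("User1", 7, "LA"),
        ("User1", 8, "NYC"),
        ("User1", 9, "NYC"),
        ("User2", 3, "NYC"),
        ("User2", 4, "DC"),
    ]
    for record in keep_last_per_key(sample):
        print(record)
```

If "last" instead means the record with the highest value in the second column, the overwrite step would need a comparison on that column; the sample data gives the same result under both readings.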
1
vote
2 answers

How to save the if then else output in a variable unix

I have a variable (stageVar) coming from DataStage, and I need to check whether it is equal to zero; if so, use 100, otherwise use stageVar. After that I need to compute the mod and save it in a variable. I have tried the code below but am not…
Bobby
  • 320
  • 5
  • 23
1
vote
2 answers

Nested If-Then-Else usage in DataStage

I am trying to write the following nested If-Then-Else statements in the Transformer stage of DataStage, but it's giving me a compilation error. Can anybody tell me whether there is another way of doing this? If IsNotNull(DSLink16.DECISION_ID) Then ( If…
Prerak Tiwari
  • 3,436
  • 4
  • 34
  • 64
1
vote
0 answers

lookup stage in datastage 8.5 with multiple rows returned from link

I am trying to pull XML data from a Lookup stage. I have multiple key values coming from the reference table. I selected the link name from the "Multiple rows returned from link" drop-down list. I am getting the output in two different…
Janaki
  • 11
  • 1
1
vote
0 answers

InfoSphere 11.3.1 Repository

Has anyone got any experience using a database other than DB2 for an IBM InfoSphere 11.3.1 repository? In 11.3.1, IBM added support for Oracle and SQL Server databases as the repository, and I'm curious to know what pros/cons others may have run…
mnewman
  • 11
  • 3
1
vote
2 answers

Datastage. How to manage output of lookup stage?

I have two source files, File A and File B. I have to use 3 different Lookup stages for 3 different conditions, and each lookup derives a new column. I want those new derived columns in my final output along with the other columns…