Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool for integrating data sources and targets in an enterprise environment. Sources and targets can be database tables, flat files, data sets, CSV files, and so on. The basic design unit is a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.

InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused across the enterprise.

  • Can simultaneously support the high-speed, high-reliability requirements of transactional processing and the large-volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
1
vote
1 answer

DataStage meta data information

I have DataStage version 7.5.1. I want to extract metadata for all jobs executed in DataStage, such as job start time, end time, status, and the failure reason when a job fails. Any help would be highly appreciated.
1
vote
4 answers

Running SQLLDR in DataStage

I was wondering, for folks familiar with DataStage, if Oracle SQLLDR can be used on DataStage. I have some sets of control files that I would like to incorporate into DataStage. A step by step way of accomplishing this will greatly be appreciated.…
user2008558
  • 341
  • 5
  • 16
1
vote
1 answer

IBM Data Stage - How to find database tables used in jobs

For a project we need to investigate an existing installation of IBM Data Stage, doing a whole lot of ETL in loads of jobs. The job flow diagrams contain lots of tables being used as a source (both in MSSQL as well as Oracle), as well as a target…
Stefan
  • 99
  • 1
  • 5
1
vote
1 answer

DataStage 9.1 multi instance job log not organizing log by invocation id

Hey guys hope you might be able to help. I am using DataStage 9.1 and I am having an issue with the job log in Director. Let me first say the company that I work for just bought and installed InfoSphere about 6 months ago so I am fully expecting…
R.Hamilton
  • 21
  • 1
  • 6
1
vote
1 answer

Datastage Job terminating due to the following error

I'm running a data stage job, Input through DB2 and output to DB2. Input side has a query containing joins and functions. I'm getting the following warning message; TRN_HEALTH_INSURANCE_DETAIL, 2: STATEMENT INSERT INTO HEALTH_INSURANCE_DETAIL ( …
Nuh
  • 27
  • 3
  • 9
1
vote
1 answer

How to keep a log of execution in datastage?

Good afternoon. I've been searching the net for how to do this but I cannot find a solution: I want to keep execution records for each of the jobs in flat files. Any idea how to do this? I am using DataStage 8.5.
jackajack
  • 153
  • 1
  • 1
  • 11
1
vote
1 answer

How to add one day in date in datastage server

How to add one day in VDate column where column VDate value is not fixed. Example: input VDate = "2013-02-07" output VDate = "2013-02-08" Please suggest.
Pooja
  • 165
  • 4
  • 14
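
The date arithmetic asked about in the question above does not depend on DataStage itself. As a minimal sketch of the expected behaviour, written in Python rather than as a server-job derivation, and assuming VDate always arrives as a YYYY-MM-DD string (the function name `add_one_day` is hypothetical):

```python
from datetime import date, timedelta


def add_one_day(vdate: str) -> str:
    """Return the date one day after `vdate`, both in YYYY-MM-DD form."""
    # Parse to a real calendar date so month and year rollovers are handled.
    next_day = date.fromisoformat(vdate) + timedelta(days=1)
    return next_day.isoformat()


print(add_one_day("2013-02-07"))  # prints 2013-02-08
```

The point of the sketch is that the value has to be converted to a date type before adding the day; incrementing the last two characters of the string would break at month ends.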
1
vote
1 answer

DataStage DB2 Runtime Column Propagation with Unicode Data

I'm trying to read some data from DB2 9.7 using DataStage 9.1 using the Runtime Column Propagation (RCP) feature. The general approach is to set the DB2 Connector Stage to generate the query SQL by specifying only the connection details and table…
Bryan Kyle
  • 13,361
  • 4
  • 40
  • 45
1
vote
1 answer

How to select desired columns in Datastage when doing a direct extract-load?

How do I select desired columns from one flat file to another flat file using Datastage? I have a source file containing two fields called NAME and ROLL_NO. Now I need to select only the NAME field for my target flat file using Datastage without using…
Sukumar Bera
  • 59
  • 2
  • 8
1
vote
1 answer

ParamValue/Limitvalue is not appropriate error while using trim function in datastage

I am using Trim function in user variable activity in DS 7.5 Trim(Trim(ExCmd_EmailReptType.$CommandOutput,"*","L"),"#","T") But job is aborting and showing an error as "Error calling DSSetParam(prmCNIRTP), code=-4 [ParamValue/Limitvalue is not…
Pooja
  • 165
  • 4
  • 14
1
vote
2 answers

A leading question mark in oracle using datastage to import from text to oracle?

The question mark "?" appears only in front of the first field of the first row to insert. At one point I changed the FTP upload file type to text/ascii (rather than binary) and it seemed to resolve the problem. But later it came back. The server OS…
Kurt_Zhu
  • 47
  • 5
1
vote
1 answer

Where are the C++ compiler folders LIB and INCLUDE located?

I'm trying to set C++ compiler for IBM DataStage ETL tool. I installed Microsoft Visual C++ 2008 Redistributable (x64) for my Windows Server 2008 R2. The DataStage guide says that Visual Studio .NET 2008 Express Edition C++: Set the LIB…
jrara
  • 16,239
  • 33
  • 89
  • 120
1
vote
1 answer

How to capture Meta Data Info of Data sources used in DataStage

The platform is IBM datastage 8.1. We don't have access to DataStage. We can only get .dsx/xml exported files. Now we need to capture metadata information of which databases/tables(source) are transformed to another databases/tables(target). Once…
user975828
  • 147
  • 1
  • 2
  • 10
1
vote
1 answer

How to transform a string into timestamp in DataStage?

I read data from a CSV file and get a string like "2010-7-3". I can't transform this into a timestamp because it is not in the form "2010-07-03". What should I do? Is there a stage that could handle this?
wtm
  • 1,389
  • 4
  • 13
  • 18
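
The conversion described in the question above comes down to zero-padding the month and day before the value is treated as a timestamp. A minimal sketch of that normalization in Python, assuming the input is always year-month-day separated by hyphens and that midnight is an acceptable time part (the function name `to_timestamp` is hypothetical, and this is not a DataStage stage):

```python
from datetime import datetime


def to_timestamp(raw: str) -> str:
    """Turn a loosely formatted 'YYYY-M-D' string into 'YYYY-MM-DD HH:MM:SS'."""
    # Split into numeric parts so '7' and '07' are treated the same way.
    year, month, day = (int(part) for part in raw.strip().split("-"))
    # datetime() validates the calendar date and raises ValueError if it is invalid.
    return datetime(year, month, day).strftime("%Y-%m-%d %H:%M:%S")


print(to_timestamp("2010-7-3"))  # prints 2010-07-03 00:00:00
```

The same idea (split on the delimiter, pad each field, reassemble) can be expressed in whatever transformation step sits between the CSV read and the timestamp column.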
0
votes
3 answers

DATASTAGE: how to run more instance jobs in parallel using DSJOB

I have a question. I want to run several instances of the same job in parallel from within a script: I have a loop in which I invoke the jobs with dsjob, without the "-wait" and "-jobstatus" options. I want the jobs to complete before the script terminates, but I…
sangi
  • 511
  • 3
  • 13
  • 25
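
One possible way to approach the situation in the question above is to start every invocation as a separate process and then wait for all of them before the script exits. The Python sketch below shows that pattern. The helper names `run_all` and `dsjob_command` are hypothetical, and the dsjob argument order shown (`-run -jobstatus <project> <job>.<invocation id>`) is an assumption that should be checked against the installed version's documentation.

```python
import subprocess


def dsjob_command(project: str, job: str, invocation_id: str) -> list:
    """Build one dsjob call for a multi-instance job (assumed argument layout)."""
    # Assumption: with -jobstatus, dsjob blocks until the job run has finished.
    return ["dsjob", "-run", "-jobstatus", project, f"{job}.{invocation_id}"]


def run_all(commands: list) -> list:
    """Start every command at once, wait for all, return exit codes in order."""
    # Popen returns immediately, so all commands are running before any wait().
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks per process; the list is complete only when all have ended.
    return [proc.wait() for proc in procs]
```

A script would build the list with something like `[dsjob_command("MyProject", "MyJob", str(i)) for i in range(3)]` (project and job names are placeholders), pass it to `run_all`, and inspect the returned exit codes before terminating.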