Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool that lets users integrate various data sources and targets in an enterprise environment.

Data sources and targets can be database tables, flat files, data sets, CSV files, and so on. The basic design paradigm is built around a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.

InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support high-speed, high reliability requirements of transactional processing and the large volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
1
vote
1 answer

ODBC Config File for Datastage Connection to SQLServer 2008

I have an ODBC config file on a Sun Solaris server, used for IBM DataStage. We need to connect to a SQL Server Express edition instance. The IP used to connect is xxx.xxx.xxx.xxx\TARGET, the port is 1433, and the database is dbname. A sample of the config file is: …
sangi
  • 511
  • 3
  • 13
  • 25
1
vote
2 answers

DataStage: how to improve the performance of loading data from Oracle to SQL Server

The platform is IBM Datastage 8.1 RHEL4 16G MEM,4CPU16CORE. When I try to create a job to load data from Oracle to SQL Server the job is running correctly, but slowly. The row count from the source table in Oracle is about 100,000,000 and the speed…
gobird
  • 81
  • 1
  • 7
1
vote
3 answers

Calling WCF service from Datastage - Output to XML file

I have developed a WCF service that returns data serializable objects as [DataContracts]. Other folks in my organization wish to call this web service using DataStage and have it output the response to an XML file. We are able to reference the…
Borophyll
  • 1,099
  • 2
  • 15
  • 24
1
vote
4 answers

strlen inconsistent with zero length string

I'm creating a DataStage parallel routine, which is a C or C++ function that is called from within IBM (formerly Ascential) DataStage. It is failing if one of the strings passed in is zero length. If I put this at the very first line of the…
PhilHibbs
  • 859
  • 1
  • 13
  • 30
1
vote
1 answer

Lead Function to Perform in DataStage

I am looking to perform the equivalent of the LEAD() and LAG() functions in DataStage.

Input
C1 C2
1 100
2 200
3 300
4 400

Output - 1
C1 C2 C3
1 100 200
2 200 300
3 300 400
4 400 NULL

Output - 2
C1 C2 C3
1 100 NULL
2 200 100
3 300 200
4 400 300

Please help me…
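For reference, the LEAD/LAG behavior requested in this question can be sketched outside DataStage. The snippet below is plain Python, not DataStage code; the helper names `lead` and `lag` are hypothetical and exist only to illustrate the semantics: LEAD takes the value from the next row, LAG takes the value from the previous row, and a missing neighbor becomes NULL (`None` here). It assumes the rows are already in the desired order.

```python
from typing import List, Optional, Sequence


def lead(values: Sequence[int], offset: int = 1) -> List[Optional[int]]:
    """Return, for each row, the value `offset` rows ahead, or None past the end."""
    n = len(values)
    return [values[i + offset] if i + offset < n else None for i in range(n)]


def lag(values: Sequence[int], offset: int = 1) -> List[Optional[int]]:
    """Return, for each row, the value `offset` rows behind, or None before the start."""
    return [values[i - offset] if i - offset >= 0 else None for i in range(len(values))]


# Sample input taken from the question: columns C1 and C2.
c1 = [1, 2, 3, 4]
c2 = [100, 200, 300, 400]

# Output - 1: C3 holds the next row's C2 (LEAD).
output_1 = list(zip(c1, c2, lead(c2)))

# Output - 2: C3 holds the previous row's C2 (LAG).
output_2 = list(zip(c1, c2, lag(c2)))

for row in output_1:
    print(row)
for row in output_2:
    print(row)
```

Within a DataStage job the same result would have to come from the job design itself; this sketch only pins down what each output column is expected to contain.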
1
vote
0 answers

DataStage integration with a source repository tool

I am looking for help integrating the DataStage ETL tool with a source repository tool in an automated way. Could you please let me know the name of a repository tool (for example GitLab, GitHub, Bitbucket, etc.) which can directly integrate with DataStage and…
sourabh5
  • 11
  • 1
1
vote
1 answer

Unsure about InfoSphere 11.7 nulls handling

At work I've come across an old process written in InfoSphere 11.7. I'm trying to migrate it to another framework, but no one seems to know how it works (including me, of course). The ETL process takes as input a fixed length .txt UTF-8 file.…
imatiasmb
  • 113
  • 7
1
vote
1 answer

I want to copy job names in DataStage Designer

I want to copy job names in DataStage Designer. It doesn't work using 'Ctrl+C' and 'Ctrl+V'. How can I copy job names? The same goes for the palette. Find in DATASTAGE Tutorial
gnc
  • 25
  • 2
1
vote
1 answer

How to aggregate strings in multiple rows in IBM InfoSphere DataStage grouped by a given ID

I am given a table like the following with the attendance of employees at a company. The data should be extracted from a sequential file which has comma-separated…
1
vote
0 answers

Datastage Link Name Placement

When designing DataStage Sequential and Parallel jobs, I will spend a fair amount of time organizing Stage Link Names so they are easy on the eyes and easily readable like this example: Organized Stage Link Names After Organizing, I will save and…
1
vote
1 answer

Is there any way to find DataStage change history?

If someone changes something like a parameter or a routine name in Designer, is there any way to find that history? Also, does DataStage support a debugging mode? (I want to find which job causes an error.) I can only see change history…
gnc
  • 25
  • 2
1
vote
1 answer

IBM Information Server engine start failed because DSWLMServer is not enabled

When I executed the command /opt/IBM/InformationServer/Server/DSEngine/bin/uv -admin -start, the DSWLMServer process did not start, in contrast to the status on another server. I then went to the path /opt/InformationServer/Server/DSWLM and read the file…
1
vote
1 answer

Using attributes as input to db connector

In IBM DataStage, we have a connection that involves a Sequence File Connector, a Filter, and a DB2 Lookup stage. Is it possible to perform a DB2 Lookup with information from the Filter stage? For example, if the Filter stage has two records of…
william007
  • 17,375
  • 25
  • 118
  • 194
1
vote
1 answer

NTEXT on SQL Server to NVARCHAR2(2000) on Oracle (ORA-12899: value too large for column)

My source is in SQL Server and the target is Oracle. Some tables have columns defined as NTEXT in SQL Server, and I created columns of NVARCHAR2(2000), which allows 4000 bytes, to store the data from the source. When I pull the data defined…
llearner
  • 37
  • 5
1
vote
1 answer

Config files in DataStage

We can have multiple config files in a project. We can even run a parallel job with different config files. But can we run a parallel job on multiple config files at a time, or can a parallel job use multiple config files at the same time?