Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It lets users integrate various data sources and targets in an enterprise environment through a GUI-based client tool.

Data sources and targets can be database tables, flat files, data sets, CSV files, etc. The basic design paradigm is built around a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.


InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support high-speed, high reliability requirements of transactional processing and the large volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
1
vote
2 answers

Bulk edit DataStage jobs?

We are repointing a large number (>1000) of DataStage jobs from one database to another. As part of this, we will need to make the same changes to a single stage for many jobs. So far, we have been able to export jobs to XML, edit and reimport. This…
quannabe
  • 367
  • 1
  • 4
  • 11
1
vote
2 answers

How to remove decimal point in DataStage?

I would like to remove the decimal point from the column. For example, D21.3 would become D213. Someone help me, please.
Krishna
  • 25
  • 4
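
For the value in the excerpt above, the transformation itself is a plain character removal. A minimal sketch in Python, outside DataStage, assuming the column arrives as a string; the helper name `strip_decimal_point` is hypothetical and not part of any DataStage API:

```python
def strip_decimal_point(value: str) -> str:
    """Remove every '.' from the value and leave all other characters as they are."""
    return value.replace(".", "")


print(strip_decimal_point("D21.3"))  # prints D213
```

Inside a job, the same rule would have to be expressed with the string functions of the stage in use; this sketch only pins down the expected input and output.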
1
vote
0 answers

DataStage Java Code - How to test process() method without DataStage in the picture

I am new to the DataStage world and I am trying to start the process() method by myself. "Why do you want to do that?" Sadly, I do not have hands-on access to DataStage directly; I am "just" the Java developer in charge of creating the Java classes that will…
1
vote
0 answers

Special character conversion issue in DataStage

In DataStage, our source system is Oracle and our target system is Netezza. In Oracle the column datatype is varchar, whereas in Netezza it is nvarchar. Most of the characters are Latin and Dutch. We are getting character in our table row which is…
PPK
  • 55
  • 7
1
vote
0 answers

Converting string date from one timezone to UTC in DataStage

I would like to convert dates I receive from timezone 'Europe/Warsaw' to UTC. I have tried to find suitable libraries in C to create a DataStage routine. I did not find anything that would allow flexibly changing the source timezone, keeping in mind…
Mr Lewy
  • 11
  • 2
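
The conversion described in the excerpt above can be illustrated outside DataStage. This is a sketch in Python rather than the C routine the question asks about, and it makes several assumptions: the input format `%Y-%m-%d %H:%M:%S` is a guess (the excerpt does not state one), the function name `to_utc` is hypothetical, and the IANA time zone database must be available on the host.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def to_utc(text: str,
           source_tz: str = "Europe/Warsaw",
           fmt: str = "%Y-%m-%d %H:%M:%S") -> datetime:
    """Parse a naive local timestamp, attach the source zone, convert to UTC."""
    # Assumed input format; adjust fmt to match the real data.
    local = datetime.strptime(text, fmt).replace(tzinfo=ZoneInfo(source_tz))
    return local.astimezone(timezone.utc)


# Summer time in Warsaw is UTC+2, winter time is UTC+1.
print(to_utc("2021-07-01 12:00:00").isoformat())  # 2021-07-01T10:00:00+00:00
print(to_utc("2021-01-15 12:00:00").isoformat())  # 2021-01-15T11:00:00+00:00
```

Because the source zone is a parameter, the same function covers other zones by passing a different IANA name.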
1
vote
1 answer

How can I aggregate string data in DataStage?

I have the following data coming into DataStage:

customernumber  hometelephone  mobiletelephone
1234            NULL           07123456
1234            0120202        NULL

What I want out the other end…
Richard Hansell
  • 5,315
  • 1
  • 16
  • 35
1
vote
0 answers

In ETL, How to handle new data inserted in Source DB with past Timestamp?

We have a DWH that is connected to several source DBs. We recently faced an issue where one of the sources inserted a new set of records with a timestamp that is in the past (not the actual timestamp of the insertion into their DB). We use the…
Meshal
  • 13
  • 4
1
vote
2 answers

How to connect Amazon S3 to an IBM DataStage server hosted on premises

I have an IBM DataStage server installed on premises. I want to connect to an Amazon S3 bucket from DataStage to load data. How can I establish a connection to Amazon S3 from the DataStage server?
Db2Cramp
  • 33
  • 7
1
vote
1 answer

Can we generate a data lineage from our DataStage Jobs?

We're using IBM DataStage 11.7.1. The Metadata Asset Manager was not used in the project. Can we generate a data lineage out of the existing and used jobs (knowing that not 100% can be covered)? If yes: how?
Justus Kenklies
  • 440
  • 3
  • 10
1
vote
1 answer

Execute DataStage Data Flow Designer REST API

I'm trying to use the REST API function to compile a job. The following is the syntax used. The URL was executed directly in a browser, and I gave the user and password when the pop-up asked for…
Jojames
  • 11
  • 1
1
vote
1 answer

Error opening the hierarchical stage: Flash Player Error

When opening the DataStage hierarchical stage we get this error: Flash Player Error. This application requires an Adobe Flash Player ActiveX control of version 10 or later. Get Flash. Some time ago, we modified the mms.cfg and it was…
Gius
  • 21
  • 1
  • 7
1
vote
0 answers

DataStage: how to parameterize a file pattern in a sequence job using the User Variables stage and the Execute Command stage

I have to parameterize a list of files that match a specific pattern. For example, the pattern of the files is given as test_t?data????????.txt. I have to parameterize this file name in DataStage using the User Variables stage and the Execute Command stage for every time I run…
sreerag
  • 11
  • 2
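
To make the pattern in the excerpt above concrete: in shell-style wildcards, each `?` stands for exactly one character. A small sketch in Python, outside DataStage, shows which names `test_t?data????????.txt` would select. The file names and the helper `matching_files` are made up for illustration, and this says nothing about how the sequence job itself should be wired.

```python
from fnmatch import fnmatchcase

PATTERN = "test_t?data????????.txt"  # pattern quoted in the question


def matching_files(names, pattern=PATTERN):
    """Return the names that match the wildcard pattern, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical file names, invented for this example.
candidates = [
    "test_t1data20240131.txt",   # one char after 't', eight after 'data'
    "test_t12data20240131.txt",  # two chars where the pattern allows one
    "test_t1data2024.txt",       # only four chars after 'data'
]
print(matching_files(candidates))  # ['test_t1data20240131.txt']
```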
1
vote
2 answers

How to start a DataStage Sequence job when a file arrives on the server

I’m looking to build a process that triggers a DataStage Sequencer job when any file comes to the server’s landing zone. CA7 is the scheduler and the file naming convention comes in many different flavors, including the file extensions. Also, some…
tbtcust
  • 65
  • 6
1
vote
1 answer

DataStage Transformer stage how to check if explicit text is in the column's values

Currently, I am using the Count function inside a Transformer stage in a parallel job to check whether the values of one stage variable (StageVar) contain some explicit value, and then give another column (Code) some values. There are so many Code values to check in…
UglyPrince
  • 37
  • 8
1
vote
1 answer

Looking for a way to automate mapping user credentials for DataStage users

We currently follow a slightly modified series of steps--called mapping user credentials--to give a user access to DataStage. We have to follow these steps on multiple servers. There are a lot of shell scripts and binaries in the installed…
harleypig
  • 1,264
  • 8
  • 25