Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool that lets users integrate various data sources and targets in an enterprise environment.

Data sources and targets can be database tables, flat files, data sets, CSV files, and so on. The basic design paradigm is a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.


InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support high-speed, high reliability requirements of transactional processing and the large volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
1
vote
2 answers

DataStage. Change some columns using 2 files

I have two source files. They both have nearly the same layout. I have to match FILE 1 column A with FILE 2 column A (pretty much a left outer join). If it matches, FILE 1 columns G, H, and I have to take the values of the same columns in FILE 2. If it…
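The matching rule described in this excerpt can be sketched outside DataStage. The following is only an illustration in Python, not a DataStage job design: the function name `merge_columns`, the dictionary-per-row representation, and the assumption that column A is unique in FILE 2 are all hypothetical.

```python
def merge_columns(file1_rows, file2_rows, key="A", columns=("G", "H", "I")):
    """Keep every FILE 1 row (left outer join behaviour); where the key
    matches a FILE 2 row, overwrite the listed columns with FILE 2's values."""
    # Index FILE 2 by key. Assumption: keys are unique in FILE 2;
    # if they are not, the last row with a given key wins.
    lookup = {row[key]: row for row in file2_rows}
    merged = []
    for row in file1_rows:
        out = dict(row)  # copy so the input rows are not mutated
        match = lookup.get(row[key])
        if match is not None:
            for col in columns:
                out[col] = match[col]
        merged.append(out)
    return merged


if __name__ == "__main__":
    file1 = [
        {"A": "k1", "G": 1, "H": 2, "I": 3},
        {"A": "k2", "G": 4, "H": 5, "I": 6},
    ]
    file2 = [{"A": "k1", "G": 10, "H": 20, "I": 30}]
    for row in merge_columns(file1, file2):
        print(row)
```

Rows without a match in FILE 2 pass through unchanged, which is what distinguishes this from an inner join.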
1
vote
4 answers

What is the equivalent of the DataStage Convert function in DB2?

The Convert function returns a copy of a variable with every occurrence of specified characters replaced with other specified characters. Every time a character to be converted appears in the variable, it is replaced by the replacement…
BHARATH RAJ
  • 69
  • 1
  • 10
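The character-for-character replacement described in the excerpt above can be illustrated with a short Python sketch. This is not the DataStage implementation or a DB2 answer; the function name `convert` and its argument order are chosen for illustration, and the handling of a replacement list that is shorter than the source list (extra source characters are deleted) is an assumption that should be checked against the product documentation.

```python
def convert(from_chars, to_chars, text):
    """Return a copy of text in which each character from from_chars is
    replaced by the character at the same position in to_chars."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # first occurrence of a source character wins
        # Assumption: a source character with no counterpart is deleted.
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return text.translate(table)


if __name__ == "__main__":
    print(convert("ab", "xy", "abcab"))
    print(convert("-", "", "2019-12-31"))
```

`str.translate` applies the whole mapping in a single pass, so a replacement character is never itself replaced again.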
1
vote
1 answer

Selecting an appropriate tool to replace the IBM DataStage ETL tool

We are looking for a replacement for our existing IBM DataStage platform. It has around 1,500+ mappings/DataStage jobs on-premise. These mappings also contain some complex transformations and mappings. It is a complete on-premise ETL architecture. If it…
Mangesh
  • 11
  • 1
1
vote
1 answer

Is it possible to capture the data in a reject file when a stage variable conversion fails?

We have a stage variable using DateFromDaysSince(Date Column) in a DataStage Transformer. Because of some invalid dates, the DataStage job fails. Our source is Oracle. When we checked the dates in the table we didn't find any issue, but while…
Arya
  • 528
  • 6
  • 26
1
vote
1 answer

How to get the month name from a date in DataStage?

I have added an example below. Input: 31-12-2019. Output: 31-dec-2019. How do I get the month name from the month number in DataStage? Is there a DataStage function for this?
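To make the expected transformation concrete, here is a minimal Python sketch of the input/output pair given in the question. It is not a DataStage expression; the function name `with_month_name` and the lowercase English abbreviations are assumptions taken from the single example (31-12-2019 to 31-dec-2019).

```python
from datetime import datetime

# Fixed lowercase English abbreviations, so the result does not depend on locale.
MONTH_ABBR = ("jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec")


def with_month_name(text):
    """Turn a DD-MM-YYYY string into DD-mon-YYYY."""
    parsed = datetime.strptime(text, "%d-%m-%Y")  # raises ValueError on bad dates
    return f"{parsed.day:02d}-{MONTH_ABBR[parsed.month - 1]}-{parsed.year:04d}"


if __name__ == "__main__":
    print(with_month_name("31-12-2019"))
```

Parsing the string first also validates the date, so an input such as 31-02-2019 raises an error instead of producing a malformed result.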
1
vote
1 answer

Differences between DataStage 11.3 and 11.7

I want to migrate DataStage projects from 11.3 to 11.7, and I was wondering what problems may arise during migration.
1
vote
2 answers

Trim leading and trailing spaces without using the Transformer stage in DataStage

I am trying to remove leading and trailing spaces in DataStage. In the Transformer stage we can use TrimLeadingTrailing(ID) to achieve this. But I am trying to use the Modify stage instead of the Transformer stage. Below is the code: id = string_trim["…
avinash
  • 119
  • 1
  • 13
1
vote
1 answer

How to get the column names along with the rows by using pivot stage in datastage

I have a table in this format:

ID   Sev1  Sev2  Sev3
ABC  0.45  1     1
PQR  0.45  1     2
XYZ  0.45  1     1

I want to change this to the new format below by using a horizontal pivot. How can I also get the column names (severity) along with the data? Expected…
avinash
  • 119
  • 1
  • 13
1
vote
2 answers

Converting a normal date (YYYY-MM-DD) to a Julian date in DataStage

Is there a function to convert a normal date to a Julian date? I have used the JulianDayFromDate function in a Transformer, but I am not getting the expected output. Sample input: date 2013-02-02. Expected output: Julian date 113033 (In Database we…
avinash
  • 119
  • 1
  • 13
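The sample in the excerpt above (2013-02-02 giving 113033) is consistent with a six-digit CYYDDD layout: years since 1900 in three digits, followed by the day of the year in three digits. That reading is an assumption based on the single example, since the excerpt is cut off. A minimal Python sketch under that assumption, with the hypothetical function name `to_cyyddd`:

```python
from datetime import date


def to_cyyddd(iso_date):
    """Convert a YYYY-MM-DD string to a CYYDDD string.

    Assumed layout: (year - 1900) as three digits, then day of year as
    three digits. Only meaningful for years 1900 through 2899.
    """
    parsed = date.fromisoformat(iso_date)
    day_of_year = parsed.timetuple().tm_yday
    return f"{parsed.year - 1900:03d}{day_of_year:03d}"


if __name__ == "__main__":
    print(to_cyyddd("2013-02-02"))
```

This differs from an astronomical Julian day number, which counts days from a fixed epoch and would be a seven-digit value for the same date, so a function returning that kind of number would not match the expected output.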
1
vote
1 answer

How to call a shell script multiple times in parallel from another application

I am working with DataStage, which uses a Command Executable Stage to call a parameterized shell script. This question is not about DataStage; it's about how to call the shell script. Right now, the DataStage logic is to call the script three…
arcee123
  • 101
  • 9
  • 41
  • 118
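One general way to start the same parameterized script several times at once is sketched below in Python. This is only an illustration of the idea, not the asker's setup: the helper name `run_in_parallel` is hypothetical, and any script name or parameter you pass to it would be your own.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(commands):
    """Start all commands concurrently and return their exit codes in input order."""
    def run(command):
        # check=False: report the exit code instead of raising on failure.
        return subprocess.run(command, check=False).returncode

    # One worker thread per command, so none of them waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Stand-in commands; in practice each entry would be something like
    # ["sh", "your_script.sh", "param"] (hypothetical script name).
    demo = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print(run_in_parallel(demo))
```

Threads are sufficient here because each one only waits on a child process; the actual work runs in separate operating system processes.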
1
vote
1 answer

IBM Datastage reports failure code 262148

I realize this is a bad question, but I don't know where else to turn. Can someone point me to where I can find the list of failure codes for IBM? I've tried searching for it in the IBM documentation and in a general Google search, but this…
arcee123
  • 101
  • 9
  • 41
  • 118
1
vote
1 answer

How to export a DataStage job on UNIX

How can I export a DataStage job on a UNIX machine? I have tried the following tools: istool, which works only for the .isx format, and dsexport, which is for the Windows client. Is it possible to export a job to .dsx on a UNIX machine?
chintuyadavsara
  • 1,509
  • 1
  • 12
  • 23
1
vote
1 answer

Can we create and use Temporary tables in DataStage?

Is it possible to create and/or use temporary tables in the ODBC connector stage of DataStage? I'm trying to update the data using a #Temp table in the join statement immediately after populating the temp table. I had looked according to the…
Adithya Alapati
  • 191
  • 1
  • 12
1
vote
0 answers

Load compressed data from Amazon S3 to Postgres using datastage

I am trying to load data which is stored in .gz format in S3 to PostgreSQL server using Datastage. I am using the ODBC connector on the target (database) side. I am able to load uncompressed data from S3 to PostgreSQL but no luck with compressed…
devd
  • 370
  • 10
  • 28
1
vote
0 answers

Fatal Error: Added field has duplicate identifier(): APT_TRinput0Rec99 (ALR_DATIBAS3.FilterFieldError)

I have a job with 181 columns, and I'm getting this error while compiling a Transformer that sits before a Funnel. Fatal Error: Added field has duplicate identifier(): APT_TRinput0Rec99 (ALR_DATIBAS3.FilterFieldError) The transformer has 181 constraints and…