Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool that lets the user integrate various data sources and targets in an enterprise environment. Data sources and targets can be database tables, flat files, data sets, CSV files, and so on. The basic design paradigm is built around a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.

InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support high-speed, high reliability requirements of transactional processing and the large volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
-1
votes
1 answer

Filtering data using constraints in DataStage Transformer

In DataStage, I have an INTEGER field from Seq File 0. In a Transformer, I want to write a constraint so that if the source data from the seq file is <> 0 or is not numeric, it is written to seq file 2, and other numeric data and 0…
Priyan
  • 35
  • 6
-1
votes
1 answer

Connection problem with XMETA database on DataStage

I have a problem connecting the DB2 Connector to the metadata repository database (XMETA, inside DataStage, not remote). I can connect to the XMETA database from a terminal as the db2inst1 user, but I cannot connect with the DB2 Connector stage with the…
-1
votes
1 answer

Kubernetes with calico

We are trying to set up the IBM DataStage container deployment, which makes use of two components: Docker and Kubernetes. IBM uses Kubernetes with Calico (a pure IP networking fabric) for networking. IBM uses Ansible and shell scripts to set up the deployment of InfoSphere…
paramagurus
  • 21
  • 1
  • 4
-1
votes
1 answer

DataStage: Split multiple sequential rows under different columns

I've got this type of data in my database. Imagine that File_Name is the column name; I need to take all the rows under File_Name and put them into different columns with different names. File_Name (column name) File1 (first row) File2…
Jackwiper
  • 3
  • 5
-1
votes
1 answer

DataStage multiple parametric (conditioned) query execution

I would like to create a job that, based on some values in Table A, executes a SELECT query on Table B where the WHERE condition must be parametric. For example: I have 10 columns in A with 100 rows filled. 9 of my columns are nullable, so I have to…
Ahmed
  • 1
-1
votes
1 answer

To achieve an output from input in datastage tool

I have an input file with the following data:
GGN,IBM
BNGLR, IBM
GGN,HCL
NOIDA,HCL
BNGLR,HCL
I want output like this, using the DataStage tool:
IBM,GGN,BNGLR
HCL,GGN,NOIDA,BNGLR
Thanks in advance.
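The transformation asked for in this question amounts to grouping the first column by the second. As an illustration of the logic only (this is a Python sketch, not a DataStage job design, and the function name `group_cities_by_company` is hypothetical), it could look like this:

```python
def group_cities_by_company(lines):
    """Group the first field of each 'city,company' line under its company."""
    groups = {}  # dicts keep insertion order in Python 3.7+
    for line in lines:
        # Strip whitespace so ' IBM' and 'IBM' are treated as the same key.
        city, company = (part.strip() for part in line.split(","))
        groups.setdefault(company, []).append(city)
    # Emit one line per company: company first, then its cities in input order.
    return [",".join([company, *cities]) for company, cities in groups.items()]


rows = ["GGN,IBM", "BNGLR, IBM", "GGN,HCL", "NOIDA,HCL", "BNGLR,HCL"]
output_lines = group_cities_by_company(rows)
for line in output_lines:
    print(line)
```

The sketch assumes every input line has exactly two comma-separated fields and that groups should appear in the order in which each company is first seen.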
-1
votes
2 answers

Datastage merge files

I need to merge 3 input files into 1 output file via DataStage. How can I achieve this? Background: the 3 input files have different fields (layouts), for example: Input file A: HDR123 Input file B: 000123 Input file C: TRL003 Expected…
jwj
  • 1
  • 1
-1
votes
1 answer

How to run .bat file with parameters in command prompt? (Importing Project Variables)

I am trying to process the file in the order below; please point me in the right direction. Switch to the D drive by entering: D: (successful). Change to directory D:\IBM\InformationServer\ASBNode\bin (successful). Execute processEnvVariables.bat for the…
-1
votes
3 answers

Is DataStage Merge stage just a left outer join with multiple other sources?

It appears that the DataStage Merge stage is just a left outer join with the Master being the "left" side and driving input. The other inputs are joined with the master when possible. Is that all there is to it? What am I missing?
lit
  • 14,456
  • 10
  • 65
  • 119
-1
votes
2 answers

How to make this simple .bat script on Windows Server 2012?

I am trying to run this simple script on Windows Server 2012; it works fine on Windows 7. The for loop doesn't work because the parameter (%%A, a line of the txt file) is not recognized in the statement. How can I get this loop to work on Windows Server 2012? …
-1
votes
1 answer

Command to validate the connection between DataStage client and server

I am stuck in the middle of a project in which we are validating the installation of various software. Are there any proper commands for validating the connectivity between the DataStage client and server? I did some Googling and research but…
Deepu
  • 29
  • 1
  • 7
-1
votes
2 answers

UNIX for DataStage

I have been working on DataStage for a while now, but never got a chance to work in a UNIX environment. I searched the internet for good learning resources on UNIX for DataStage but didn't find any. Are there any good resources…
Tar.Ds
  • 11
  • 1
-1
votes
2 answers

Found different MD5 hash value generated by Java and DataStage

I am trying to generate an MD5 checksum value using Java for the string "TREFFLAGDATAC000000EN", but for the same string IBM InfoSphere DataStage generates a different MD5 checksum value. Can anyone please direct me on how to generate the same MD5…
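MD5 is computed over bytes, not characters, so two tools can produce different digests for the same string if they encode or pad it differently. The cause in this asker's case is not stated in the excerpt; the Python sketch below only illustrates how encoding and trailing whitespace (both assumptions here) change the result. The helper name `md5_hex` is hypothetical.

```python
import hashlib


def md5_hex(text, encoding="utf-8"):
    """Return the hexadecimal MD5 digest of text encoded with the given codec."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


value = "TREFFLAGDATAC000000EN"

digest_utf8 = md5_hex(value)                # bytes as UTF-8
digest_utf16 = md5_hex(value, "utf-16-le")  # same characters, different bytes
digest_padded = md5_hex(value + " ")        # one trailing space changes the digest

print("utf-8    :", digest_utf8)
print("utf-16-le:", digest_utf16)
print("padded   :", digest_padded)
```

Comparing the byte sequences each side actually hashes is usually the first thing to check when two MD5 values disagree.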
-1
votes
1 answer

Prevent script from outputting status code

I wrote an ugly script, and besides giving me what I want in the output, it also prints the status code value. The script output is below. How do I prevent the script from showing that status code in the output? P.S. I'll put the script…
Denys
  • 4,287
  • 8
  • 50
  • 80
-1
votes
1 answer

DataStage 9.1 parametric table name based on input file

I have to develop a DataStage 9.1 ETL process in which the same logic is applied to different input files and output tables... Based on the input file read (fileA, fileB, fileC), I have to run my job on the respective table, i.e. tableA, tableB,…
Nko
  • 341
  • 1
  • 7
  • 18