Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool that lets users integrate various data sources and targets in an enterprise environment.

Data sources and targets can be database tables, flat files, data sets, CSV files, and so on. The basic design paradigm is a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.


InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support high-speed, high reliability requirements of transactional processing and the large volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
0
votes
1 answer

How to do duplicate file check in DataStage?

For instance, File A is loaded, then the next day File B is loaded, and the day after that File A is received again. This time the sequence should abort. Can anyone help me out with this? Thanks
Krishna
  • 25
  • 4
0
votes
1 answer

How to check for duplicate records between the source file and target table in DataStage

I want to do two types of duplicate checking. The first is whether we have already loaded a file with that name. For instance, file A is loaded into the target table, and if we receive file A again in a subsequent run, the sequence should be aborted…
Krishna
  • 25
  • 4
0
votes
1 answer

Schema reconciliation

I have written a CASE statement as: Case when currency='Abc' then outstanding_bal else null end as outstanding_balances_RO, and my job is failing with this error message. The source is an ODBC connector connected to Microsoft SQL Server. Schema…
Roopa
  • 1
  • 2
0
votes
2 answers

What is the precise definition of "Stage / Staging" in Computer Science / Engineering?

I am starting in Data Science and I come from math/stats/economics. I am very used to precise definitions even if it means going a bit deeper into the theory to explain something as simple as a function. I tried to look for precise definitions of…
0
votes
1 answer

jruleImportException : the selected archive does not have descriptor.xml

I have created a JAR file from a decision service in IBM ODM. I was using it in a DataStage application to call the rule app from DataStage. While doing that I am getting an error. JruleImportException: the selected archive "filePath" is not valid ruleset…
0
votes
1 answer

Reset a Stage Variable from another Stage Variable

I have a requirement to concatenate multiple lines of data into a single line. The only indication that multiple lines belong together on a single line is "^" at the end of the last line. I have tried your recommendation and the stage variable to…
tbtcust
  • 65
  • 6
0
votes
1 answer

Concatenate multiple lines of data into a single line

I have a requirement to concatenate multiple lines of data into a single line. The only indication that multiple lines belong together on a single line is "^" at the end of the last line. Please see example below. I have tried solutions in a…
tbtcust
  • 65
  • 6
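The grouping logic described in this question can be sketched outside DataStage in Python. This is only an illustration of the logic, not a Transformer or stage-variable implementation. The function name `join_continued_lines`, the space separator, and the choice to drop the trailing "^" are all assumptions, since the question's own example is not shown.

```python
from typing import Iterable, List


def join_continued_lines(lines: Iterable[str], terminator: str = "^", sep: str = " ") -> List[str]:
    """Collect physical lines into logical records.

    Assumption: a logical record ends at the first line whose last
    character is the terminator; the terminator itself is dropped.
    """
    records: List[str] = []
    buffer: List[str] = []
    for line in lines:
        text = line.rstrip("\r\n")
        if text.endswith(terminator):
            # Last physical line of the record: strip the marker and flush.
            buffer.append(text[: -len(terminator)])
            records.append(sep.join(buffer))
            buffer = []
        else:
            buffer.append(text)
    if buffer:
        # Trailing lines with no terminator are kept as one final record.
        records.append(sep.join(buffer))
    return records


print(join_continued_lines(["first part", "second part^", "single line^"]))
```

In a Transformer the same idea would need a stage variable that accumulates text and is reset after a line ending in "^" has been emitted.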
0
votes
1 answer

Datastage:Split Parameter value

I have a parallel job with parameter value "202203011537". I want to split this parameter value into "20220301" and "1537" and use the parts in a SQL stage. Is there a way we can do this in a DataStage parallel job?
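The split asked about here is a fixed-position substring operation: the first 8 characters are the date and the remaining 4 are the time. A minimal Python sketch of that logic follows. The function name `split_run_parameter` and the validation rule are assumptions for illustration, not DataStage functions.

```python
from typing import Tuple


def split_run_parameter(value: str) -> Tuple[str, str]:
    """Split a 12-digit YYYYMMDDHHMM string into (date, time) parts."""
    if len(value) != 12 or not value.isdigit():
        raise ValueError(f"expected 12 digits (YYYYMMDDHHMM), got {value!r}")
    # Characters 1-8 are the date, characters 9-12 are the time.
    return value[:8], value[8:]


date_part, time_part = split_run_parameter("202203011537")
print(date_part, time_part)  # 20220301 1537
```

The same positions (start 1, length 8, and start 9, length 4) apply wherever the substring is taken, whether in the job itself or in the SQL.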
0
votes
1 answer

Datastage: Looping with multiple values

I have multiple Date and Store values in an Excel file, and I need to loop the DataStage parallel job based on Date and Store. The parallel job has a SQL query based on Date and Store, so I need to pass these values from a Sequence job. I developed a Sequence job…
0
votes
0 answers

Calling db2 function from db connector in IBM Datastage

I would like to create a job: SourceFile -> Transformer -> Lookup connected to DB connector (Lookup Sparse) -> Peek. SourceFile contains dates (string format). In Transformer I've added InputDate with input dates, SourceT and TargetT with some…
Mr Lewy
  • 11
  • 2
0
votes
1 answer

Parallel job is adding extra columns when outputting to a dataset

The last job before my dataset is written is a transformation. It's a lot more complex than this, but the basics are: input = A Integer, B Integer and C Integer output = A Integer, if B > 10 then C else 0 -> C Integer So, to clarify, column A is…
Richard Hansell
  • 5,315
  • 1
  • 16
  • 35
0
votes
1 answer

Every last iteration failing in start loop stage

I have created a generic sequence job. Exec command >> start loop >> job activity >> end loop. Here in the exec command stage I have written a script to get the list of files present in a directory as CSV values, and the file count will…
Salva
  • 81
  • 1
  • 9
0
votes
2 answers

Datastage - Loop through the file to read email ID and send email

I have to read an input file to get the email IDs of employees and send each employee an email. How can I do this using a DataStage job? The file looks like this: PERSON_ID|FName|LName|Email_ID
0
votes
1 answer

I want to remove milliseconds in the timestamp, and I want to convert the string to the timestamp in DataStage

I'm getting data like 2022-01-27 15:04:17.457000000, and I want to remove the fractional part (.457000000). I want the data like 2022-01-27 15:04:17 with the timestamp datatype in DataStage. The data comes from a file. Can anyone help with this issue?
Krishna
  • 25
  • 4
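The conversion asked about in this question has two steps: cut the string at the decimal point, then parse what is left as a timestamp. A minimal Python sketch of that logic, assuming the input always uses the `YYYY-MM-DD HH:MM:SS` layout shown in the question; the function name `parse_without_fraction` is hypothetical.

```python
from datetime import datetime


def parse_without_fraction(raw: str) -> datetime:
    """Drop everything after the seconds and parse the rest as a timestamp."""
    # Keep only the text before the first "." (the fractional seconds).
    seconds_part = raw.strip().split(".", 1)[0]
    return datetime.strptime(seconds_part, "%Y-%m-%d %H:%M:%S")


ts = parse_without_fraction("2022-01-27 15:04:17.457000000")
print(ts)  # 2022-01-27 15:04:17
```

Truncating the string before parsing avoids any rounding of the fractional seconds.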
0
votes
1 answer

Datastage constraint to filter decimal values

In DataStage, I have a requirement: from a list of values (varchar datatype; for example 10.25, 8.10, 8.40, etc.), I need to evaluate whether the number is > 0 but not divisible by 0.5, and it needs to be sent as a tinyint. Any suggestions? Thanks.
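The test described in this question can be written as a remainder check on an exact decimal value. The Python sketch below illustrates only the logic. Returning 1 or 0 is one reading of "sent as tinyint" and is an assumption, as is the function name `flag_not_half_step` and the choice to return 0 for unparseable input.

```python
from decimal import Decimal, InvalidOperation


def flag_not_half_step(raw: str) -> int:
    """Return 1 if the value is > 0 and not a multiple of 0.5, else 0."""
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        return 0  # assumption: non-numeric text is treated as "no match"
    if not value.is_finite():
        return 0  # NaN and Infinity cannot be compared or divided meaningfully
    # Decimal keeps 8.10 exact, so the remainder is not affected by float error.
    return int(value > 0 and value % Decimal("0.5") != 0)


for sample in ["10.25", "8.10", "8.50", "0", "-1.25"]:
    print(sample, flag_not_half_step(sample))
```

Using an exact decimal type rather than binary floating point matters here, because values such as 8.10 have no exact float representation.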