Questions tagged [datastage]

DataStage is the ETL (Extract, Transform, Load) component of the IBM InfoSphere Information Server suite. It is a GUI-based client tool that lets users integrate various data sources and targets in an enterprise environment.

Data sources and targets can be database tables, flat files, data sets, CSV files, and so on. The basic design paradigm is a unit of work called a DataStage job. Multiple jobs can be controlled and conditionally sequenced using 'Sequences'.

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.

InfoSphere DataStage provides these features and benefits:

  • Powerful, scalable ETL platform
  • Support for big data and Hadoop
  • Near real-time data integration
  • Workload and business rules management
  • Ease of use

Support for big data and Hadoop

  • Includes support for IBM InfoSphere BigInsights, Cloudera, Apache and Hortonworks Hadoop Distributed File System (HDFS).
  • Offers Balanced Optimization for Hadoop capabilities to push processing to the data and improve efficiency.
  • Supports big-data governance, including features such as impact analysis and data lineage.

Powerful, scalable ETL platform

  • Manages data arriving in near real-time as well as data received on a periodic or scheduled basis.

  • Provides high-performance processing of very large data volumes.

  • Leverages the parallel processing capabilities of multiprocessor hardware platforms to help you manage growing data volumes and shrinking batch windows.

  • Supports heterogeneous data sources and targets in a single job including text files, XML, ERP systems, most databases (including partitioned databases), web services, and business intelligence tools.

Near real-time data integration

  • Captures messages from Message Oriented Middleware (MOM) queues using Java Message Services (JMS) or WebSphere MQ adapters, allowing you to combine data into conforming operational and historical analysis perspectives.

  • Provides a service-oriented architecture (SOA) for publishing data integration logic as shared services that can be reused over the enterprise.

  • Can simultaneously support high-speed, high reliability requirements of transactional processing and the large volume bulk data requirements of batch processing.

Ease of use

  • Includes an operations console and interactive debugger for parallel jobs to help you enhance productivity and accelerate problem resolution.

  • Helps reduce the development and maintenance cycle for data integration projects by simplifying administration and maximizing development resources.

  • Offers operational intelligence capabilities, smart management of metadata and metadata imports, and parallel debugging capabilities to help enhance productivity when working with partitioned data.

609 questions
0
votes
1 answer

Issue with running a DB2 Script from DataStage Job

I have a DB2 script as below - BEGIN DECLARE var1 INTEGER; DECLARE var2 INTEGER; SET var1=<>; SET var2=<>; BEGIN WHILE (var1 <= var2) DO DELETE FROM (SELECT * FROM table_name WHERE ID >= var1 FETCH…
0
votes
1 answer

Identify matched rules

There are two DB2 tables: Rule Table and Input Table, which are visually represented in the attached image. The Rule Table has three rules - RL001, RL002, and RL003. A rule is considered a match if all conditions within the same RLID are mapped. Our…
william007
  • 17,375
  • 25
  • 118
  • 194
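The matching condition described in the excerpt above (a rule matches only if every condition sharing its RLID is satisfied) can be sketched in a few lines. This is only an illustration, not the asker's DB2 or DataStage solution: the table layout is not visible in the excerpt, so the `(rlid, field, expected)` row shape, the field names, and the function name `matched_rules` are all assumptions.

```python
from collections import defaultdict


def matched_rules(rule_rows, record):
    """Return the RLIDs whose conditions are ALL satisfied by the record.

    rule_rows: iterable of (rlid, field, expected_value) tuples (assumed layout).
    record:    dict mapping field name to value (assumed layout).
    """
    # Group the individual conditions by rule ID.
    conditions = defaultdict(list)
    for rlid, field, expected in rule_rows:
        conditions[rlid].append((field, expected))

    # A rule matches only when every one of its conditions holds.
    return sorted(
        rlid
        for rlid, conds in conditions.items()
        if all(record.get(field) == expected for field, expected in conds)
    )


# Hypothetical rule rows and input record, for illustration only.
rules = [
    ("RL001", "country", "SG"),
    ("RL001", "type", "A"),
    ("RL002", "country", "SG"),
    ("RL003", "type", "B"),
]
print(matched_rules(rules, {"country": "SG", "type": "A"}))
```

In SQL the same idea is usually expressed by grouping on the rule ID and comparing the count of satisfied conditions with the total count of conditions for that rule.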
0
votes
0 answers

Kafka Connector continuous pull data

How can we configure the Kafka connector in IBM DataStage to continuously pick up new records at a rate of 28 records per second and trigger the data pipeline?
william007
  • 17,375
  • 25
  • 118
  • 194
0
votes
1 answer

DataStage server - OS upgrade

I am using DS 11.7 installed on Windows Server 2012 R2 Standard. We need to upgrade the OS to Windows Server 2016 Standard Edition. Can anyone help me with the steps to follow for this migration? I have prepared a general checklist like starting…
0
votes
2 answers

APT_BadAlloc from Join Stage in DataStage

There is an ETL job dealing with over 43000000 rows, and it often fails with APT_BadAlloc when it processes a JOIN stage. Here is the log. Join_Stage,0: terminate called after throwing an instance of 'APT_BadAlloc' Issuing abort after 1 warnings…
llearner
  • 37
  • 5
0
votes
1 answer

I am trying to use a DataStage hierarchical data object to retrieve an OAuth2 token, is this possible?

I am trying to retrieve an OAuth2 token from our Azure cloud via the DS hierarchical data object. I am trying to do this without variables and just plugging in the actual URL which would include the tenant_id (as part of the URL). For the rest of…
0
votes
1 answer

PostgreSQL table data auto purging through DataStage job

We have a Postgres table that is growing very quickly (almost 5 GB/day). We want to purge records older than 2 months, and we want to implement a DataStage job to purge the table automatically. Kindly suggest the possible ways to…
0
votes
1 answer

Copy Stage on IBM DataStage

I found a strange thing while using Copy Data to insert data into a table. All columns are processed in a transformer and there are two special columns in the transformer. Column A uses index function to perform LIKE operation in a string. Column B…
llearner
  • 37
  • 5
0
votes
1 answer

Unix - linux x86_64 - command to get/concat all filenames & first line in a particular path

I need a Unix script to get all the filenames in a path/folder and concatenate each one with the first line of its file. For example, there is a location /bin/etc/target and there are 3 files in it: file1.csv header,2 file2.csv header,3 file3.csv header,4 My…
sunny
  • 3
  • 2
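The task in the excerpt above (pair each filename in a folder with the first line of that file) can be sketched as follows. The question asks for a Unix script; this Python version only illustrates the logic. The desired output format is cut off in the excerpt, so the "filename, space, first line" layout and the function name `first_lines` are assumptions.

```python
from pathlib import Path


def first_lines(folder: str) -> list[str]:
    """Return one 'filename first-line' string per regular file in folder."""
    results = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            # Read only the first line and drop the trailing newline.
            first = handle.readline().rstrip("\r\n")
        results.append(f"{path.name} {first}")
    return results


if __name__ == "__main__":
    # Hypothetical location taken from the question text.
    target = Path("/bin/etc/target")
    if target.is_dir():
        for line in first_lines(str(target)):
            print(line)
```

Reading with `readline()` avoids loading whole files into memory, which matters if the CSV files are large.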
0
votes
1 answer

Is there a way to run the DataStage jobs based on different dates?

I have a table containing the dates on which the ETL jobs should run. I know that the schedule function in DataStage Director can schedule jobs to run on a specific date or on a recurring weekly/monthly basis. However, in my case, the date will…
xxx
  • 13
  • 3
0
votes
0 answers

Using ML.NET I would like to detect anomalies in DataStage ETL job runs

I would like to detect anomalous ETL job runs: runs that process the same number of rows (either produced or consumed) but take longer to do so (based on the ElapsedRunSecs property). The input schema looks like this: public class…
Mihaimyh
  • 1,262
  • 1
  • 13
  • 35
0
votes
0 answers

Flat File Staging For Format Conversion

I have code that converts data from one format to another by inserting the data manually. I need assistance with staging the flat/txt file and applying the same code to get the same results. DBFiddle https://dbfiddle.uk/cwfvntlt -- DDL and…
0
votes
1 answer

How to convert Chinese text to English text in Snowflake

I want to convert Chinese text to English text in Snowflake. I have sets of Chinese text that I want to convert to English in Snowflake.
0
votes
0 answers

How to restore the truncated data after decimal point for existing records using SQL or Datastage?

I want to restore data after the decimal point that was truncated because the column length was insufficient. I now wish to restore the truncated data and also remove the decimal point. We have an untruncated column in a different table…
Krishna
  • 25
  • 4
0
votes
1 answer

How to get last day of previous month in DataStage?

I explored all the functions available in the transformer, but I couldn't find the exact function to get the last day of the previous month in standard format, i.e. dd/mm/yyyy. Please help me in this regard. The field that needs to appear is in the…
Enzo Niro
  • 1
  • 1
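The date arithmetic behind the last question above (last day of the previous month, formatted as dd/mm/yyyy) is simple: move to the first day of the current month, then step back one day. The sketch below shows that logic in Python. It is not a DataStage Transformer expression, and the function name `last_day_of_previous_month` is hypothetical.

```python
from datetime import date, timedelta


def last_day_of_previous_month(today: date) -> str:
    """Return the last day of the month before `today` as dd/mm/yyyy."""
    first_of_month = today.replace(day=1)                   # e.g. 2024-03-01
    last_of_previous = first_of_month - timedelta(days=1)   # e.g. 2024-02-29
    return last_of_previous.strftime("%d/%m/%Y")


print(last_day_of_previous_month(date(2024, 3, 15)))  # 29/02/2024
print(last_day_of_previous_month(date(2023, 1, 10)))  # 31/12/2022
```

Subtracting one day from the first of the month handles leap years and year boundaries without any special cases.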