The DataStage configuration file is a master control file for parallel jobs: a plain text file that resides on the server and describes the resources and architecture of the parallel system.

The configuration file describes the hardware configuration for architectures such as SMP (a single machine with multiple CPUs, shared memory and shared disk), grid, cluster or MPP (multiple CPUs across multiple nodes, with dedicated memory per node). DataStage understands the architecture of the system through this file.

This is one of the biggest strengths of DataStage. If you modify your processing configuration, or change servers or platform, you do not need to redesign your jobs, because all jobs depend on this configuration file for execution.

DataStage jobs determine which node to run each process on, where to store temporary data, and where to store dataset data, based on the entries provided in the configuration file. A default configuration file is available once the server is installed.

Configuration files have the extension “.apt”.

The main benefit of the configuration file is that it separates the software and hardware configuration from the job design. Hardware and software resources can change without any change to the job design.

DataStage jobs can point to different configuration files through job parameters, which means that a job can use different hardware architectures without being recompiled.

The configuration file lists the processing nodes and specifies the disk space available to each of them. These are logical processing nodes, defined in the configuration file itself. So having more than one CPU does not mean that the nodes in your configuration file correspond to those CPUs.

It is possible to have more than one logical node on a single physical node. However, you should be careful when choosing the number of logical nodes on a single physical node. Increasing the number of nodes increases the degree of parallelism, but it does not necessarily mean better performance, because it also results in more processes.

If your underlying system does not have the capacity to handle this load, you will end up with a very inefficient configuration.
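To make the idea of several logical nodes on one physical machine concrete, here is a minimal Python sketch that renders a configuration file in the usual node / fastname / pools / resource layout. The helper name `render_config`, the host name `etl-host` and the two directory paths are made up for illustration; they are assumptions, not values from any real installation.

```python
def render_config(fastname: str, logical_nodes: int, disk: str, scratch: str) -> str:
    """Render a configuration file with several logical nodes on one host.

    All logical nodes share the same fastname (the physical machine) and,
    in this simplified sketch, the same dataset and scratch directories.
    """
    if logical_nodes < 1:
        raise ValueError("at least one logical node is required")
    blocks = []
    for i in range(1, logical_nodes + 1):
        blocks.append(
            f'\tnode "node{i}"\n'
            "\t{\n"
            f'\t\tfastname "{fastname}"\n'
            '\t\tpools ""\n'
            f'\t\tresource disk "{disk}" {{pools ""}}\n'
            f'\t\tresource scratchdisk "{scratch}" {{pools ""}}\n'
            "\t}\n"
        )
    return "{\n" + "".join(blocks) + "}\n"


# Hypothetical host and paths: two logical nodes on a single physical machine.
config_text = render_config("etl-host", 2, "/data/datasets", "/data/scratch")
print(config_text)
```

Changing the second argument is all it takes to raise or lower the degree of parallelism in this sketch, which is also why the node count deserves some thought: each extra logical node means extra processes on the same machine.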

$APT_CONFIG_FILE is the environment variable that DataStage uses to determine which configuration file to use (a project can have many configuration files).
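The way one variable selects among several configuration files can be sketched in Python as follows. This only illustrates the lookup idea and is not how DataStage itself is implemented; the helper names `with_config` and `resolve_config_file` and the two file paths are hypothetical.

```python
import os


def with_config(env: dict, config_path: str) -> dict:
    """Return a copy of env with APT_CONFIG_FILE pointing at config_path."""
    new_env = dict(env)
    new_env["APT_CONFIG_FILE"] = config_path
    return new_env


def resolve_config_file(env=None) -> str:
    """Return the configuration file named by APT_CONFIG_FILE, or fail loudly."""
    env = os.environ if env is None else env
    path = env.get("APT_CONFIG_FILE", "")
    if not path:
        raise RuntimeError("APT_CONFIG_FILE is not set")
    return path


# Hypothetical paths: the same job design, pointed at two different configurations.
small_env = with_config({}, "/opt/configs/one_node.apt")
large_env = with_config(small_env, "/opt/configs/four_node.apt")
print(resolve_config_file(small_env))
print(resolve_config_file(large_env))
```

Only the value of the variable differs between the two runs, which mirrors how a job can switch configurations without being recompiled.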


