Runtime Column Propagation (RCP) in DataStage

What is RCP in DataStage?

InfoSphere DataStage is flexible about meta data. It can cope with the situation where meta data is not fully defined.

You can define part of your schema and specify that, if your job encounters extra columns that are not defined in the meta data when it actually runs, it will adopt these extra columns and propagate them through the rest of the job.

This is known as Runtime Column Propagation (RCP).
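The behaviour described above can be pictured with a small toy model. This is only a sketch in Python, not DataStage code: the function `stage_output`, the column names, and the sample row are all hypothetical, and a real job applies this per link rather than per row.

```python
def stage_output(row, defined_columns, rcp_enabled):
    """Return the columns that leave a stage for one input row.

    row: dict of column name -> value arriving on the input link.
    defined_columns: columns explicitly declared in the stage meta data.
    rcp_enabled: when True, undeclared columns are carried along unchanged.
    """
    if rcp_enabled:
        # Every incoming column is propagated, declared or not.
        return dict(row)
    # Without RCP only the declared columns survive.
    return {name: value for name, value in row.items() if name in defined_columns}


# Hypothetical input row with one column (REGION) missing from the meta data.
row = {"CUSTOMER_ID": 42, "NAME": "Ada", "REGION": "EMEA"}
declared = {"CUSTOMER_ID", "NAME"}

with_rcp = stage_output(row, declared, rcp_enabled=True)
without_rcp = stage_output(row, declared, rcp_enabled=False)

print(sorted(with_rcp))     # ['CUSTOMER_ID', 'NAME', 'REGION']
print(sorted(without_rcp))  # ['CUSTOMER_ID', 'NAME']
```

With propagation on, REGION reaches the output even though the stage never declared it. With propagation off, it is dropped.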

RCP can be enabled for a project via the Administrator client, and set for individual links on the Output page Columns tab for most stages, or on the Output page General tab for Transformer stages.

You should always ensure that runtime column propagation is turned on if you want to use schema files to define column meta data.

When a DataStage job runs, the set of columns can change from one stage to the next. Without RCP, every column that must reach the target has to be defined explicitly on each link, including columns that the intermediate stages never use.

With RCP enabled, we define only the columns a stage actually works with, and the remaining columns are still carried through to the target.

RCP is most useful in reusable jobs, where the same job design has to handle inputs with different meta data.

Using RCP With Sequential Stages

Runtime column propagation (RCP) allows DataStage to be flexible about the columns you define in a job.

If RCP is enabled for a project, you can just define the columns you are interested in using in a job, but ask DataStage to propagate the other columns through the various stages.

Such columns can be extracted from the data source and end up on your data target without being explicitly operated on in between.

Sequential files, unlike most other data sources, do not have inherent column definitions, and so DataStage cannot always tell where there are extra columns that need propagating.

You can only use RCP on sequential files if you have used the Schema File property to specify a schema which describes all the columns in the sequential file.
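As an illustration of what such a schema file might contain, the Python sketch below renders one as text. The `record ( name: type; )` layout and the format properties in braces follow the schema syntax used for parallel jobs as far as I know it. The helper `build_schema`, the column names, the types, and the delimiter settings are assumptions made for this example, so check them against the product documentation before relying on them.

```python
def build_schema(columns, delim=",", quote="double"):
    """Render a record schema as text.

    columns: list of (name, type) pairs, e.g. ("NAME", "string[max=50]").
    delim, quote: format properties placed in the braces after 'record'.
    """
    lines = [
        "record",
        "{final_delim=end, delim='%s', quote=%s}" % (delim, quote),
        "(",
    ]
    for name, col_type in columns:
        # One "name: type;" entry per column in the file, in file order.
        lines.append("  %s: %s;" % (name, col_type))
    lines.append(")")
    return "\n".join(lines) + "\n"


# Hypothetical columns; the schema must describe ALL columns in the file.
schema_text = build_schema([
    ("CUSTOMER_ID", "int32"),
    ("NAME", "string[max=50]"),
    ("REGION", "string[max=10]"),
])
print(schema_text)
```

The resulting text would be saved to a file and that path given in the stage's Schema File property.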

You need to specify the same schema file for any similar stages in the job where you want to propagate columns. Stages that will require a schema file are:

  • Sequential File
  • File Set
  • External Source
  • External Target
  • Column Import
  • Column Export
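The requirement above lends itself to a quick sanity check. The Python sketch below assumes a hypothetical, hand-written description of a job as a list of dictionaries. It is not a DataStage export format, and the stage names and file path are invented. It reports file-type stages with no schema file and lists the distinct schema files in use.

```python
# Stage types from the list above that need a schema file for RCP.
SCHEMA_STAGE_TYPES = {
    "Sequential File", "File Set", "External Source",
    "External Target", "Column Import", "Column Export",
}


def check_schema_files(stages):
    """Return (stages missing a schema file, distinct schema files used).

    stages: list of dicts with 'name', 'type' and optional 'schema_file'.
    """
    relevant = [s for s in stages if s["type"] in SCHEMA_STAGE_TYPES]
    missing = sorted(s["name"] for s in relevant if not s.get("schema_file"))
    distinct = sorted({s["schema_file"] for s in relevant if s.get("schema_file")})
    return missing, distinct


# Hypothetical job: the target stage has no schema file configured.
job = [
    {"name": "src_customers", "type": "Sequential File",
     "schema_file": "/schemas/customers.schema"},
    {"name": "xfm_clean", "type": "Transformer"},
    {"name": "tgt_customers", "type": "Sequential File"},
]

missing, distinct = check_schema_files(job)
print(missing)   # ['tgt_customers']
print(distinct)  # ['/schemas/customers.schema']
```

A non-empty `missing` list, or more than one entry in `distinct` for stages that should share columns, would point to a stage where propagation is likely to fail.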
