DATASTAGE TRANSFORMER USAGE GUIDELINES

Choosing Proper Stages

The parallel Transformer stage always generates “C” language code which will compile to a parallel component.

Because of this, it is mandatory to minimize the number of transformers in a single job and to use other stages (Filter, Copy, Modify, Switch.., etc.) when derivations are not needed.

  • The Copy stage should be used instead of a Transformer for simple operations including:
  • Job Design placeholder between stages (unless the Force option =true, DataStage engine will optimize this out at runtime)
  • Renaming Columns
  • Dropping Columns
  • Default Type Conversions

Note that rename, drop (if runtime column propagation (RCP) is disabled), and default type conversion can also be performed by the output mapping tab of any stage.

Consider, if possible, implementing complex derivation expressions using regular patterns by Lookup tables instead of using a Transformer with nested derivations.

For example, the derivation expression:

IF A=0,1,2,3 THEN B=”X” IF A=4,5,6,7 THEN B=”C”

Could be implemented with a lookup table containing values for column A and corresponding values of column B.

Optimize the overall job flow design by combining derivations from multiple Transformers into a single Transformer stage whenever possible.

The Filter and/or Switch stages can be used to separate rows into multiple output links based on SQL-like link constraint expressions.

The Modify stage can be used for non-default type conversions, null handling, and character string trimming.

Buildops should be used instead of Transformers in the handful of scenarios where complex reusable logic is required, or where existing Transformer-based job flows do not meet performance requirements.

Transformer NULL Handling and Reject Link

When evaluating expressions for output derivations or link constraints, the Transformer will reject (through the reject link indicated by a dashed line) any row that has a NULL value used in the expression.

To create a Transformer reject link in DataStage Designer, right-click on an output link and choose “Convert to Reject”.

The Transformer rejects NULL derivation results because the rules for arithmetic and string handling of NULL values are by definition undefined.

For this reason, always test for null values before using a column in an expression, for example:

If IsNull(link.col) Then… Else…

Note that if an incoming column is only used in a pass-through derivation, the Transformer will allow this row to be output.

Transformer Derivation Evaluation

Output derivations are evaluated BEFORE any type conversions on the assignment. For example, the PadString function uses the length of the source type, not the target.

Therefore, it is important to make sure the type conversion is done before a row reaches the Transformer.

transformer-stage-in-datastage-convert-reject

For example, TrimLeadingTrailing(String) works only if String is a VarChar field. Thus, the incoming column must be typed VarChar before it is evaluated in the Transformer.

Conditionally Aborting Jobs

The Transformer can be used to conditionally abort a job when incoming data matches a specific rule. Create a new output link that will handle rows that match the abort rule.

Within the link constraints dialogue box, apply the abort rule to this output link and set the “Abort After Rows” count to the number of rows allowed before the job should aborted.

transformer-stage-in-datastage-abort-after-rows

Since the Transformer will abort the entire job flow immediately, it is possible that valid rows will not have been flushed from Sequential File (export) buffers, or committed to database tables. It is important to set the Sequential File buffer flush or database commit parameters.

Comments

comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: