When should we use Sparse Lookup or Join in DataStage?

You have been working on DataStage and most of the times you have used Lookup stage for lookup data from reference link.

Suddenly you heard about Sparse lookup and now you are thinking like what is the difference between normal lookup and sparse lookup?

That’s simple. When you use a database connector stage (Only database stages) as a reference link for lookup stage, you see the Sparse lookup option in connector stage like below.

In Oracle connector stage:

sparse-lookup-oracle-database-datastage

In DB2 connector stage:

sparse-lookup-db2-database-datastage

Here the real question is, when to use Sparse lookup and Normal lookup?

Data read by any database stage can serve as the reference input to a Lookup stage. By default, this reference data from database is being loaded into memory like any other reference link that used in lookup stage.

When directly connected as the reference link to a Lookup stage, both DB2/UDB Enterprise and Oracle Enterprise stages allow the lookup type to be changed to Sparse and send individual SQL statements to the reference database for each incoming Lookup row.

Remember, sparse lookup is only available when the database stage is directly connected to the reference link, with no intermediate stages between lookup and database stage.

It is important to note that the individual SQL statements required by a Sparse Lookup are an expensive operation from a performance perspective.

In most cases, it is faster to use an InfoSphere DataStage Join stage between the input and DB2 reference data than it is to perform a Sparse Lookup.

But where we should use Sparse Lookup?

For scenarios where the number of input rows is significantly smaller (1:100 or more) than the number of reference rows in a DB2 or Oracle table, a Sparse Lookup might be appropriate.

Credits: Part of the information in this post has been taken from DataStage Parallel Job Advanced Developer’s Guide (IBM property).

Comments

comments

Leave a Reply