DIFFERENCE BETWEEN LINK SORT AND SORT STAGE IN DATASTAGE


In several processing stages, you can set the sort criteria directly on the input link (a link sort, also called an “in-stage” sort).  When you do that, a tiny “Sort” icon shows up on the link.


So you know that the data is being sorted between stages.  You may then wonder why a separate Sort stage is offered at all, and when you would ever need to use it.

This is one of the most asked DataStage interview questions.  

Let me answer this question in an elaborate way.


If you would like me to answer in one line, I can say “Sort stage provides more options than link sort”.

So what options does the Sort stage give us that a link (in-stage) sort does not?

For a link sort, you can’t control how much memory is allocated (I think the default is 20 MB); in the Sort stage, you can specify how much memory to use.

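Both kinds of sort work in memory and spill to scratch disk (a physical location on disk) once the data no longer fits in the memory limit. Because a “Link Sort” is stuck with the default limit, it falls back to scratch disk sooner on large volumes, whereas in the “Sort Stage” we can raise the limit (the “Restrict Memory Usage” option) so that more of the sort stays in server RAM.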
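To make the memory-versus-scratch-disk trade-off concrete, here is a minimal Python sketch of a sort with a memory cap. It is only a conceptual illustration, not DataStage code: the function name `sort_with_memory_limit` and the `max_in_memory` parameter are made up for this example, the limit is counted in records rather than megabytes, and the records are assumed to be integers.

```python
import heapq
import os
import tempfile


def sort_with_memory_limit(records, max_in_memory, scratch_dir=None):
    """Sort integers while holding at most `max_in_memory` of them in RAM.

    Returns (sorted_list, number_of_runs_written_to_scratch_disk).
    Hypothetical helper for illustration only.
    """
    run_paths = []
    buffer = []

    def spill():
        # Sort what is in memory and write it out as one run on scratch disk.
        fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".run")
        with os.fdopen(fd, "w") as f:
            for value in sorted(buffer):
                f.write(f"{value}\n")
        run_paths.append(path)
        buffer.clear()

    for record in records:
        if len(buffer) == max_in_memory:
            spill()  # the buffer is full, so make room on disk first
        buffer.append(record)

    if not run_paths:
        # Everything fit under the limit: a pure in-memory sort.
        return sorted(buffer), 0

    if buffer:
        spill()

    # Merge the sorted runs back together from disk.
    files = [open(path) for path in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        result = list(heapq.merge(*runs))
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)
    return result, len(run_paths)


data = [9, 3, 7, 1, 8, 2, 6]
small_limit = sort_with_memory_limit(data, max_in_memory=3)
large_limit = sort_with_memory_limit(data, max_in_memory=100)
```

With the small limit the same data has to go through several runs on disk; with the larger limit it never touches disk at all. That is the kind of difference you are buying when you raise the memory setting.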


The Sort stage can also tell OSH that the stream was previously sorted on one or more columns, so that it does not sort on those columns again and only sorts on the additional ones. For example, if the stream is already sorted on columns A and B (but not C), you can specify A, B and C as the sort keys and mark A and B as previously sorted; the stage then only sorts on column C within each group of A and B.
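The idea of sorting only on the new column can be sketched in a few lines of Python. This is a conceptual illustration rather than anything DataStage runs internally; `sort_within_presorted_groups` is a hypothetical helper, and the rows are assumed to arrive already ordered on the previously sorted keys.

```python
from itertools import groupby
from operator import itemgetter


def sort_within_presorted_groups(rows, presorted_keys, new_keys):
    """Finish a sort on presorted_keys + new_keys.

    Assumes `rows` is already ordered on `presorted_keys`, so only the
    rows inside each group need to be sorted on `new_keys`.
    """
    group_key = itemgetter(*presorted_keys)
    sub_key = itemgetter(*new_keys)
    result = []
    for _, group in groupby(rows, key=group_key):
        # Each group is small compared with the whole stream.
        result.extend(sorted(group, key=sub_key))
    return result


# Already sorted on A and B, but not on C.
rows = [
    {"A": 1, "B": "x", "C": 30},
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "y", "C": 20},
    {"A": 2, "B": "x", "C": 5},
    {"A": 2, "B": "x", "C": 1},
]
fully_sorted = sort_within_presorted_groups(rows, ["A", "B"], ["C"])
```

The result is the same as a full sort on A, B and C, but each individual sort only ever sees one (A, B) group at a time, which is why this is cheaper than re-sorting the whole stream.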


In other words, if your job is having performance issues and you’ve narrowed the problem down to sorting, many of those problems can be addressed with a separate Sort stage.

As a rule of thumb: if the volume of data is low, go for a link sort; if the volume of data is high, go for the Sort stage.


While we’re on the subject, you know the difference between an explicit and an implicit sort, don’t you?  The explicit sorts are the ones described above: you ask for them yourself, either on a link or in a Sort stage.

For an implicit sort: over the years, DataStage has gotten smart enough to insert a sort into the OSH when one wasn’t specified in the job.

For instance, you want to aggregate on column A, but the job doesn’t sort the data on column A before the Aggregator, so DataStage implicitly inserts a sort into your OSH.
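Why the sort is needed at all is easy to see in a small Python sketch. This is only an analogy, not how DataStage is implemented: `aggregate_sorted`, `aggregate` and the `sorted_on` parameter are hypothetical names, and the sketch assumes an aggregation that relies on rows with the same key being adjacent.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key, value):
    """Sum `value` per `key`; only correct when equal keys are adjacent."""
    return [
        (k, sum(row[value] for row in group))
        for k, group in groupby(rows, key=itemgetter(key))
    ]


def aggregate(rows, key, value, sorted_on=None):
    """Aggregate, adding a sort first when the input is not known to be sorted."""
    if sorted_on != key:
        # The "implicit" sort: nobody asked for it, but the result needs it.
        rows = sorted(rows, key=itemgetter(key))
    return aggregate_sorted(rows, key, value)


rows = [
    {"A": "x", "amt": 1},
    {"A": "y", "amt": 2},
    {"A": "x", "amt": 3},
]
without_sort = aggregate_sorted(rows, "A", "amt")
with_implicit_sort = aggregate(rows, "A", "amt")
```

Without the sort, key `x` shows up twice in the output because its rows are not next to each other; with the sort inserted, each key is summed once.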
