FIND Nth HIGHEST SALARY GROUP BY DEPTNO FROM EMP TABLE IN DATASTAGE

FIND-Nth-HIGHEST-SALARY-GROUP-BYDEPTNO-FROM-EMP-TABLE-DATASTAGE
This question can be answered easily in SQL in the following ways.
Finding the nth highest salary in Oracle using rownum:
select * from (
select Emp.*,
row_number() over (partition by deptno order by Salary DESC) rownumb
from Employee Emp
)
where rownumb = n;  /*n is nth highest salary*/
Find the nth highest salary in Oracle using RANK():

Oracle also provides a RANK function that just assigns a ranking numeric value (with 1 being the highest) for some sorted values. So, we can use this SQL in Oracle to find the nth highest salary using the RANK function:

select * FROM (
select EmployeeID, Salary
,rank() over (partition by deptno order by Salary DESC) ranking
from Employee
)
WHERE ranking = N;

Anyway there is a difference by using ROW_NUMBER(),RANK() and DENSE_RANK(). You can see the difference in this post.
How can we achieve this in DataStage?
Let say we have following records in a file and I would like to calculate 3rd highest salary in each department.
Input Data:

Empid,Ename,Deptno,Salary
123,Naveen,10,8000
124,Raj,20,4000
125,Rob,30,3000
126,Jack,10,5000
127,Ronald,10,6000
128,Venky,20,5000
129,Bob,20,7000

Expected output:

Empid,Ename,Deptno,Salary
123,Naveen,10,8000
129,Bob,20,7000

Solution is simple here, we can do this by using only one transformer stage like the design below:

find-nth-highest-salary-from-emp-table

Now we are going to use below properties in transformer stage.

  • Constraints
  • Stage Variables
  • Derivations
  • On link sort

In input partitioning tab, sort the data by making ‘Deptno’ and ‘Salary’ as key columns and Run the stage in ‘Sequential’ mode.
In stage variables window define following 2 stage variables.

Stage Variable Initial Value Derivation
svSalRank 0 If link.Deptno=svDeptBuffer Then 1 Else svSalRank+1
svDeptBuffer 0 link.Deptno

Please note that stage variables order is also important here.

Assign all input columns in derivations by including one more column Rank, this must be derived from svSalRank stage variable.
After doing this, my records will become like this.

Empid,Ename,Deptno,Salary,Rank
126,Jack,10,5000,1
127,Ronald,10,6000,2
123,Naveen,10,8000,3
124,Raj,20,4000,1
128,Venky,20,5000,2
129,Bob,20,7000,3
125,Rob,30,3000,1

Our aim is to pass the only 3rd max highest salary records, so have a constraint like ‘svSalRank=3’ i.e. Only 3rd max highest salary records will be passed to output link.

Here output records are:
Empid,Ename,Deptno,Salary
123,Naveen,10,8000
129,Bob,20,7000

If you want to make this as reusable and find nth highest salary, just use the parameter in constraint like

”svSalRank=pHighSalary’

Here pHighSalary is a job parameter that can be changed at run time.

Comments

comments

Leave a Reply