Capture duplicate records in DataStage – Job Design – 1

Input file:
Sno,name,salary,rank
1,Naveen,10000,2
2,Joe,20000,2
3,David,30000,1
4,Richard,30000,2
5,Obs,30000,1
1,Naveen,10000,2
1,Naveen,10000,2
3,David,30000,1

Output file:
1,Naveen,10000,2
1,Naveen,10000,2
3,David,30000,1

Job design would be:

The first Sequential File stage reads the input file; the duplicate records are highlighted in its output below:

In the Sort stage, we sort the records on the key column 'sno' and set the option 'Create Key Change Column' to 'True'.
With this option enabled, a column named 'KeyChange' is added to the output.
For the first occurrence of a particular key value it is assigned 1, and for each subsequent occurrence of the same value it is assigned 0.
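The Sort stage does this internally, but the logic behind the 'Create Key Change Column' option can be sketched in plain Python (illustrative only; the row tuples mirror the sample input file above):

```python
# Sample input rows: (sno, name, salary, rank)
rows = [
    ("1", "Naveen",  "10000", "2"),
    ("2", "Joe",     "20000", "2"),
    ("3", "David",   "30000", "1"),
    ("4", "Richard", "30000", "2"),
    ("5", "Obs",     "30000", "1"),
    ("1", "Naveen",  "10000", "2"),
    ("1", "Naveen",  "10000", "2"),
    ("3", "David",   "30000", "1"),
]

# Sort on the key column 'sno' (first field), then assign KeyChange:
# 1 for the first occurrence of each key value, 0 for repeats.
rows.sort(key=lambda r: r[0])

out = []
prev_key = None
for r in rows:
    key_change = 1 if r[0] != prev_key else 0
    out.append(r + (key_change,))
    prev_key = r[0]

for r in out:
    print(",".join(map(str, r)))
```

For this sample data, the KeyChange column comes out as 1,0,0,1,1,0,1,1 down the sorted rows.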

Now you can use a Filter stage with a condition such as KeyChange=0 (or KeyChange < 1) to capture the duplicate records.
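The Filter stage condition is equivalent to keeping only rows whose KeyChange value is 0. A minimal sketch, assuming the rows below already carry the KeyChange column produced by the Sort stage:

```python
# Sorted rows with the KeyChange column appended: (sno, name, salary, rank, KeyChange)
sorted_rows = [
    ("1", "Naveen",  "10000", "2", 1),
    ("1", "Naveen",  "10000", "2", 0),
    ("1", "Naveen",  "10000", "2", 0),
    ("2", "Joe",     "20000", "2", 1),
    ("3", "David",   "30000", "1", 1),
    ("3", "David",   "30000", "1", 0),
    ("4", "Richard", "30000", "2", 1),
    ("5", "Obs",     "30000", "1", 1),
]

# Filter stage condition: KeyChange = 0 -> duplicate records only.
# The KeyChange column itself is dropped from the captured output.
duplicates = [r[:4] for r in sorted_rows if r[4] == 0]

for r in duplicates:
    print(",".join(r))
```

Exactly the three duplicate records from the output file shown above pass this condition.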

After running the job, 3 records are captured by the Filter stage because their KeyChange value is '0'. The final output matches the output file shown above:
