We need to capture the duplicate records from the above data using the Remove Duplicates stage along with a few other stages. The job design looks like the one below.
To test this design, I use the sample data below and read it with a Sequential File stage. The sample records in the Sequential File stage look like this:
Here I enable the 'Row Number Column' property in the Sequential File stage to generate a unique number for each input record (this number will be used in a later stage).
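The effect of the 'Row Number Column' property can be sketched in Python. The ids and names below are hypothetical sample data, not the article's actual records:

```python
# Hypothetical sample input with duplicate "id" values.
records = [
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 1, "name": "A"},   # duplicate of the first record
    {"id": 3, "name": "C"},
    {"id": 2, "name": "B"},   # duplicate of the second record
]

# Simulate 'Row Number Column': tag each record with a unique
# sequence number in a new "unique" column.
numbered = [dict(rec, unique=i + 1) for i, rec in enumerate(records)]
```

Because every input row now carries a distinct number, even records with identical data values can be told apart downstream.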
Next, add a Copy stage to send the input down two output links: one to the Remove Duplicates stage and the other to the Change Capture stage.
In the Remove Duplicates stage, we remove duplicates based on a key column; here I use "id" as the key. After this stage, only unique records remain.
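The deduplication step can be sketched as follows. The sample data is hypothetical, and the sketch assumes the stage keeps the first record per key (the 'Duplicate To Retain = First' behavior):

```python
# Numbered input records (hypothetical sample).
numbered = [
    {"id": 1, "unique": 1},
    {"id": 2, "unique": 2},
    {"id": 1, "unique": 3},  # duplicate id
    {"id": 3, "unique": 4},
    {"id": 2, "unique": 5},  # duplicate id
]

# Keep only the first record seen for each "id" key.
seen = set()
deduped = []
for rec in numbered:
    if rec["id"] not in seen:
        seen.add(rec["id"])
        deduped.append(rec)
```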
The Change Capture stage is the key part of this job design. The Remove Duplicates stage output acts as the After dataset, and the Copy stage output acts as the Before dataset.
Select the 'Unique' column (the Row Number Column generated by the Sequential File stage) as the key column and set Change Mode to 'Explicit Keys, All Values'. Since our aim is to capture only the duplicate records, set 'Drop Output for Delete' to 'False'.
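The logic of this configuration can be sketched end to end: a Before row whose 'Unique' key is missing from the After dataset comes out as a delete, and those deletes are exactly the duplicates the Remove Duplicates stage discarded. A minimal Python sketch, with hypothetical sample data:

```python
# Before dataset: all numbered input records (hypothetical sample).
before = [
    {"id": 1, "unique": 1},
    {"id": 2, "unique": 2},
    {"id": 1, "unique": 3},  # duplicate id
    {"id": 3, "unique": 4},
    {"id": 2, "unique": 5},  # duplicate id
]

# After dataset: output of the Remove Duplicates stage
# (first record kept per "id").
seen, after = set(), []
for rec in before:
    if rec["id"] not in seen:
        seen.add(rec["id"])
        after.append(rec)

# Change Capture on the "unique" key: a Before row whose key does not
# appear in After is a "delete" -- i.e. a removed duplicate.
after_keys = {rec["unique"] for rec in after}
duplicates = [rec for rec in before if rec["unique"] not in after_keys]
```

Here rows 3 and 5 (the second occurrences of ids 1 and 2) survive as the captured duplicates.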
Alternatively, we can use a Join stage instead of the Change Capture stage to capture the duplicates after the Remove Duplicates stage.
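The Join-stage variant amounts to an anti-join of the full input against the deduplicated output on the 'Unique' column: input rows with no match on that key are the duplicates. A rough equivalent, again with hypothetical data:

```python
# Full numbered input (hypothetical sample).
before = [
    {"id": 1, "unique": 1},
    {"id": 2, "unique": 2},
    {"id": 1, "unique": 3},  # duplicate id
    {"id": 3, "unique": 4},
    {"id": 2, "unique": 5},  # duplicate id
]

# Deduplicated side of the join: first record kept per "id"
# (dicts preserve insertion order, so setdefault keeps the first).
kept = {}
for rec in before:
    kept.setdefault(rec["id"], rec)

# Anti-join on "unique": unmatched input rows are the duplicates.
kept_keys = {rec["unique"] for rec in kept.values()}
duplicates = [rec for rec in before if rec["unique"] not in kept_keys]
```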
Link ordering is also important here, as it decides which link becomes the Before dataset and which the After dataset for the Change Capture stage.
Compile and run the job.
You can see the duplicate records in the output.
We hope this article helped you find the solution you are looking for.
If you liked this article, then please subscribe to our YouTube Channel for Wings Of Technology video tutorials. You can also find us on Twitter and Facebook.