DataStage Scenario-Based Question:
Candidates are filling in forms at some xyz institute and submitting photocopies of their documents. Not every candidate submits the same set: one may submit 1 document, another 3, and so on.
You have the list of all the documents that a candidate could possibly submit. The requirement is to find, for each candidate, the documents that have not been submitted.
Input Data:
List of Documents (Docs)
Docs |
---|
a |
b |
c |
d |
e |
List of Candidates and their Documents (CandidateID, Document)
CandidateID | Document |
---|---|
1 | c |
1 | e |
2 | a |
2 | d |
2 | e |
Solution:
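STEP1: Read the two input files using Sequential File stages.
STEP2: Use a Column Generator stage on each input to add a new column (DUMMY) that holds the same constant value on both sides. This column will be used to join the data from the two input files.
Because the two files have no common key column, we go with this approach.
STEP3: Remove duplicates from the candidate input based on the CandidateID key column, so that each candidate appears only once.
STEP4: Join the two inputs on the DUMMY column generated by the Column Generator stages. The result contains every possible CandidateID and Document combination.
STEP5: Use a Lookup stage to look up each combination against the original candidate file, using CandidateID and Document as the keys.
The rows rejected by the Lookup stage (the combinations with no match) are our final required data.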
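STEP6: Rejected data from the Lookup stage (the documents each candidate has not submitted) is:
CandidateID | Document |
---|---|
1 | a |
1 | b |
1 | d |
2 | b |
2 | c |
Normal output from the Lookup stage (the documents already submitted) is:
CandidateID | Document |
---|---|
1 | c |
1 | e |
2 | a |
2 | d |
2 | e |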
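The final design of the job follows the steps above: two Sequential File stages, a Column Generator on each input, a Remove Duplicates stage on the candidate side, a Join on the DUMMY column, and a Lookup stage whose reject link carries the required data.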
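To check the logic outside DataStage, the steps above can be sketched in plain Python. This is only an illustration of the data flow, not DataStage code: the helper names (`add_dummy`, `remove_duplicates`, `join_on_dummy`, `lookup`) and the constant `DUMMY = 1` are assumptions made for this example, and the in-memory lists stand in for the two sequential files.

```python
# Hypothetical constant produced by the Column Generator on both inputs.
DUMMY = 1

# STEP1: the two inputs, taken from the tables in the question.
docs = ["a", "b", "c", "d", "e"]
submitted = [(1, "c"), (1, "e"), (2, "a"), (2, "d"), (2, "e")]


def add_dummy(rows):
    """STEP2: append the constant DUMMY column to every row."""
    return [(*row, DUMMY) for row in rows]


def remove_duplicates(rows, key_index=0):
    """STEP3: keep the first row seen for each key value."""
    seen = set()
    unique = []
    for row in rows:
        if row[key_index] not in seen:
            seen.add(row[key_index])
            unique.append(row)
    return unique


def join_on_dummy(candidates, documents):
    """STEP4: inner join on the last column (DUMMY).

    Every row carries the same DUMMY value, so this yields every
    (CandidateID, Document) combination.
    """
    return [
        (cand[0], doc[0])
        for cand in candidates
        for doc in documents
        if cand[-1] == doc[-1]
    ]


def lookup(stream, reference):
    """STEP5: split stream rows into matched and rejected rows."""
    ref = set(reference)
    matched = [row for row in stream if row in ref]
    rejected = [row for row in stream if row not in ref]
    return matched, rejected


doc_rows = add_dummy([(d,) for d in docs])
cand_rows = remove_duplicates(add_dummy(submitted))
all_pairs = join_on_dummy(cand_rows, doc_rows)
matched, rejected = lookup(all_pairs, submitted)

print("Not submitted:", rejected)
print("Submitted:", matched)
```

With the sample input, `all_pairs` holds 2 candidates x 5 documents = 10 rows. The 5 rows that match the candidate file go to `matched`, and the remaining 5 go to `rejected`, which plays the role of the reject link.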