- What is the difference between Join/Lookup/Merge stages? How these will react if duplicates records come in input links?
- In join stage, if one input have col1,col2,col3 and other have col4,col5,col6 then how to join this and perform left outer join ?
- When we use Lookup Stage?
- Can we use Hash Partition for reference link in Lookup stage?
- How many types of joins supported by Merge/Join/Lookup stages?
- Which Partition methods should we use in Merge/Lookup/Join, explain why?
- Describe the Partition methods in DataStage?
Answer: There are total 9 partition methods.
Auto: DataStage attempts to work out the best partitioning method depending on execution modes of current and preceding stages and how many nodes are specified in the Configuration file. This is the default partitioning method for most stages.
DB2: Replicates the DB2 partitioning method of a specific DB2 table. Requires extra properties to be set. Access these properties by clicking the properties button.
Entire: Each file written to receives the entire data set.
Hash: The records are hashed into partitions based on the value of a key column or columns selected from the Available list.
Modulus: The records are partitioned using a modulus function on the key column selected from the Available list. This is commonly used to partition on tag fields.
Random: The records are partitioned randomly, based on the output of a random number generator.
Round Robin: The records are partitioned on a round robin basis as they enter the stage.
Same: Preserves the partitioning already in place.
Range: Divides a data set into approximately equal size partitions based on one or more partitioning keys. Range partitioning is often a pre processing step to performing a total sort on a data set. Requires extra properties to be set. Access these properties by clicking the properties button.
- Which partition method is faster, Hash or Modulus?
Answer: Hash and Modulus techniques are Key based partition techniques. If all the the key columns are numeric data type then we use Modulus partition technique. If one or more key columns are text then we use Hash partition technique. Modulus is fast comparing with Hash because we are using only numeric fields for calculation purpose and Mod function calculation will be done faster than Hash function.
- Job design/requirement is forced me to use only join stage, in this case how can I capture reject records?
Answer: The Join stage does not provide reject handling for unmatched records (such as in an Inner Join scenario). If un-matched rows must be captured or logged, an OUTER join operation must be performed.
In an OUTER join scenario, all rows on an outer link (eg. Left Outer, Right Outer, or both links in the case of Full Outer) are output regardless of match on key values. During an Outer Join, when a match does not occur, the Join stage inserts NULL values into the unmatched columns. Care must be taken to change the column properties to allow NULL values before the Join.
This is most easily done by inserting a Copy stage and mapping a column from NON-NULLABLE to NULLABLE. However if the column is NON-NULLABLE, it will default value for that unmatched rows like for CHAR it will pad the spaces till that length and for VARCHAR it will assign Blank(”) and for INTEGER it will assign zero.
A Filter stage can be used to test for NULL values in unmatched columns. In some cases, it is simpler to use a Column Generator to add an ‘indicator’ column, with a constant value, to each of the outer links and test that column for the constant after you have performed the join.
This is also handy with Lookups that have multiple reference links.
- How many rejects links I can give in Merge stage?
Answer: Either none or number of reject links equal to input update links. Let say If you have 3 input update links and one master input link, you should give exact 3 reject links for each update link or none.
- How to remove duplicates in a table without using inner query?
- What is a “degenerate dimension”?
- Student table has columns like Sub1,Sub2,Sub3,Sub4 etc, how to get total marks of student based on his name?
- What is the difference between WHERE & HAVING clause?
- What is the order of execution of SQL query?
- What is Change Data Capture (CDC) stage ? how it works, explain it with simple job design
- When we go for CDC stage or why we use CDC stage?
Answer: One benefit of the CDC is we can put change capture into a shared container for use across multiple jobs. This code re-use can save a lot of time. The container has a transformer in it with two input columns: keyfields and valuefields, and two output links: inserts and updates. The keyfields column contains the key fields concatenated into a string with a delimiter such as |. The value columns contains all fields concatenated with a delimiter. The job that uses the container needs to concatenate the input columns and pass them to the container and then split the output insert and update rows. Row Merge and Row Splitter can be used to do this.
- Difference between Funnel/Filter stages?
- Input Number is :141675 , sort this to 114567 ?
- Explain about SCD (Slowly Changing Dimension) and it types like SCD1, SCD2 & SCD3?
Next Article : DataStage Interview Questions-2