DataStage Interview Questions-2

21. Did you implement any SCD type in your jobs? If yes, please explain it with job design.

22. How can we implement SCD Type-2 in DataStage? Explain the different ways.

23. What is a Datamart?

Answer:    A data mart is a simpler form of a data warehouse that is focused on a single subject (or functional area), such as Sales, Finance, or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. These sources could be internal operational systems, a central data warehouse, or external data.

24. How many types of data marts do we have? Explain them.

Answer:    There are two basic types of data marts: dependent and independent. The categorization is based primarily on the data source that feeds the data mart.
Dependent data marts draw data from a central data warehouse that has already been created. Independent data marts, in contrast, are standalone systems built by drawing data directly from operational sources, external sources, or both. The main difference between independent and dependent data marts is how you populate the data mart; that is, how you get data out of the sources and into the data mart. This step, called the Extraction, Transformation, and Loading (ETL) process, involves moving data from operational systems, filtering it, and loading it into the data mart. With dependent data marts, this process is somewhat simplified because formatted and summarized (clean) data has already been loaded into the central data warehouse.
The ETL process for dependent data marts is mostly a process of identifying the right subset of data relevant to the chosen data mart subject and moving a copy of it, perhaps in a summarized form. With independent data marts, however, you must deal with all aspects of the ETL process, much as you do for a central data warehouse. However, the number of sources is likely to be smaller, and the amount of data associated with the data mart is less than in the warehouse, given the focus on a single subject.
The motivations behind the creation of these two types of data marts are also typically different. Dependent data marts are usually built to achieve improved performance and availability, better control, and lower telecommunication costs resulting from local access to data relevant to a specific department. The creation of independent data marts is often driven by the need to have a solution within a shorter time frame.
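The "subset and summarize" nature of a dependent data mart's ETL can be sketched in plain Python. This is only a toy illustration with in-memory rows standing in for the warehouse and the mart; the table and column names (subject, month, amount) are invented for the example and are not DataStage specifics:

```python
# Central data warehouse: rows are already cleaned and conformed,
# so the dependent mart's ETL is mostly "filter + summarize".
warehouse = [
    {"subject": "Sales",   "month": "2024-01", "amount": 100.0},
    {"subject": "Sales",   "month": "2024-01", "amount": 250.0},
    {"subject": "Sales",   "month": "2024-02", "amount": 300.0},
    {"subject": "Finance", "month": "2024-01", "amount": 999.0},
]

def build_sales_mart(rows):
    """Extract the Sales subset and summarize amount per month."""
    totals = {}
    for row in rows:
        if row["subject"] != "Sales":   # subset: single-subject focus
            continue
        totals[row["month"]] = totals.get(row["month"], 0.0) + row["amount"]
    # The mart keeps one summarized row per month.
    return [{"month": m, "total_amount": t} for m, t in sorted(totals.items())]

sales_mart = build_sales_mart(warehouse)
```

An independent mart would additionally have to do the cleansing and conforming itself, pulling directly from operational or external sources before this filtering step.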

25. Can we join 2 data marts? If yes, how?

Answer:   We can join two data marts using conformed dimensions. A conformed dimension is a dimension that has the same meaning for every fact to which it relates. Conformed dimensions allow facts and measures to be categorized and described in the same way across multiple fact tables and/or data marts, ensuring consistent reporting across the enterprise. Date is a common conformed dimension because its attributes (day, week, month, quarter, year, etc.) have the same meaning when joined to any fact table.
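The idea can be sketched in Python with toy in-memory tables: two fact tables from different marts share the keys of one conformed date dimension, so they can be combined into a single report. All names and figures here are illustrative, not from any real schema:

```python
# Shared (conformed) date dimension: same keys and same meaning
# in every mart that uses it.
date_dim = {
    20240101: {"month": "2024-01", "quarter": "Q1"},
    20240201: {"month": "2024-02", "quarter": "Q1"},
}

# Fact tables from two different data marts, both keyed on date_key.
sales_facts   = {20240101: 350.0, 20240201: 300.0}   # sales amount
finance_facts = {20240101: 120.0, 20240201:  90.0}   # expense amount

def join_marts(dates, sales, finance):
    """Combine measures from two marts via the shared date dimension."""
    report = []
    for date_key, attrs in sorted(dates.items()):
        report.append({
            "month":   attrs["month"],
            "sales":   sales.get(date_key, 0.0),
            "expense": finance.get(date_key, 0.0),
        })
    return report

combined = join_marts(date_dim, sales_facts, finance_facts)
```

Because both marts use the same date keys with the same meaning, the join is unambiguous; without a conformed dimension, the two marts' "months" might be defined differently and could not be reliably compared.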

26. What are star schema and snowflake schema?

27. When can we go for the snowflake schema?

28. What are the differences between star schema and snowflake schema?

29. What is Reconciliation? How can we do reconciliation in DataStage? Explain one simple scenario of how to design a job for reconciliation.

30. What is APT_CONFIG?

31. What is the significance of the resource disk and scratch disk within the APT configuration file?

32. Why do we need a configuration file? What does it contain?

33. If a dataset Dataset1 is created using a config file config1.apt and used in job J1, and job J2 uses a config file config2.apt, can Dataset1 be used in job J2?

34. Can we change the configuration file at run time?

35. Do server jobs need a configuration file?

36. Why does the Join stage require hash-partitioned and sorted input?

37. Is it mandatory to use a Sort stage on every input link to the Join stage?

38. Which is the default method in the Aggregator stage (hash or sort), and which one should be used in which scenario?

39. What is a sparse lookup, and why is it good to use a sparse lookup?

40. Is it good to use hash partitioning with the Remove Duplicates stage?

You can practice the above questions, along with a few more, by watching the video below:

Previous :  DataStage Interview Questions-1        Next :  DataStage Interview Questions-3


