1. What is a data warehouse?
A data warehouse is the electronic storage of an organization’s historical data for the purpose of data analytics. In other words, a data warehouse contains a wide variety of data that supports the decision-making process in an organization.
Datawarehouse consists of wide variety of data that has high level of business conditions at a single point in time.
2. What is a Dimension Table?
Dimension table is a table which has attributes of measurements stored in fact tables. This table consists of hierarchies, categories and logic that can be used to traverse in nodes.
3. What is a Fact Table?
Fact table contains the measurement of business processes, and it contains foreign keys for the dimension tables.
Example – If the business process is selling of products
Average number of products sold by one agent – measure of the business process.
Also read: Informatica interview questions and answers
4. What is a junk dimension?
In few scenarios where certain data may not be appropriate to store in the fact table, this data (or attributes) can be stored in a junk dimension. The nature of data of junk dimension is usually Boolean or flag values. This will redude the dimensions of the facts drastically.
For example, we can store the Genders in separate table and use the key of Gender table in Fact table.
5. What is a conformed dimension?
conformed dimension is a dimension that has the same meaning to every fact with which it relates. Conformed dimensions allow facts and measures to be categorized and described in the same way across multiple facts and/or data marts, ensuring consistent reporting across the enterprise.
A conformed dimension can exist as a single dimension table that relates to multiple fact tables within the same data warehouse.
For example, Date is a common conformed dimension because its attributes (day, week, month, quarter, year, etc.) have the same meaning when joined to any fact table.
6. What is Star Schema?
In Star schema, there is Fact table as a center and all dimension tables surrounded with that fact table. It is called as Star schema because diagram resembles a star with points radiating from center.
Star schema is designed in such way that it will optimize the querying on large data sets. In Star schema multiple dimension tables joined with only one fact table in denormalized form.
7.What is snowflakes schema?
Snowflake schema is a form of dimensional modeling where dimensions are stored with multiple dimension tables. Snowflake schema is variation over star schema. The schema is diagrammed as each fact is surrounded with dimensions and some dimensions are further related to other dimensions which are branched in snowflake pattern.
In snowflake schema multiple dimension tables are organized and joined with fact table.Only difference between star and snowflake schema is dimensions are normalized in snowflake schema. Normalization splits up data in to additional tables.
8. What do you mean by Granularity?
Granularity in table represents the level of information stored in the table. In data warehouse systems, granularity is very important concept to check the table data.
The granularity can be high and low . Low granularity means data has low level information only. In Fact tables, we always have low granularity means we need very low level data in fact table.
For example, Date Dimension Granularity level is ‘Year,month,quarter,period,week,day’.
Also read: DataStage interview questions and answers
9. What is a Slowly Changing Dimensions (SCD) and explain the types of SCD?
Dimensions that change slowly over time, rather than changing on regular schedule are called Slowly Changing Dimensions.
In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension’s attribute value for given date.
Example of such dimensions could be: customer, geography, employee.
There are many approaches how to deal with SCD. The most popular are:
Type 1 – Overwriting the old value
Type 2 – Creating a new additional record
Type 3 – Adding a new column
Type 1 –
Overwriting the old value. In this method no history of dimension changes is kept in the database. The old dimension value is simply overwritten by the new one. This is easy to maintain and easy to implement.
For example,
Before the change:
EmpID,Ename,City
112,Naveen,Hyderabad
After the change:
EmpID,Ename,City
112,Naveen,Mumbai
In this example, we simply updated the City of employee.
Type 2 –
Adding another extra record. In this technique all history of measurement changes is kept in the database. You ccapture change by including another column with another surrogate key to the dimension table.
Both the prevouos and new rows contain as attributes the natural key. Also ‘start date’,’end date’ and ‘current indicator'(we can use version also) columns are used in this method.
There could be only one record with current indicator set to ‘Y’. For ‘start date’ columns, recieved time of new record i.e. current date , the end_date for current record usually is set to value 12-31-9999 or NULL.
Introducing changes to the dimensional model in type 2 could be very expensive database operation so it is not recommended to use it in dimensions where a new attribute could be added in the future.
For example,
Before the change:
Surrogate_Key,EmpID,Ename,City,start_date,end_date,current_indicator
123,112,Naveen,Hyderabad,08-21-2018,12-31-9999,Y
After the change:
Surrogate_Key,EmpID,Ename,City,start_date,end_date,current_indicator
123,112,Naveen,Hyderabad,08-21-2018,10-15-2019,N
124,112,Naveen,Mumbai,10-15-2019,12-31-9999,Y
Type 3 –
Adding a new column to store recent history attribue value. In this type usually only the current and previous value of dimension is kept in the database. The new value is loaded into ‘current’ column and the old one into ‘previous’ column.
Before the change:
EmpID,Ename,City,Previous_City
112,Naveen,Hyderabad,NULL
After the change:
EmpID,Ename,City,Previous_City
112,Naveen,Hyderabad,Mumbai