#### MCIS 2019 Data Mining and Data Warehousing Questions

1. Many data scientists agree that “we are drowning in data and starved for information.” What makes data mining one of the probable tools to solve this problem? Illustrate with an appropriate example.
2. Consider the DataMart below, which holds very limited information on the sales of a retail chain store, and answer the following:

(a) List all the cuboids that can be derived from the above DataMart.

(b) In how many different ways can you run the roll-up operation on this DataMart? Illustrate with tables.

(c) The chain plans to optimize its overall operation by closing some stores and extending the hours of operation of the efficient ones. Provide your recommendations with justification.

(d) Suppose products (P1, P2, P3) generate profits of (10%, 20%, 30%) on their sales, respectively. Prepare the product order priority list based on the sales data.

3. Why should data transformation be carried out before actual model building and/or processing? Explain any three data transformation methods with appropriate examples and sample calculations.

4. A sample of patients tested for a particular disease-related disorder is listed below. There are three tests (T1, T2 and T3) and a final classification of either positive (having the disorder) or negative (not having the disorder). Answer the following:

(a) Calculate the entropy of the whole sample with respect to the positive disorder class.

(b) Which of T1 and T2 has the higher information gain? Show your calculation.

(c) Is it possible to calculate a split point for the T3 score variable too? If yes, show the calculation and find the best split.
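For reference, the quantities used in Question 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names (`entropy`, `information_gain`, `best_threshold`) and the toy values are hypothetical and are not taken from the patient sample. The threshold search assumes a binary split on midpoints between consecutive distinct scores, which is one common convention rather than the only one.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average entropy of the partitions created by the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_threshold(scores, labels):
    """Try midpoints between consecutive distinct scores; return (threshold, gain)."""
    distinct = sorted(set(scores))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        gain = information_gain([s <= t for s in scores], labels)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical toy data, NOT the exam's patient sample.
toy_labels = ["pos", "pos", "neg", "neg"]
toy_test = ["yes", "yes", "no", "no"]
toy_scores = [1.0, 2.0, 6.0, 7.0]
print(entropy(toy_labels))
print(information_gain(toy_test, toy_labels))
print(best_threshold(toy_scores, toy_labels))
```

The same `information_gain` function serves both the categorical tests and the numeric score, because a candidate threshold simply turns the score into a two-valued attribute.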

5. Here are the sample transaction records of a famous café in town, ABC (American Bakery & Café), for a particular morning. Assume that each transaction record corresponds to a single bill, which may cover either a group purchase or a single customer's purchase. Considering a minimum support of 40% and a minimum confidence of 50%, answer the following:

(a) Using the Apriori algorithm for frequent pattern mining, identify the frequent itemsets that satisfy the support (s) and confidence (c) thresholds. (You need to show all the steps of the calculation.)

(b) Find all strong association rules for the ABC transactions (i.e. X ∧ Y → Z).
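For reference, the level-wise Apriori search and the confidence check for Question 5 can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function names (`support`, `apriori`, `strong_rules`) and the toy bills are hypothetical and are not the café's records.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support([i], transactions) >= min_support]
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates with an infrequent (k-1)-subset, then check support.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return frequent

def strong_rules(frequent, min_confidence):
    """Rules A -> B with confidence = support(A ∪ B) / support(A) >= min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules

# Hypothetical toy bills, NOT the café's records.
toy = [frozenset(t) for t in (
    {"coffee", "bagel"}, {"coffee", "bagel", "muffin"},
    {"coffee", "muffin"}, {"tea"}, {"coffee", "bagel"},
)]
freq = apriori(toy, min_support=0.4)
for lhs, rhs, conf in strong_rules(freq, min_confidence=0.5):
    print(lhs, "->", rhs, round(conf, 2))
```

Rules of the form X ∧ Y → Z arise only from frequent itemsets with three or more items; the toy data here is too small to produce one, so it shows only the mechanics of the two thresholds.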

6. Compare the following with illustrative examples:

(a) Clustering based on partitioning vs clustering based on hierarchy.

(b) Anomaly detection using statistical methods vs distance-based methods.

7. Most learning methods have to deal with both construction and pruning phases. Are these phases complementary or competitive? Explain with reference to any two methods and appropriate examples.

8. What are the components and associated analysis techniques of the following:

(a) Time-series data

(b) Web mining on web-related data

9. Explain the complete learning principles of ANN. Is ANN limited to supervised learning only, or does it also support unsupervised learning? Justify.

10. Write short notes on the following:

(a) DMQL

(b) K-fold cross-validation

(c) Fuzzy-logic-based learning

(d) Gini index

(e) Mining on multimedia data