Data Mining Worksheet One

The worksheet covers the following data mining and data warehousing topics:
1. Calculating the mean, median, mode, midrange, and quartiles of age data.
2. Computing Euclidean and Manhattan distances between data tuples.
3. Methods for handling missing values, which commonly occur in real-world data.
4. Data smoothing and normalization techniques (smoothing by bin means, min-max normalization, and z-score normalization) applied to age data.
5. Distinguishing outliers from noise.
6. Data warehousing concepts: operational databases versus data warehouses; star, snowflake, and fact constellation schemas; data cleaning, transformation, and refresh; and OLAP operations with an equivalent SQL query.

Uploaded by Abrham Danail

1. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
a. What is the mean of the data? What is the median?
b. What is the mode of the data? Comment on the data’s modality (i.e., bimodal,
trimodal, etc.).
c. What is the midrange of the data?
d. Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
e. Give the five-number summary of the data.
2. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
a. Compute the Euclidean distance between the two objects.
b. Compute the Manhattan distance between the two objects.
3. In real-world data, tuples with missing values for some attributes are a common occurrence.
Describe various methods for handling this problem.
4. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are 25, 13, 33, 15, 16, 19, 20, 20, 21, 22, 25, 30, 33, 25, 35, 35, 25, 36, 40, 35, 45, 35,
46, 52, 70, 22, 16.
a. Use smoothing by bin means to smooth the above data, using a bin depth of 3.
Illustrate your steps.
b. Repeat part (a) using equal-width binning of size 3.
c. Use min-max normalization to transform the value 35 for age onto the range [0.0,
1.0].
d. Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
5. Use the methods below to normalize the following group of data: 200, 300, 400, 600, 1000
a. min-max normalization by setting min = 0 and max = 1
b. z-score normalization
6. How do you differentiate outliers from noise?
7. Describe the differences between operational database systems and data warehouses.
8. Briefly compare the following concepts. You may use an example to explain your point(s).
a. Snowflake schema, fact constellation, and star schema models
b. Data cleaning, data transformation, and refresh
9. Suppose that a data warehouse consists of the three dimensions time, doctor, and patient,
and the two measures count and charge, where charge is the fee that a doctor charges a
patient for a visit.
a. Enumerate three classes of schemas that are commonly used for modeling data
warehouses.
b. Draw a schema diagram for the above data warehouse using one of the schema
classes listed in (a).
c. Starting with the base cuboid [day, doctor, patient], what specific OLAP operations
should be performed in order to list the total fee collected by each doctor in 2010?
d. To obtain the same list, write an SQL query assuming the data is stored in a
relational database with the schema fee (day, month, year, doctor, hospital, patient,
count, charge).
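For question 1, the hand calculations can be checked with a short sketch that uses only Python's standard library. The quartiles follow a simple nearest-rank convention (an assumption on our part; textbooks and libraries differ on whether and how to interpolate), which fits the question's request for rough values.

```python
import math
from statistics import mean, median, multimode

# Age values from question 1, already in increasing order.
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
        33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def rough_quartiles(sorted_values):
    """Return (Q1, Q3) using the nearest-rank convention.

    Q1 is the value at rank ceil(0.25 * n) and Q3 the value at rank
    ceil(0.75 * n). Interpolating definitions may give slightly different values.
    """
    n = len(sorted_values)
    q1 = sorted_values[math.ceil(0.25 * n) - 1]
    q3 = sorted_values[math.ceil(0.75 * n) - 1]
    return q1, q3


def summarize(values):
    """Compute the statistics asked for in question 1."""
    data = sorted(values)
    q1, q3 = rough_quartiles(data)
    med = median(data)
    return {
        "mean": mean(data),
        "median": med,
        "modes": multimode(data),  # every value tied for the highest count
        "midrange": (data[0] + data[-1]) / 2,
        "five_number": (data[0], q1, med, q3, data[-1]),
    }


if __name__ == "__main__":
    stats = summarize(AGES)
    print(f"mean = {stats['mean']:.2f}")            # mean = 29.96
    print(f"median = {stats['median']}")            # median = 25
    print(f"modes = {stats['modes']}")              # modes = [25, 35]
    print(f"midrange = {stats['midrange']}")        # midrange = 41.5
    print(f"five-number = {stats['five_number']}")  # five-number = (13, 20, 25, 35, 70)
```

Both 25 and 35 occur four times, and no other value occurs more than twice, so the data are bimodal.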
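For question 2, both distances follow directly from their definitions: Euclidean distance is the square root of the summed squared attribute differences, and Manhattan distance is the sum of the absolute differences. A minimal sketch:

```python
import math

# The two objects from question 2.
X = (22, 1, 42, 10)
Y = (20, 0, 36, 8)


def euclidean(x, y):
    """Square root of the sum of squared attribute differences."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute attribute differences (city-block distance)."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    # The differences are (2, 1, 6, 2), so the squared differences sum to 45.
    print(f"Euclidean = {euclidean(X, Y):.3f}")  # Euclidean = 6.708
    print(f"Manhattan = {manhattan(X, Y)}")      # Manhattan = 11
```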
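For question 3, a few common strategies for missing values (ignoring the tuple, filling in a global constant, using the attribute mean, or using the mean of tuples in the same class) can be sketched in a few lines. The records below, with a class label and a numeric attribute, are hypothetical and exist only to show the effect of each strategy; `None` marks a missing value.

```python
from statistics import mean

# Hypothetical toy records: (class_label, value); None marks a missing value.
ROWS = [("yes", 50.0), ("no", None), ("yes", 70.0), ("no", 30.0), ("yes", None)]


def drop_missing(rows):
    """Ignore the tuple: discard any record whose value is missing."""
    return [r for r in rows if r[1] is not None]


def fill_constant(rows, constant):
    """Replace missing values with a global constant (e.g. 0 or an 'unknown' code)."""
    return [(c, v if v is not None else constant) for c, v in rows]


def fill_mean(rows):
    """Replace missing values with the mean of the observed values."""
    m = mean(v for _, v in rows if v is not None)
    return [(c, v if v is not None else m) for c, v in rows]


def fill_class_mean(rows):
    """Replace missing values with the mean of observed values in the same class.

    Assumes every class has at least one observed value (otherwise KeyError).
    """
    by_class = {}
    for c, v in rows:
        if v is not None:
            by_class.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    return [(c, v if v is not None else class_means[c]) for c, v in rows]


if __name__ == "__main__":
    print(drop_missing(ROWS))
    print(fill_constant(ROWS, 0.0))
    print(fill_mean(ROWS))
    print(fill_class_mean(ROWS))
```

The class-mean variant uses more information than the overall mean: here the missing "no" value becomes 30.0 rather than 50.0.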
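For question 4, parts (a), (c), and (d) can be checked with the sketch below. Smoothing by bin means with a bin depth of 3 is treated as equal-frequency binning of the sorted values; part (b) is not covered because the intended equal-width bin size is not fully specified. The sketch also shows that the 12.94 years quoted in part (d) matches the sample standard deviation (denominator n − 1) of these ages.

```python
from statistics import mean, stdev

# Age values from question 4 (the same 27 values as question 1, unsorted).
AGES = [25, 13, 33, 15, 16, 19, 20, 20, 21, 22, 25, 30, 33, 25, 35, 35, 25,
        36, 40, 35, 45, 35, 46, 52, 70, 22, 16]


def smooth_by_bin_means(values, depth):
    """Sort, split into bins of `depth` values, and pair each bin with its mean.

    In the smoothed data, every value in a bin is replaced by that bin's mean.
    """
    data = sorted(values)
    bins = [data[i:i + depth] for i in range(0, len(data), depth)]
    return [(b, round(mean(b), 2)) for b in bins]


def min_max(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map v from [old_min, old_max] onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def z_score(v, mu, sigma):
    """Number of standard deviations that v lies from the mean mu."""
    return (v - mu) / sigma


if __name__ == "__main__":
    for b, m in smooth_by_bin_means(AGES, 3):
        print(f"{b} -> {[m] * len(b)}")
    print(round(min_max(35, min(AGES), max(AGES)), 3))  # 0.386
    print(round(stdev(AGES), 2))                        # 12.94 (sample std)
    print(round(z_score(35, mean(AGES), 12.94), 2))     # 0.39
```

The nine bin means come out as 14.67, 18.33, 21, 24, 26.67, 33.67, 35, 40.33, and 56.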
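For question 5, min-max normalization maps each value v to (v − min)/(max − min) when the target range is [0, 1], and z-score normalization maps v to (v − mean)/σ. The question does not say whether σ is the population or the sample standard deviation; the sketch assumes the population version (`statistics.pstdev`). Using `statistics.stdev` instead would give z-scores of slightly smaller magnitude.

```python
from statistics import mean, pstdev

# The data from question 5.
VALUES = [200, 300, 400, 600, 1000]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the minimum maps to new_min and the maximum to new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score_normalize(values):
    """Subtract the mean and divide by the population standard deviation (assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(min_max_normalize(VALUES))                           # [0.0, 0.125, 0.25, 0.5, 1.0]
    print([round(z, 2) for z in z_score_normalize(VALUES)])    # [-1.06, -0.71, -0.35, 0.35, 1.77]
```

With this data the mean is 500 and the population standard deviation is about 282.84.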
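For question 9(d), one way to check a candidate query is to run it against a small in-memory SQLite table with the `fee` schema given in the question. The sample rows and doctor names below are hypothetical, and the sketch assumes (as a simplification) that each row's `charge` already holds the fee to be totaled. Filtering on `year = 2010` corresponds to slicing the time dimension, and grouping only by `doctor` rolls the patient dimension (and the finer time levels) up to "all", mirroring the OLAP steps asked about in part (c).

```python
import sqlite3

# Hypothetical sample rows for fee(day, month, year, doctor, hospital, patient, count, charge).
SAMPLE_ROWS = [
    (3, 1, 2010, "Dr. A", "H1", "P1", 1, 100.0),
    (7, 2, 2010, "Dr. A", "H1", "P2", 1, 150.0),
    (9, 2, 2010, "Dr. B", "H2", "P1", 1, 200.0),
    (1, 5, 2011, "Dr. A", "H1", "P3", 1, 300.0),  # different year, must be excluded
]

# ORDER BY is added only to make the output order deterministic.
QUERY = """
SELECT doctor, SUM(charge) AS total_fee
FROM fee
WHERE year = 2010
GROUP BY doctor
ORDER BY doctor
"""


def total_fee_by_doctor(rows):
    """Load rows into an in-memory fee table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # "count" is quoted because it is also the name of an SQL aggregate function.
        conn.execute(
            'CREATE TABLE fee (day INTEGER, month INTEGER, year INTEGER, doctor TEXT, '
            'hospital TEXT, patient TEXT, "count" INTEGER, charge REAL)'
        )
        conn.executemany("INSERT INTO fee VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_fee_by_doctor(SAMPLE_ROWS))  # [('Dr. A', 250.0), ('Dr. B', 200.0)]
```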
