Data mining is a major area of exploration for knowledge discovery in databases. The topic gained particular relevance in the 1990s and early 2000s, as web data grew at an exponential rate. As businesses and scientific institutions alike collect more data, knowledge exploration techniques are needed to extract useful business intelligence. This course covers a wide spectrum of industry-standard techniques, using widely available database and tool packages for knowledge discovery.
Data mining targets relatively unstructured data, for which more sophisticated techniques are needed. The course aims to cover powerful data mining techniques including clustering, association rules, and classification. We also introduce high-volume data processing mechanisms by building warehouse schemas such as snowflake and star, along with OLAP query retrieval techniques. We learn the basics of data warehouse structure and query formulation as they affect a data miner, but we do not query against an actual data warehouse. Other courses cover that, if you are interested in data warehousing.
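To give a feel for one of the techniques named above, here is a minimal sketch of the two basic measures behind association rules, support and confidence. The transactions and the helper names `support` and `confidence` are made up for illustration; the course labs use RapidMiner and R rather than this code.

```python
# Hypothetical market-basket transactions (illustrative data, not from the course).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A).

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


# Rule under test: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Algorithms such as FP-Growth exist to find the itemsets whose support clears a chosen threshold without scanning every possible combination as this brute-force sketch would.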
Prerequisites: COMP 251: Introduction to Database Systems or COMP 271: Data Structures
Please note: although statistics and database design are not strictly required as prerequisites, they underpin many data mining concepts. We therefore include a "crash course" in basic statistics and several references to relevant database design topics.
Title: Data Mining: Concepts and Techniques, Third Edition
Authors: Jiawei Han, Micheline Kamber, Jian Pei
Publisher: Morgan Kaufmann; 3rd edition (July 6, 2011)
ISBN-10: 0123814790
Or try this:
Title: Data Mining for the Masses, 2nd ed., with Implementations in RapidMiner and R
Author: Matthew North
ISBN-13: 978-1523321438
You can buy the hard copy on Amazon, or get the free older version here. There is also a support site for the second edition, which includes all datasets as well as very comprehensive PPT slides. The support site may be sufficient for the additional R topics that are not covered in the first (free) edition.
A nice reference book, if you want to read more about data warehousing:
Title: The Data Warehouse Toolkit, third edition
Authors: Ralph Kimball and Margy Ross
Publisher: Wiley
ISBN: 978-1-118-53080-1
Course Objectives and Goals
After taking this course, students should be able to:
93 - 100 | A |
90 - 92 | A- |
87 - 89 | B+ |
83 - 86 | B |
80 - 82 | B- |
77 - 79 | C+ |
73 - 76 | C |
70 - 72 | C- |
67 - 69 | D+ |
60 - 66 | D |
59 and lower | F |
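As a rough illustration of how the scale above could be applied, the sketch below converts earned points to a letter grade. It assumes the scale is a percentage of the 1000 total points listed in the assignment table, and that each band's lower bound is the cutoff, so a fractional score such as 89.9% falls into the lower band. Neither assumption is stated in the syllabus, and the function name `letter_grade` is hypothetical.

```python
# Lower bound (in percent) for each letter grade, copied from the grading scale.
GRADE_FLOORS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (60, "D"),
]

# Assumption: grades are a percentage of the 1000 total course points.
TOTAL_POINTS = 1000


def letter_grade(points_earned, total_points=TOTAL_POINTS):
    """Return the letter grade for a point total.

    Assumption: no rounding is applied, so 89.9% is below the 90% cutoff.
    """
    percent = 100 * points_earned / total_points
    for floor, letter in GRADE_FLOORS:
        if percent >= floor:
            return letter
    return "F"


print(letter_grade(905))
```

The instructor's actual rounding policy may differ, so treat any borderline result from this sketch as unofficial.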
Meeting | Week | Assignment Type | Assignment Name | Points | Due Date |
17-Jan | Orientation | Orientation | Video Tour | 5 | 24-Jan |
17-Jan | Orientation | Orientation | Syllabus | 5 | 24-Jan |
17-Jan | Orientation | Orientation | Greetings Forum | 5 | 24-Jan |
17-Jan | Orientation | Orientation | Using Zoom | 0 | 24-Jan |
17-Jan | Orientation | Orientation | Install RM | 5 | 24-Jan |
17-Jan | Orientation | Orientation | Install R-Studio | 5 | 24-Jan |
17-Jan | Orientation | Orientation | Join Piazza | 0 | 24-Jan |
17-Jan | Week 1 | Lab (RM) | Install RM Repositories | 10 | 24-Jan |
24-Jan | Week 2 | Lab (RM) | RM Getting Started | 15 | 31-Jan |
24-Jan | Week 2 | Lab (R) | Intro R | 0 | 31-Jan |
24-Jan | Week 2 | Homework | Chapter 2 | 15 | 31-Jan |
31-Jan | Week 3 | Homework | Chapter 3 | 15 | 7-Feb |
7-Feb | Week 4 | Lab (RM and R) | DMM: Data Prep, Correlation | 15 | 8-Feb |
7-Feb | Week 4 | Lab (RM) | Visualization, Discretization, Correlation | 15 | 14-Feb |
7-Feb | Week 4 | Homework | Chapter 4 | 15 | 14-Feb |
14-Feb | Week 5 | Lab (RM and R) | DMM: Assoc Rules, FP Growth | 20 | 21-Feb |
14-Feb | Week 5 | Lab (RM) | 202_Single-Rule | | 21-Feb |
14-Feb | Week 5 | Homework | Chapter 6 | 20 | 21-Feb |
21-Feb | Week 6 | Lab: Text Mining | FP/Clustering | 20 | 14-Mar |
21-Feb | Week 6 | Lab: Text Mining | Zipf/Mandelbrot | 35 | 14-Mar |
21-Feb | Week 6 | Lab: Text Mining | Web crawling/Word Clouds | 35 | 14-Mar |
28-Feb | Week 7 | Exam | Midterm | 200 | 28-Feb |
28-Feb | Week 7 | Project Lab | Explore Datasets | 15 | 21-Mar |
14-Mar | Week 8 | Lab (RM) | Classification Models | 20 | 18-Mar |
14-Mar | Week 8 | Homework | Chapter 8 | 20 | 28-Mar |
14-Mar | Week 8 | Project | Project Proposal | 15 | 28-Mar |
21-Mar | Week 9 | Lab (RM) | KNN, NN, CTS using NN | 15 | 25-Mar |
21-Mar | Week 9 | Homework | Chapter 9 | 15 | 11-Apr |
28-Mar | Week 10 | Lab (RM) | Affinity Marketing | 40 | 3-Apr |
4-Apr | Week 11 | Project | Progress Report | 5 | 4-Apr |
4-Apr | Week 11 | Homework | Chapter 10 | 10 | |
11-Apr | Week 12 | Lab (RM and R) | DMM: K-Means, Clustering | 10 | 15-Apr |
11-Apr | Week 12 | Project | Project Freeze | 0 | 11-Apr |
18-Apr | Week 13 | | | | |
25-Apr | Week 14 | Project | Models | 100 | 25-Apr |
25-Apr | Week 14 | Project | Presentation | 25 | 25-Apr |
25-Apr | Week 14 | Project | Report | 25 | 25-Apr |
25-Apr | Week 14 | Project | Excellence | 30 | 25-Apr |
2-May | Week 15 | Exam | Final Exam | 200 | 2-May |
TOTAL POINTS | 1000 |
Course Schedule
This schedule is a guide. Exact dates and topics may be subject to change. It is my best estimate, but we may have to adjust the schedule slightly. You are responsible for all announcements/changes made in class or posted on Sakai.
Date | Topic | Text/Files/Links | Due |
Before class begins | Orientation Module | see Sakai Orientation module!! | preferably before class starts |
1/17 | Intro to Course | | all Orientation assignments |
1/17 | Lab (RM): Install Repositories | all links and videos on Sakai | |
1/24 | Data Visualization | | |
1/31 | Data Visualization; Data Reduction; Attribute Reduction; Discretization; Missing Values; Crash Course in Stats, Part 3 (Hypothesis Testing) | Han: Chapter 3 | HW, Chapter 2 (all HW assignments in the same file); Lab (RM): Getting Started; Lab (R): Intro R |
2/07 | Data Warehousing; Lab (RM and R): Visualization, Discretization, Correlation | Chapter 4 | |
2/14 | Frequent Patterns; Lab: FP and Association Rules (including DMM with RM and R, and also an additional lab named "202_SingleRule", which is not in DMM) | DMM, Ch. 5 (Association Rules, FP_Growth) | HW, Ch 4 |
2/21 | Three Text Mining Labs (see Sakai for instructions, videos, and process downloads). These are much more serious labs than in earlier weeks. You will love them!! | All files and links posted on Sakai | HW (Ch 6); Lab: Association Rules, FP Growth and "202_SingleRule" Lab |
2/28 | Midterm Exam; continue with Text Mining Labs; Lab: Exploring Datasets, preliminary exploration | | |
3/14 | Project Discussion, time permitting | Chapter 8 | |
3/21 | Classification, continued (KNN, Neural Networks); Lab: Neural Networks, possible Medical lab; Lab: Explore Datasets and Project Finalization (time permitting) | Chapter 9 (KNN and Neural Networks) | Lab: Explore Datasets |
3/28 | Lab: Affinity Marketing using RapidMiner | | Project Proposal |
4/04 | Project Team Meetings: finalize project requirements | Affinity Marketing Lab (DUE 4/03!!); Project Progress Report (DUE before class!) | |
4/11 | Clustering | Project Freeze; Lab: K-Means, Clustering DUE 4/15!! | |
4/18 | Work on Projects, Questions, Team Meetings | | |
4/25 | Project Presentations | | Project Model; Project Report/PPTs; Project Presentation; Team Participation Index |
5/02 | Final Exam | see Sakai. Final Exam will involve answering questions and interpreting results on specific datasets. May include a hands-on component. | |
Regular SPRING Semesters | |
2018 | |
Spring Semester Open registration ends at midnight | Jan. 14 |
Martin Luther King, Jr., Holiday, No classes | Jan. 15 |
Spring Semester Begins. Late and Change of Registration begins - Late registration fee applies | Jan. 16 (Tues) |
Late and change registration ends. Last day to withdraw without a mark of "W" | Jan. 22 |
Last day to drop class(es) with a Bursar credit of 100% - dates maintained by Bursar | Jan. 29 (Mon) |
Last day to convert from credit to audit or vice versa - Last day to request or cancel pass/no pass option | Jan. 29 |
Last day to drop class(es) with a Bursar credit of 50% - dates maintained by Bursar | Feb. 12 (Mon) |
Summer Registration Begins | Feb. 12 |
Ash Wed (46 days before Easter): Classes meet, Special worship services available | Feb. 14 |
Last day to drop class(es) with a Bursar credit of 20% (zero credit thereafter) | Feb. 19 (Mon) |
Last day for students to submit assignments to change an "I" grade, from the preceding Fall Semester and the preceding "J" Term, to a letter grade; Faculty may set an earlier deadline | Feb. 26 |
Early alert process begins on Mon of week 7 and runs through Fri of week 9 | Feb. 26 |
Last day to file applications with Deans' offices for degrees awarded in December for this year | Mar. 1 |
Spring Break: No classes | Mar. 5 - 10 |
Classes Resume | Mar. 12 |
Last day (5:00 p.m.) to withdraw with a grade of "W", after this date, the penalty grade of "WF" is assigned | Mar. 26 |
Good Friday, No classes (offices closed) | Mar. 30 |
Easter Holiday: No classes Thurs evening (classes that start 4:15 p.m. or later are canceled) through Mon afternoon (classes beginning on or after 4:15 p.m. will be held) | Mar. 29 - Apr. 2 |
Fall Semester UGRD Registration begins | Apr. 16 |
Spring Semester classes end | Apr. 27 |
Final Examinations | Apr. 30 - May 5 |
*Study Day Wednesdays: No daytime exams will be held. Evening classes meeting at 4:15 p.m. or later will hold exams as scheduled. | |