COMP 300-001/002
COMP 488-301/302
Spring, 2018

Course Information
Comp 300-001 (6261) and Comp 488-301 (6288) face-to-face sections
Comp 300-002 (6262) and Comp 488-302 (6289) online sections

Wednesday, 4:15 - 6:45 (for the f2f sections)
School of COMM, Room 013

Catalog Description

Data mining is a major area of exploration for knowledge discovery in databases. This topic gained great relevance especially in the 1990s and early 2000s, with web data growing at an exponential rate. As more data are collected by businesses and scientific institutions alike, knowledge exploration techniques are needed to gain useful business intelligence. This course will cover a wide spectrum of industry-standard techniques using widely available database and tool packages for knowledge discovery.

Data mining is for relatively unstructured data for which more sophisticated techniques are needed. The course aims to cover powerful data mining techniques including clustering, association rules, and classification. We also introduce high-volume data processing mechanisms by building warehouse schemas such as snowflake and star. OLAP query retrieval techniques are also introduced. We learn the basics of Data Warehousing structure and query formulation, as they impact a data miner. We do not query against an actual data warehouse. There are other courses for that, if you are interested in Data Warehousing.


Prerequisites

COMP 251: Introduction to Database Systems or COMP 271: Data Structures

Please note: although statistics and database design are not absolutely required as prerequisites, they are important areas of knowledge for many data mining concepts. Therefore, we have a "crash course" in basic statistics, and several references to relevant database design topics.

Textbooks (required)

Title: Data Mining: Concepts and Techniques, Third Edition
Authors: Jiawei Han, Micheline Kamber, Jian Pei
Publisher: Morgan Kaufmann; 3rd edition (July 6, 2011)
ISBN-10: 0123814790

Title: Data Mining for the Masses, Second Edition, with Implementations in RapidMiner and R
Author:  Matthew North
ISBN-13: 978-1523321438

You can buy the hard copy on Amazon; an older version is available for free. There is also a support site for the second edition, including all datasets as well as very comprehensive PPT slides. This may be sufficient for the additional R topics that are not covered in the first (free) edition.

Nice reference book, if you want to read more about data warehousing:
Title:  The Data Warehouse Toolkit, third edition
Authors:  Ralph Kimball and Margy Ross
Publisher: Wiley
ISBN:  978-1-118-53080-1

Dr. Channah F. Naiman
Our classroom is available about half an hour before class. Also, Room 014 is available for about an hour before class. I am also available after class or by appointment.

TA:  not happening

Course Objectives and Goals

After taking this course, students should be able to:

What this course is NOT:
We will be using the data mining application package RapidMiner (RM) in this course. Please download the current version, and check the Orientation Module on Sakai for more information and installation instructions. The Community Edition is free, but it is limited to 10,000 rows. If you sign up with your LUC email address, you should automatically have an educational license. This is important, as we have a major lab that requires more than 10,000 rows, and you may require many rows for your project. You can check your license inside of RM by clicking on Settings-->Manage Licenses. If it does not show up correctly, then you can request an educational license directly from RM. Please install RM as soon as possible. Although I cannot enforce deadlines before the course begins, I do request that you submit a screenshot of your RM installation in the Orientation Module, which is sent out shortly before classes begin.
Academic Honesty
Students are expected to have read the statement on academic integrity. This policy applies to the course. The minimum penalty for academic dishonesty is a grade of F for that assignment. Multiple instances, or a single severe instance on a major exam or assignment, may result in a grade of F for the course. All cases of academic dishonesty will be reported to the department office and the relevant college office, where they will be placed in your school record.

Academic dishonesty includes, but is not limited to, working together on assignments that are not group assignments, copying or sharing assignments or exam information with other students except in group assignments, submitting as your own information from current or former students of this course, copying information from anywhere on the web and submitting it as your own work, and submitting anything as your own work which you have not personally created for this course. If you do wish to use materials that are not your own, please check with me ahead of time and cite your source clearly. When in doubt, ask first!

Be aware that I have updated both the midterm and final exams. I have changed the values for many of the textbook problems that are used for homework problems. For those problems that require open-ended answers, please be very careful to state the answers in your own words, not in the words of the Instructor's Manual, nor in the words of students who have previously taken this course.

Regarding the project: Project requirements must be approved by me, and I may modify the requirements for a specific dataset/team. Late changes to the project requirements will usually not be allowed and may not be made without permission. Teams must document participation by posting versions to GitHub. A completed project with no record of intermediate versions will not receive credit.

Lateness Policy:
"There's no such thing as an emergency. There is only poor planning." While this clearly does not apply to actual (and verifiable) medical and family emergencies, if you wait until the last day before something is due, and then your Internet connection goes down, this does not qualify as an emergency. Give yourself plenty of time to submit your assignments on time. If I see that most of the class needs extra time for a specific assignment (and has been working on it!!), I may be willing to extend the deadline. But in general, your poor planning or poor time management does not constitute a reason for me to extend the deadline for you. I am especially careful not to do so, as this would be unfair to the other students who turn in their work on time.

We have a limited number of sessions, during which time we have 2 exams, a project, labs (some quite intense), and homework assignments. Do not fall behind in your work. Do not wait until the last minute. I will not be sympathetic. You may have heard that I am, in fact, sympathetic. That is no longer the case. I have evolved.

Late assignments are worth only half credit. This is true even if you have a valid reason for submitting the homework late. Usually, late assignments must be submitted within one week of the due date for half credit. For some assignments, you can't submit late at all. And for some, I do not allow an entire week for late submission, but only a few days. Please check Sakai for exact due dates and the last time for a late submission for a specific assignment. Further, assignments can only be submitted late if I have not posted the answers to the homework. After one week (or the late submission deadline), you will receive zero points for any unsubmitted assignments. No exceptions.

Course Components and Grading
-->Important note about team submissions: Repeating what was written above under Homework: If I announce that an assignment may be worked on in a team (for instance, pair programming), each team member must submit something on Sakai. If you are the team member submitting the assignment, you must also submit a note on Sakai, listing each team member for whom you are submitting the assignment. If someone else is submitting the assignment, you must submit a note in the Assignment comment box telling me who is submitting the assignment for your team. Do not assume that just because your team member submitted the assignment that you will automatically get credit. You will not. You MUST submit a comment letting me know that it was submitted on your behalf.

93 - 100  A
90 - 92   A-
87 - 89   B+
83 - 86   B
80 - 82   B-
77 - 79   C+
73 - 76   C
70 - 72   C-
67 - 69   D+
60 - 66   D
59 and lower  F

The table below lists the point value for each component of the course.

Meeting Week Assignment Type Assignment Name Points Due Date
17-Jan Orientation Orientation Video Tour 5 24-Jan
    Orientation Syllabus 5 24-Jan
    Orientation Greetings Forum 5 24-Jan
    Orientation Using Zoom 0 24-Jan
    Orientation Install RM 5 24-Jan
    Orientation Install R-Studio 5 24-Jan
    Orientation Join Piazza 0 24-Jan
17-Jan Week 1 Lab (RM) Install RM Repositories 10 24-Jan
24-Jan Week 2 Lab (RM) RM Getting Started 15 31-Jan
    Lab (R) Intro R 0 31-Jan
    Homework Chapter 2 15 31-Jan
31-Jan Week 3 Homework Chapter 3 15 7-Feb
7-Feb Week 4 Lab (RM and R) DMM: Data Prep 15 8-Feb
    Lab (RM) Visualization, Discretization 15 14-Feb
    Homework Chapter 4 15 14-Feb
14-Feb Week 5 Lab (RM and R) DMM: Assoc Rules, FP Growth 20 21-Feb
    Lab (RM) 202_Single-Rule 21-Feb
    Homework Chapter 6 20 21-Feb
21-Feb Week 6 Lab:  Text Mining FP/Clustering 20 14-Mar
    Lab:  Text Mining Zipf/Mandelbrot 35 14-Mar
    Lab:  Text Mining Web crawling/Word Clouds 35 14-Mar
28-Feb Week 7 Exam Midterm 200 28-Feb
    Project Lab Explore Datasets 15 21-Mar
14-Mar Week 8 Lab (RM) Classification Models 20 18-Mar
    Homework Chapter 8 20 28-Mar
    Project   Project Proposal 15 28-Mar
21-Mar Week 9 Lab (RM) KNN, NN, CTS using NN 15 25-Mar
    Homework Chapter 9 15 11-Apr
28-Mar Week 10 Lab (RM) Affinity Marketing 40 3-Apr
4-Apr Week 11 Project Progress Report 5 4-Apr
    Homework Chapter 10 10  
11-Apr Week 12 Lab (RM and R) DMM: K-Means, Clustering 10 15-Apr
    Project Project Freeze 0 11-Apr
18-Apr Week 13        
25-Apr Week 14 Project Models 100 25-Apr
    Project Presentation 25 25-Apr
    Project Report 25 25-Apr
    Project Excellence 30 25-Apr
2-May Week 15 Exam Final Exam 200 2-May
TOTAL POINTS 1000
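
As a rough illustration of how the letter-grade scale and the 1000-point total above fit together, here is a minimal Python sketch. The helper name letter_grade, the conversion of points to a percentage, and the rounding to the nearest whole percent are assumptions made for this example; the syllabus does not say how fractional percentages (for instance, 92.5) are handled.

```python
# Illustrative sketch only: maps course points to the letter-grade scale above.
# Assumption (not stated in the syllabus): the percentage is rounded to the
# nearest whole number before the scale is applied.

TOTAL_POINTS = 1000  # total from the points table

# (minimum whole-number percentage, letter), highest first
GRADE_SCALE = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (60, "D"),
]


def letter_grade(points_earned, total_points=TOTAL_POINTS):
    """Return the letter grade for the points earned (hypothetical helper)."""
    percent = round(100 * points_earned / total_points)
    for minimum, letter in GRADE_SCALE:
        if percent >= minimum:
            return letter
    return "F"  # 59 and lower


if __name__ == "__main__":
    # Example: 850 of 1000 points is 85 percent.
    print(letter_grade(850))
```

Under a different rounding rule (for example, always rounding down), totals that fall between two bands could map to a different letter, so treat the result as an estimate.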

Course Schedule

This schedule is a guide. Exact dates and topics may be subject to change. It is my best estimate, but we may have to adjust the schedule slightly. You are responsible for all announcements/changes made in class or posted on Sakai.





Before class begins: Orientation Module (see the Sakai Orientation module!!), preferably completed before class starts


Intro to Course
Intro to Data Mining
Crash Course in Stats, Part 1 (central tendency, dispersion)
Getting to know your data

  • syllabus
  • Han:  Chapter 1
  • Fasten Your Seatbelts (FYSB), Part 1
    • Slides and Videos on Sakai    
   all Orientation assignments

Lab (RM):  Install Repositories

all links and videos on Sakai


Data Visualization
Similarity Measures
Crash Course in Stats, Part 2 (Probability Distributions)
Lab (RM):  Getting started (off the RM website)
Lab (R):  Intro to R (time permitting)

  • Han:   Chapter 2
  • FYSB slides and videos, Part 2 (on Sakai)
  • links to RM  intro videos, on Sakai
  • R Project files, datasets, videos, on Sakai

Lab (RM):  Install Repositories

1/31 Data Visualization; Data Reduction; Attribute Reduction;
Discretization; Missing Values
Crash Course in Stats, Part 3 (Hypothesis Testing)
Han:  Chapter 3 HW, Chapter 2
(all HW assignments in the same file)
Lab (RM):  Getting Started
Lab (R):  Intro R


Data Warehousing

Lab (RM and R):  Visualization, Discretization, Correlation

Chapter 4
Data Mining for the Masses
(DMM--RM and R, Ch. 3 and Ch. 4)
DMM files
Additional RM lab files, links on Sakai

HW, Chapter 3


Frequent Patterns
Demo Problem 6.6
Demo p. 257 FP Growth 

Lab:  FP and Association Rules (including DMM with RM and R, and also an additional lab named "202_SingleRule", which is not in DMM)

Chapter 6
Midterm Review

DMM, Ch. 5 (Association Rules, FP_Growth)
Additional RM labs ("202_SingleRule")

HW, Ch 4
Lab: Visualization/Discretization


Three Text Mining Labs: (see Sakai for instructions, videos, and process downloads):
These are much more serious labs than in earlier weeks.  You will love them!!
  • Text Mining using FP and Clustering
  • Text Mining using Zipf-Mandelbrot Distribution
  • Web crawling and Word Clouds

All files and links posted on Sakai HW  (Ch 6)
Lab:  Association Rules, FP Growth and "202_SingleRule" Lab


Midterm Exam

continue with Text Mining Labs
Lab:  Exploring Datasets, preliminary exploration


Project Discussion, time permitting
Lab:  Rules, Decision Trees, KNN, Bayes, CrossValidation,
ROC Charts, Lift Chart

Chapter 8

Lab:  Text Mining

Labs:  Classification, due 3/18 at 10:00 p.m.!!

3/21 Classification, continued (KNN, Neural Networks)
Lab:  Neural Networks, possible Medical lab

Lab:  Explore Datasets and Project Finalization (time permitting)
Chapter 9 (KNN and Neural Networks) Lab:  Explore Datasets


Lab:  Affinity Marketing using RapidMiner
(very complex lab, do NOT wait until the last minute!!)

Lab: KNN,  NN  (Due 3/25)

Project Proposal
HW (Ch8)

4/04 Project Team Meetings--finalize project requirements
Affinity Marketing Lab  (DUE 4/03!!)
Project Progress Report (DUE before class!)


Lab:  K-Means Clustering

Final Exam Review

Chapter 10
DMM, Chapter 6

see Sakai for Final Exam review

Project Freeze

Lab:  K-Means, Clustering DUE 4/15!!

4/18 Work on Projects, Questions, Team Meetings


Project Presentations

Project Model
Project Report/PPTs
Project Presentation
Team Participation Index


Final Exam

see Sakai.  Final Exam will involve answering questions  and interpreting results on specific datasets.  May include a hands-on component.


Regular SPRING Semesters
Spring Semester Open registration ends at midnight Jan. 14
Martin Luther King, Jr., Holiday: No classes Jan. 15
Spring Semester Begins. Late and Change of Registration begins - Late registration fee applies Jan. 16 (Tues)
Late and change registration ends. Last day to withdraw without a mark of "W" Jan. 22
Last day to drop class(es) with a Bursar credit of 100%- dates maintained by Bursar Jan. 29 (Mon)
Last day to convert from credit to audit or vice versa - Last day to request or cancel pass/no pass option Jan. 29
Last day to drop class(es) with a Bursar credit of 50%- dates maintained by Bursar Feb. 12 (Mon)
Summer Registration Begins Feb. 12
Ash Wed (46 days before Easter): Classes meet, Special worship services available Feb. 14
Last day to drop class(es) with a Bursar credit of 20% (zero credit thereafter) Feb. 19 (Mon)
Last day for students to submit assignments to change an "I" grade, from the preceding Fall Semester and the preceding "J" Term, to a letter grade; Faculty may set an earlier deadline Feb. 26
Early alert process begins on Mon week 7 and runs through Fri of the week 9 Feb. 26
Last day to file applications with Deans' offices for degrees awarded in December for this year Mar. 1
Spring Break: No classes Mar. 5 - 10
Classes Resume Mar. 12
Last day (5:00 p.m.) to withdraw with a grade of "W", after this date, the penalty grade of "WF" is assigned Mar. 26
Good Friday: No classes (offices closed) Mar. 30
Easter Holiday: No classes Thurs evening (classes that start 4:15 p.m. or later are canceled) through Mon afternoon (classes beginning on or after 4:15 p.m. will be held) Mar. 29 - Apr. 2
Fall Semester UGRD Registration begins Apr. 16
Spring Semester classes end Apr. 27
Final Examinations April 30 - May 5
*Study Day Wednesdays: No daytime exams will be held.     
Evening classes meeting at 4:15pm or later will hold exams as scheduled.