As more data are collected by businesses and scientific institutions, knowledge exploration techniques are needed to gain useful business intelligence. This course covers the theory and practice of the analysis (mining) of extremely large datasets.
Data mining targets relatively unstructured data, for which more sophisticated techniques are needed. The course aims to cover powerful data mining techniques including clustering, association rules, and classification. We also briefly introduce high-volume data processing mechanisms by modeling warehouse schemas such as snowflake and star. OLAP query retrieval techniques are also introduced. We learn the basics of Data Warehousing structure and query formulation as they affect a data miner. We do not query against an actual data warehouse. There are other courses for that, if you are interested in Data Warehousing.
Outcome
Students will be able to define, apply and critically analyze data mining approaches for fields such as security, health care, science, marketing and text analysis.
COMP 251: Introduction to Database Systems or COMP 271: Data Structures
Please note: although statistics and database design are not absolutely required as prerequisites, they are important areas of knowledge for many data mining concepts. Therefore, we have a "crash course" in basic statistics, and several references to relevant database design topics.
Required: For general reference (language and platform independent); for homework problems, lectures and examples:
This book is useful for reference and conceptual examples (language independent, with nice illustrations) of some of the underlying concepts/algorithms (such as apriori, basic classification and clustering algorithms and more). The book doesn't have a more recent edition, but it is something of a classic as a data mining text. Since you can find this quite inexpensively on the internet, I am including it here for reference.
Title: Data Mining: Concepts and Techniques, Third Edition
Authors: Jiawei Han, Micheline Kamber, Jian Pei
Publisher: Morgan Kaufmann; 3rd edition (2012)
ISBN-10: 0123814790
Required: For the RapidMiner and R lab part of the course:
Data Mining for the Masses
There is an excellent fourth edition available, which is an online, updated version of the third edition. It has updated PowerPoint slides, short explanatory videos, review questions and other support materials. Some students have really liked this version of the lab text, so I have created a course link where you can purchase this online text.
Support site for the third edition
Reference: For implementations in R (assignments, labs, cases, etc.), for reference and some good examples:
Title: Data Mining for Business Analytics: Concepts, Techniques, and Applications in R
Authors: Shmueli, Bruce, Yahav, Patel, Lichtendahl
Publisher: Wiley; 1st edition (2017)
ISBN-10: 1118879368
ISBN-13: 978-1118879368
For ggplot examples: (You don't have to buy the book. It is based on his website. Illustrated examples, if you are interested in Data Visualization for your project presentation.) (Or just take the DataViz class.)
Alboukadel Kassambara. Guide to Create Beautiful Graphics in R, STHDA, 2013. ISBN: 9781532916960. Most examples, with small modifications, are available on his wonderful website and his R support website.
Course Objectives and Goals
After taking this course, students should be able to:
Students with Disabilities:
Loyola University Chicago provides reasonable accommodations for students with disabilities. Any student requesting accommodations related to a disability or other condition is required to register with the Student Accessibility Center (SAC). Professors will receive an accommodation notification from SAC, preferably within the first two weeks of class. Students are encouraged to meet with their professor individually in order to discuss their accommodations. All information will remain confidential. Please note that in this class, software may be used to audio record class lectures in order to provide equal access to students with disabilities. Students approved for this accommodation use recordings for their personal study only and recordings may not be shared with other people or used in any way against the faculty member, other lecturers, or students whose classroom comments are recorded as part of the class activity. Recordings are deleted at the end of the semester. For more information about registering with SAC or questions about accommodations, please contact SAC at 773-508-3700 or SAC@luc.edu.
Students who are allowed to take their exams in the SAC office are encouraged to do so. Should you choose to take the exam in the classroom, I cannot guarantee that the classroom environment will be quiet enough to provide you with the environment that your disability may require. If you choose to take the exam in the classroom, you are taking that risk.
Online Recording Policy
In this class, software may be used to record live class discussions. As a student in this class, your participation in live class discussions will be recorded. These recordings will be made available only to students enrolled in the class, to assist those who cannot attend the live session or to serve as a resource for those who would like to review content that was presented. All recordings will become unavailable to students in the class when the Sakai course is unpublished (i.e. shortly after the course ends, per the Sakai administrative schedule: https://www.luc.edu/itrs/sakai/sakaiadministrativeschedule/). Students who prefer to participate via audio only will be allowed to disable their video camera so only audio will be captured. Please discuss this option with your professor. The use of all video recordings will be in keeping with the University Privacy Statement shown below:
Assuring privacy among faculty and students engaged in online and face-to-face instructional activities helps promote open and robust conversations and mitigates concerns that comments made within the context of the class will be shared beyond the classroom. As such, recordings of instructional activities occurring in online or face-to-face classes may be used solely for internal class purposes by the faculty member and students registered for the course, and only during the period in which the course is offered. Students will be informed of such recordings by a statement in the syllabus for the course in which they will be recorded. Instructors who wish to make subsequent use of recordings that include student activity may do so only with informed written consent of the students involved or if all student activity is removed from the recording. Recordings including student activity that have been initiated by the instructor may be retained by the instructor only for individual use.
93 - 100 | A |
90 - 92 | A- |
87 - 89 | B+ |
83 - 86 | B |
80 - 82 | B- |
77 - 79 | C+ |
73 - 76 | C |
70 - 72 | C- |
67 - 69 | D+ |
60 - 66 | D |
59 and lower | F |
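For students who want to track their standing during the semester, the scale above can be sketched as a small lookup. This is only an illustration, not the instructor's grading procedure: it assumes the letter grade is based on the percentage of the 1000 total course points, and that each range's lower bound acts as an inclusive cutoff for fractional percentages (the scale itself lists whole numbers only). The names `GRADE_CUTOFFS` and `letter_grade` are hypothetical.

```python
# Lower bounds copied from the grading scale above. Treating them as
# inclusive cutoffs for fractional percentages is an assumption; the
# syllabus does not state a rounding policy.
GRADE_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (60, "D"),
]


def letter_grade(points_earned, total_points=1000):
    """Map earned points to a letter grade using the scale's lower bounds."""
    if not 0 <= points_earned <= total_points:
        raise ValueError("points_earned must be between 0 and total_points")
    percent = 100 * points_earned / total_points
    # Cutoffs are listed from highest to lowest, so the first match wins.
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    for points in (930, 865, 600, 599):
        print(points, letter_grade(points))
```

Under these assumptions a score that falls between two listed ranges (for example 92.9%) is assigned to the lower range; check with the instructor for how such cases are actually handled.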
Week Beginning | Week | Assignment Type | Assignment Name | Points | Due Date |
before semester | Orientation | Orientation | Video Tour and Syllabus | 10 | 25-Jan |
before semester | Orientation | Orientation | Greetings Forum | 5 | 25-Jan |
before semester | Orientation | Orientation | Using Zoom | 0 | 25-Jan |
before semester | Orientation | Orientation | Install RM | 5 | 25-Jan |
before semester | Orientation | Orientation | Install R-Studio | 5 | 25-Jan |
19-Jan | Week 1 | Lab (RM) | Install RM Repositories | 10 | 27-Jan |
25-Jan | Week 2 | Lab (RM) | RM Getting Started | 15 | 31-Jan |
25-Jan | Week 2 | Lab (R) | Intro R | 10 | 31-Jan |
25-Jan | Week 2 | Homework | Chapter 2 | 15 | 31-Jan |
1-Feb | Week 3 | Homework | Chapter 3 | 15 | 4-Feb |
8-Feb | Week 4 | Lab (RM and R) | DMM-Ch3: Data Prep; DMM-Ch4: Correlation | 15 | 11-Feb |
8-Feb | Week 4 | Lab (RM) (links for R info) | Visualization, Discretization 3 ways | 15 | 14-Feb |
8-Feb | Week 4 | Homework | Chapter 4 | 15 | 14-Feb |
15-Feb | Week 5 | Lab (RM and R) | DMM-Ch 5 (RM): Assoc-FP | 10 | 18-Feb |
15-Feb | Week 5 | Lab (RM and R) | DMM-Ch 5 (R): Assoc Rules | 10 | 18-Feb |
15-Feb | Week 5 | Lab (RM) | 202_Single-Rule | 10 | 18-Feb |
15-Feb | Week 5 | Homework | Chapter 6 | 20 | 24-Feb |
22-Feb | Week 6 | Project | Exploring Datasets, prelim. | 20 | 25-Feb |
1-Mar | Week 7 | Lab: Text Mining | FP/Clustering | 20 | 14-Mar |
1-Mar | Week 7 | Lab: Text Mining | Zipf/Mandelbrot | 35 | 14-Mar |
1-Mar | Week 7 | Lab: Text Mining | Web crawling/Word Clouds | 35 | 14-Mar |
1-Mar | Week 7 | Project Lab | Explore Datasets, continued | 15 | 4-Mar |
15-Mar | Week 8 | Lab (RM) | Classification Models: Decision Trees, Bayes, CrossValidation, ROC/LIFT | 30 | 20-Mar (depends on Midterm) |
15-Mar | Week 8 | Homework | Chapter 8 | 20 | 20-Mar (depends on Midterm) |
22-Mar | Week 9 | Midterm Exam | | 250 | TBA |
29-Mar | Week 10 | Project Zoom Meetings | Project Proposal Zoom Meetings | 15 | 1-Apr |
29-Mar | Week 10 | Lab (RM) | KNN, NN, CTS using NN | 25 | 5-Apr |
29-Mar | Week 10 | Homework | Chapter 9 | 15 | 5-Apr |
5-Apr | Week 11 | Lab (RM) | Affinity Marketing | 50 | 13-Apr |
12-Apr | Week 12 | Project | Progress Report | 5 | 14-Apr |
12-Apr | Week 12 | Project | Project Freeze | 0 | 15-Apr |
19-Apr | Week 13 | Homework | Chapter 10 | 10 | 3-May |
19-Apr | Week 13 | Lab (RM and R) | DMM: K-Means, Clustering | 10 | 22-Apr |
3-May | Week 15 | Project | Models (+interpretation) | 125 | 5-May |
3-May | Week 15 | Project | Presentation | 50 | 5-May |
3-May | Week 15 | Project | Report | 25 | 5-May |
3-May | Week 15 | Project | Excellence | 50 | 5-May |
Participation, prompt submissions, meeting attendance, etc. | | | | 10 | |
TOTAL POINTS | | | | 1000 | |
Course Schedule
This schedule is a guide. Exact dates and topics may be subject to change. It is my best estimate, but we may have to adjust the schedule slightly. You are responsible for all announcements/changes made in class or posted on Sakai.
Week | Week Beginning | Topic | Text/Files/Links | Due |
Before Class Begins | | Orientation Module | see Sakai Orientation module!! | |
1 | 1/19 | Chapter 1: Intro to Course | | |
1 | 1/19 | Lab (RM): Install Repositories | | |
2 | 1/25 | Chapter 2: Data Visualization and Similarity Measures; Crash Course in Stats, Part 2 (Probability Distributions) | | |
2 | 1/25 | | Getting started RM website, follow-along files | |
3 | 2/01 | | | |
4 | 2/08 | | | |
5 | 2/15 | | | |
6 | 2/22 | | | |
7 | 3/01 | Midterm Review. Three Text Mining Labs (see Sakai for instructions, videos, and process downloads): These are much more serious labs than in earlier weeks. You will love them!! Do NOT wait until the last minute to work on them. Optional Zoom meetings re: team datasets!! | | |
8 | 3/15 | | | |
9 | 3/22 | Midterm Exam (withdraw deadline is still TBA on the Academic Calendar) | | |
10 | 3/29 | | | |
11 | 4/05 | | | |
12 | 4/12 | | | |
13 | 4/19 | | | |
14 | 4/26 | Work on Projects, Questions | | |
15 | 5/03 | Project Presentations, video or Zoom | | |
Academic Calendar: Graduate