This unit explores fundamental data mining techniques and their application areas. Supporting techniques such as data pre-processing and statistics are also covered.
Objectives
To develop students' knowledge of techniques and methods for data mining in large databases, including both those currently in use and those presently being researched, and to familiarise students with the techniques currently available for extracting knowledge from large databases. By the end of the unit, students should be able to describe the algorithms underlying the most common state-of-the-art data mining tools and make an informed choice of data mining tool for a given problem. Students should also understand these algorithms well enough to implement at least one fundamental data mining algorithm.
Prerequisites
Basic mathematical and statistical skills and competency in one programming language.
Unit relationships
CSE5230 is an elective unit in the Masters in Information Technology degree.
Texts and software
Required text(s)
Nil. See a list of relevant texts and papers on the unit website.
Students studying off-campus are required, as a condition of accepting admission, to have the minimum system configuration specified by the Faculty and regular Internet access. On-campus students, and those studying at supported study locations, may use the facilities available in the computing labs. Information about computer use for students is available from the ITS Student Resource Guide in the Monash University Handbook. You will need to allocate up to n hours per week for use of a computer, including time for newsgroups/discussion groups.
Recommended reading
See the reading list on the unit website.
Library access
You may need to access the Monash library, either personally or remotely, to complete the subject satisfactorily. Be sure to obtain a copy of the Library Guide and, if necessary, the instructions for remote access from the library website.
Introduction to Machine Learning, Data Mining and Statistics
Week 3: Pre-processing for Data Mining
Week 4: Clustering Techniques; Association Rule Discovery
Week 5: Classifiers 1: Bayesian Classification and Bayesian Networks
Week 6: Classifiers 2: Decision Trees (Literature Review Assignment due)
Week 7: Neural Networks 1: MLPs
Week 8: Neural Networks 2: SOMs
Week 9: Genetic Algorithms
Week 10: Hidden Markov Models (Algorithm Implementation Assignment due)
Non-teaching week
Week 11: Information Visualization
Week 12: Student presentations (Group Research Paper due)
Week 13: Student presentations
Timetable
The timetable for on-campus classes for this unit can be viewed in Allocate+.
Assessment
Assessment weighting
Assessment for the unit consists of three sets of assignments, together worth 100% of the unit mark. Read this section VERY carefully.
Individual literature survey document and tutorial sheets: 15%
Individual implementation of a data mining algorithm: 20%
A group paper on an agreed topic of approximately 5000 words: 50%
Group presentation of the paper to the class: 15%
Assessment Policy
To pass this unit you must achieve at least 50% of the total marks across all assignments.
Your score for the unit is the sum of the marks for the three sets of assignments.
Assessment Requirements
Individual literature survey document and tutorial sheets: due in your tutorial in Week 6 (15%)
Individual implementation of a data mining algorithm: due in your tutorial in Week 10 (20%)
A group paper on an agreed topic of approximately 5000 words: due in your tutorial in Week 12 (50%)
Group presentation of the paper to the class: Weeks 12 and 13 (15%)
Examination: there is no exam in this unit; the S2/06 exam period starts on 23/10/06 (0%)
Assignment specifications will be made available on the CSE5230 Unit Web Site Assessment Page.
Assignment Submission
Assignments must be submitted both on paper and on CD, with the appropriate cover sheet correctly completed and attached, to your tutor within the first 10 minutes of your tutorial in the week of each assignment's due date. The literature review and the group research paper must also be submitted to the Damocles online submission system; a link to it is available on the unit web site.
Do not email submissions. The due date is the date by which the submission must be received or, where applicable, posted.
Extensions and late submissions
Late submission of assignments
Assignments received after the due date will be subject to a penalty of 10% per day the assignment is late. Assignments received later than one week after the due date will not normally be accepted.
This policy is strict because comments and guidance are given on assignments when they are returned, and sample solutions may be published and distributed after marking or with the returned assignments.
Extensions
It is your responsibility to structure your study program around assignment deadlines, family, work and other commitments. Factors such as normal work pressures, vacations, etc. are seldom regarded as appropriate reasons for granting extensions.
Requests for extensions must be made by email to the unit lecturer at least two days before the due date. You will be asked to forward original medical certificates in cases of illness, and may be asked to provide other forms of documentation where necessary. A copy of the email or other written communication of an extension must be attached to the assignment submission.
Grading of assessment
Assignments, and the unit, will be marked and allocated a grade according to the following scale:
HD High Distinction (80% and above): very high levels of achievement, demonstrated knowledge and understanding, skills in application and high standards of work encompassing all aspects of the tasks.
D Distinction (70-79%): high levels of achievement, but not to the same standard as HD. May have a weakness in one particular aspect, or overall standards may not be quite as high.
C Credit (60-69%): a sound pass displaying good knowledge or application skills, but with some weaknesses in the quality, range or demonstration of understanding.
P Pass (50-59%): an acceptable standard, showing adequate basic knowledge, understanding or skills, but with definite limitations on the extent of such understanding or application. Some parts may be incomplete.
N Not satisfactory (below 50%): failure to meet the basic requirements of the assessment.
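The weightings and grade bands in this guide combine into a simple calculation. The Python sketch below is illustrative only: the function and key names are hypothetical, and it assumes each item is marked as a percentage that is then scaled by the item's weighting, and that a fractional total just below a cutoff (for example 79.5) falls into the lower band.

```python
# Illustrative sketch only: names are hypothetical; the weights and grade
# cutoffs are taken from the assessment and grading sections of this guide.
WEIGHTS = {
    "literature_survey": 15,
    "implementation": 20,
    "group_paper": 50,
    "group_presentation": 15,
}

# (minimum total, grade), checked from the highest band down.
GRADE_BANDS = [(80, "HD"), (70, "D"), (60, "C"), (50, "P")]


def unit_total(percent_scores):
    """Return the unit total out of 100.

    Assumes each item is marked out of 100 and scaled by its weighting.
    """
    return sum(WEIGHTS[item] * percent_scores[item] / 100 for item in WEIGHTS)


def grade(total):
    """Map a unit total to a grade; below 50 is N (Not satisfactory)."""
    for cutoff, letter in GRADE_BANDS:
        if total >= cutoff:
            return letter
    return "N"


scores = {
    "literature_survey": 70,
    "implementation": 60,
    "group_paper": 55,
    "group_presentation": 80,
}
total = unit_total(scores)
print(total, grade(total))  # 62.0 C
```

In this example, 10.5 + 12 + 27.5 + 12 gives 62.0, which falls in the Credit band and clears the 50% pass requirement.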
Assignment return
We aim to make assignment results available to you within three weeks of receiving the assignment.
Feedback
Feedback to you
You will receive feedback on your work and progress in this unit. This feedback may be provided through your participation in tutorials and class discussions, as well as through your assignment submissions. It may come in the form of individual advice, marks and comments, or it may be provided as comment or reflection targeted at the group. It may be provided through personal interactions, such as interviews and on-line forums, or through other mechanisms such as on-line self-tests and publication of grade distributions.
Feedback from you
You will be asked to provide feedback to the Faculty through a Unit Evaluation survey at the end of the semester. You may also be asked to complete surveys to help teaching staff improve the unit and unit delivery. Your input to such surveys is very important to the faculty and the teaching staff in maintaining relevant and high quality learning experiences for our students.
And if you are having problems
It is essential that you take action immediately if you realise that you have a problem with your study. The semester is short, so we can help you best if you let us know as soon as problems arise. Regardless of whether the problem is related directly to your progress in the unit, if it is likely to interfere with your progress you should discuss it with your lecturer or a Community Service counsellor as soon as possible.
Unit improvements
Overlap between the literature review and the group research paper has been reduced by changing the focus of their topics.
Plagiarism and cheating
Plagiarism and cheating are regarded as very serious offences. Where cheating has been confirmed, students have been severely penalised, with penalties ranging from the loss of all marks for an assignment to disciplinary action at the Faculty level. While we expect all our students to adhere to sound ethical conduct and honesty, you should acquaint yourself with Student Rights and Responsibilities and with the Faculty regulations that apply to students detected cheating, as these will be applied in all detected cases.
In this University, cheating means seeking to obtain an unfair advantage in any examination or any other written or practical work to be submitted or completed by a student for assessment. It includes the use, or attempted use, of any means to gain an unfair advantage for any assessable work in the unit, where the means is contrary to the instructions for such work.
When you submit an individual assessment item, such as a program, a report, an essay, assignment or other piece of work, under your name you are understood to be stating that this is your own work. If a submission is identical with, or similar to, someone else's work, an assumption of cheating may arise. If you are planning on working with another student, it is acceptable to undertake research together, and discuss problems, but it is not acceptable to jointly develop or share solutions unless this is specified by your lecturer.
Intentionally providing students with your solutions to assignments is classified as "assisting to cheat" and students who do this may be subject to disciplinary action. You should take reasonable care that your solution is not accidentally or deliberately obtained by other students. For example, do not leave copies of your work in progress on the hard drives of shared computers, and do not show your work to other students. If you believe this may have happened, please be sure to contact your lecturer as soon as possible.
Cheating also includes taking into an examination any material contrary to the regulations, including any bilingual dictionary, whether or not with the intention of using it to obtain an advantage.
Plagiarism involves the false representation of another person's ideas, or findings, as your own by either copying material or paraphrasing without citing sources. It is both professional and ethical to reference clearly the ideas and information that you have used from another writer. If the source is not identified, then you have plagiarised work of the other author. Plagiarism is a form of dishonesty that is insulting to the reader and grossly unfair to your student colleagues.
Communication
Communication methods
Email, phone and discussion group on the unit web site.
Notices
Notices related to the unit during the semester will be placed on the Home Page in the Unit Website. Check this regularly. Failure to read the announcements is not regarded as grounds for special consideration.
Consultation Times
Consultation timetables of the lecturer and tutors are available on the unit web site Timetable page.
If direct communication with your unit adviser/lecturer or tutor outside of consultation periods is needed you may contact the lecturer and/or tutors at:
All email communication to you from your lecturer will occur through your Monash student email address. Please ensure that you read it regularly, or forward your email to your main address. Also check that your contact information registered with the University is up to date in My.Monash.