Discovering Patterns and Anomalies in Association Rules

Author: Ping Liang

Liang, Ping, 2015, Discovering Patterns and Anomalies in Association Rules, Flinders University, School of Computer Science, Engineering and Mathematics

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact copyright@flinders.edu.au with the details.

Abstract

Higher order mining (HOM), which mines over patterns/models derived from one or more large and/or complex datasets, has been widely used in a variety of ways and provides benefits such as the ability to combine mining strategies through the modular combination of components and the development of higher order explanations in describing facts about data. Based on the idea of HOM, this thesis addresses two important but unanswered issues.

First, while the discovery of rules that can inform business decision making is the ultimate goal of data mining technology, the search for rules that adhere to a user’s definition of interesting remains somewhat elusive, in part because rules are commonly supplied in a low, instance-level format. To tackle this problem, this thesis proposes the concept of ruleset patterns, which represent complex patterns in sets of rules reflecting a user’s definition of interesting, and presents a proof-of-concept system, Horace, for efficient ruleset pattern discovery. Since frequent pattern or prefix trees are (generally speaking) isomorphic with the resulting ruleset, Horace employs a novel tree-based approach to searching such intermediate data structures for patterns. Experimental results show that the approach is both usable and efficient in searching for the rules that users seek.

Second, the detection of unusual or anomalous data is an important function in automated data analysis or data mining. However, the diversity of anomaly detection algorithms shows that it is often difficult to determine which algorithms might best detect anomalies in any given dataset. This thesis provides a partial solution to this problem by elevating the search for anomalous data in transaction-oriented datasets to an inspection of the rules that can be produced by higher order longitudinal/spatio-temporal association rule mining. The motivation behind the approach is twofold. Firstly, the primary or raw data might not always be available; thus in some cases, researchers can operate only on the rules generated from the source data. Secondly, since HOM facilitates the characterisation of items participating in rulesets in terms of real-world descriptions (such as competitor, catalyst and so on), such a technique may provide a view of anomalies that is arguably closer to that sought by information analysts. In this thesis, two anomaly detection algorithms are proposed to find anomalies/outliers, and a proof-of-concept prototype has been developed and tested. The experimental results demonstrate the soundness and feasibility of the proposed approach.

Keywords: higher order mining, association rule, ruleset pattern, anomaly detection

Subject: Computer Science thesis

Thesis type: Doctor of Philosophy
Completed: 2015
School: School of Computer Science, Engineering and Mathematics
Supervisor: John Roddick