Guided Association Mining through Dynamic Constraint Refinement

Author: Aaron John Ceglar

Ceglar, Aaron John, 2005, Guided Association Mining through Dynamic Constraint Refinement, Flinders University, School of Computer Science, Engineering and Mathematics

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact copyright@flinders.edu.au with the details.

Abstract

Association mining, the discovery of interesting inferences from within a dataset, is ultimately subjective, as only the user can assess the practical usefulness of an inference. To this end, an association mining system harnesses the user's perceptual capabilities and the computer's processing power to improve the quality of a set of inferences. Although current association mining systems tightly involve the user within the pre-processing and presentation stages, the analysis stage of the association mining process remains relatively autonomous and opaque. This lack of user involvement constrains domain space exploration and subsequent inference derivation, and the resulting loss of user-computer synergy potentially reduces inference quality. The theory of guided association mining and its realisation represent a timely and logical step in the progression of association mining research. Early research focused upon algorithmic efficiency, addressing issues such as I/O reduction and scalability; however, this seems to have reached a point of diminishing returns. The research focus has therefore shifted from the speed at which results are generated to the quality of those results, that is, the interest of the derived inferences, through areas of research such as measures of interestingness and semantic inclusion. However, these areas of research, which attempt to incorporate domain knowledge within analysis, fall short of providing user-computer synergy, as the specified constraints are statically included within an automated process. Given this static constraint inclusion, the derivation of quality inferences often requires an iterative analysis process, whereby a set of quality inferences is converged upon through iterative constraint refinement. This thesis argues that by maintaining the user-computer synergy during analysis, the quality of discovered inferences can be improved.
This is achieved by opening the opaque 'black box' analysis process and providing functionality through which the user can interact with, and subsequently guide, domain space exploration. Enabling the user to dynamically focus exploration upon concept areas of specific interest will thus improve the quality of the derived inferences. This thesis addresses the next step in providing analysis synergy by enabling the user to dynamically refine constraints during analysis instead of between analysis iterations. To this end, a guided mining architecture is proposed that merges the currently accepted knowledge discovery architecture with the model-view-controller architecture, enabling analysis synergy through the provision of a transparent and interactive analysis environment. Furthermore, this thesis makes novel contributions to the foundation fields of analysis and rule presentation, by way of an incremental closed-set association mining algorithm and an association visualisation technique that accommodates hierarchical semantics.

Keywords: Association mining, data mining, analysis synergy, user-computer synergy

Subject: Computer Science thesis

Thesis type: Doctor of Philosophy
Completed: 2005
School: School of Computer Science, Engineering and Mathematics
Supervisor: John Roddick