The Discovery and Retrieval of Temporal Rules in Interval Sequence Data

Author: Edi Winarko

Winarko, Edi, 2007 The Discovery and Retrieval of Temporal Rules in Interval Sequence Data, Flinders University, School of Informatics and Engineering

This electronic version is made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact copyright@flinders.edu.au with the details.

Abstract

Data mining is increasingly becoming important tool in extracting interesting knowledge from large databases. Many industries are now using data mining tools for analysing their large collections of databases and making business decisions. Many data mining problems involve temporal aspects, with examples ranging from engineering to scientific research, finance and medicine. Temporal data mining is an extension of data mining which deals with temporal data. Mining temporal data poses more challenges than mining static data. While the analysis of static data sets often comes down to the question of data items, with temporal data there are many additional possible relations. One of the tasks in temporal data mining is the pattern discovery task, whose objective is to discover time-dependent correlations, patterns or rules between events in large volumes of data. To date, most temporal pattern discovery research has focused on events existing at a point in time rather than over a temporal interval. In comparison to static rules, mining with respect to time points provides semantically richer rules. However, accommodating temporal intervals offers rules that are richer still. This thesis addresses several issues related to the pattern discovery from interval sequence data. Despite its importance, this area of research has received relatively little attention and there are still many issues that need to be addressed. Three main issues that this thesis considers include the definition of what constitutes an interesting pattern in interval sequence data, the efficient mining for patterns in the data, and the identification of interesting patterns from a large number of discovered patterns. In order to deal with these issues, this thesis formulates the problem of discovering rules, which we term richer temporal association rules, from interval sequence databases. Furthermore, this thesis develops an efficient algorithm, ARMADA, for discovering richer temporal association rules. The algorithm does not require candidate generation. It utilizes a simple index, and only requires at most two database scans. In this thesis, a retrieval system is proposed to facilitate the selection of interesting rules from a set of discovered richer temporal association rules. To this end, a high-level query language specification, TAR-QL, is proposed to specify the criteria of the rules to be retrieved from the rule sets. Three low-level methods are developed to evaluate queries involving rule format conditions. In order to improve the performance of the methods, signature file based indexes are proposed. In addition, this thesis proposes the discovery of inter-transaction relative temporal association rules from event sequence databases.

Keywords: data mining,temporal rule,interval data,sequence data
Subject: Informatics thesis, Engineering thesis

Thesis type: Doctor of Philosophy
Completed: 2007
School: School of Computer Science, Engineering and Mathematics
Supervisor: Professor John Roddick