Mining sequential patterns

Publication year

2001

Authors

Ahola, Jussi

Abstract

Discovering associations is one of the fundamental tasks of data mining. Its aim is to automatically find dependencies in vast amounts of data. The task results in so-called association rules, which are of the form: if A occurs in the data, then B also occurs. Only those rules that occur in the data frequently enough are generated. However, various information sources generate data with an inherent sequential nature, i.e., the data is composed of discrete events that have a temporal or spatial ordering. This kind of data can be obtained from, e.g., telecommunications networks, electronic commerce, Internet web servers, and various scientific sources, such as gene databases. The sequential nature of the data is completely ignored when association rules are generated, so part of the useful information in the data is discarded. For this reason, since the mid-1990s the data mining community has taken an interest in also discovering the sequential associations in the data. Sequential associations, or sequential patterns, can be presented in the form: when A occurs, B occurs within a certain time. The difference from traditional association rules is that the time information is included both in the rule itself and in the mining process, in the form of timing constraints. Several highly efficient methods now exist for mining these kinds of patterns. Their problem is that they assume the input data to be sequences of discrete events that carry only the ordering information, usually the time. Often, however, the events are associated with additional attributes. The existing methods cannot take this multi-dimensionality of the data into account, so they lose the additional information it carries. Furthermore, each method is designed for a specific problem and is not, as such, applicable to different types of sequential data.
In this report, a general formulation of sequential patterns is introduced as presented in [1]. This approach addresses the last of the problems above, the limited applicability of the existing algorithms. A survey of the existing algorithms then follows. Three algorithms are presented in detail: WINEPI [2] and GSP [4], as they form the basis of the algorithms, and cSPADE [6], since it appears to be the most promising method proposed for the problem so far. The other relevant approaches are also briefly introduced. Lastly, the extension of the patterns to the multi-dimensional case is considered. Some ideas for handling the problem are given, and the features of the existing algorithms that support multi-dimensionality are studied.
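
As an illustration of the rule form "when A occurs, B occurs within a certain time", the following minimal Python sketch counts how often one event type is followed by another within a time limit. It is not taken from the report and is not WINEPI, GSP, or cSPADE; the function name `rule_stats`, the `(timestamp, event_type)` input format, and the `max_gap` parameter are assumptions made only for this example.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp, event type); assumed input format


def rule_stats(events: List[Event], a: str, b: str,
               max_gap: float) -> Tuple[int, int, float]:
    """Evaluate the rule "when `a` occurs, `b` occurs within `max_gap`".

    Returns (occurrences of a, occurrences of a followed by b in time,
    fraction of a-occurrences for which the rule holds).
    """
    ordered = sorted(events)  # order events by timestamp
    count_a = 0
    count_ab = 0
    for i, (t_a, kind) in enumerate(ordered):
        if kind != a:
            continue
        count_a += 1
        # Scan later events until the timing constraint is exceeded.
        for t_b, kind_b in ordered[i + 1:]:
            if t_b - t_a > max_gap:
                break
            if kind_b == b and t_b > t_a:
                count_ab += 1
                break
    confidence = count_ab / count_a if count_a else 0.0
    return count_a, count_ab, confidence


if __name__ == "__main__":
    log = [(1, "A"), (2, "B"), (5, "A"), (9, "B"),
           (10, "A"), (11, "C"), (12, "B")]
    print(rule_stats(log, "A", "B", max_gap=3))
```

A mining method would keep such a rule only if these counts exceed user-chosen frequency thresholds, which mirrors the "frequently enough" condition stated for association rules.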

Organizations and authors

Publication type

Publication format

Monograph

Audience

Professional

Publication type classification of the Ministry of Education and Culture (OKM)

D4 Published development or research report or study

Publication channel information

Journal/Series

VTT Research Report

Publisher

VTT Technical Research Centre of Finland

Number

TTE1-2001-10

Open access

Open access in the publisher's service

Yes

License of the publisher's version

Other license

Self-archived

No

Other information

Fields of science

Computer and information sciences

Language

English

International co-publication

No

Co-publication with a company

No

Publication is included in the data collection of the Ministry of Education and Culture

No