Mining sequential patterns
Julkaisuvuosi
2001
Tekijät
Ahola, Jussi
Tiivistelmä
Discovering associations is one of the fundamental tasks of data mining. Its aim is to automatically seek for dependencies from vast amounts of data. The task results in socalled association rules, which are of form: If A occurs in the data then B occurs also. Only those rules that occur in the data frequently enough are generated. However, various information sources generate data with an inherent sequential nature, i.e., it is composed of discrete events which have a temporal/spatial ordering. This kind of data can be obtained from, e.g., telecommunications networks, electronic commerce, www-servers of Internet, and various scientific sources, like gene databases. The sequential nature of the data is totally ignored in the generation of the association rules. Thus, a part of the useful information included in the data is discarded. Thus, since the mid 90's the interest in discovering also the sequential associations in the data has arisen among the data mining community. The sequential associations or sequential patterns can be presented in the form: when A occurs, B occurs within some certain time. So, the difference to traditional association rules is that here the time information is included both in the rule itself and also in the mining process in the form of timing constraints. Nowadays there exist several highly efficient methods for mining these kind of patterns. The problem with them is that they assume the input data to be sequences of discrete events including only the information of the ordering, usually the time. Often, however, the events are associated with some additional attributes. The existing methods cannot take this multi-dimensionality of the data into account and so they lose the additional information it involves. Furthermore, the methods are designed for some specific problem, and are not, as such, applicable to different types of sequential data. In this report, a general formulation of the sequential patterns is introduced as it is presented in [1]. By using this approach the last problem of the existing algorithms can be tackled. A survey of the existing algorithm is then done. Three algorithms are presented in detail: WINEPI [2] and GSP [4] as they form the basis of the algorithms, and cSPADE [6] since it seems to be the most promising method proposed for the problem yet. Also the other relevant approaches are shortly introduced. Lastly, the extension of the patterns into the multi-dimensional is considered. Some ideas of handling the problem are given and also the features of the existing algorithms supporting multi-dimensionality are studied.
Näytä enemmänOrganisaatiot ja tekijät
Teknologian tutkimuskeskus VTT Oy
Ahola Jussi
Julkaisutyyppi
Julkaisumuoto
Erillisteos
Yleisö
Ammatillinen
OKM:n julkaisutyyppiluokitus
D4 Julkaistu kehittämis- tai tutkimusraportti taikka -selvitys
Julkaisukanavan tiedot
Lehti/Sarja
VTT Research Report
Kustantaja
VTT Technical Research Centre of Finland
Numero
TTE1-2001-10
Avoin saatavuus
Avoin saatavuus kustantajan palvelussa
Kyllä
Kustantajan version lisenssi
Muu lisenssi
Rinnakkaistallennettu
Ei
Muut tiedot
Tieteenalat
Tietojenkäsittely ja informaatiotieteet
Avainsanat
[object Object]
Kieli
englanti
Kansainvälinen yhteisjulkaisu
Ei
Yhteisjulkaisu yrityksen kanssa
Ei
Julkaisu kuuluu opetus- ja kulttuuriministeriön tiedonkeruuseen
Ei