Top-k Sequence Pattern Mining with Non-overlapping Condition

Xin CHAI, Dan YANG, Jingyu LIU, Yan LI, Youxi WU

Abstract


Pattern mining has been widely applied in many fields. Users often mine a large number of patterns, most of which are difficult to use in real applications. Top-k pattern mining, which finds the k most frequent patterns, is an effective strategy, because the more frequently a pattern occurs, the more likely it is to be important to users. However, in applications where the Apriori property holds, top-k mining tends to return only short patterns, since a pattern can never be more frequent than its sub-patterns. It is well known that short patterns carry less information than long patterns. In this paper, we therefore focus on mining the top-k sequence patterns for each pattern length. We propose an effective algorithm named NOSTOPK (non-overlapping sequence pattern mining for top-k). The algorithm calculates the support of a pattern using a Nettree data structure, which has previously been used to tackle various pattern matching and sequence pattern mining problems. Given the top k patterns of length len, we calculate the supports of the corresponding k × |Σ| super-patterns of length len+1, where Σ is the alphabet, and select the new top k patterns of length len+1 from them. Experimental results demonstrate that the algorithm achieves better performance than comparable algorithms.
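The level-wise step described in the abstract can be illustrated with a minimal Python sketch. It is not the NOSTOPK implementation: the function names (`top_k_per_length`, `contiguous_support`) are hypothetical, super-patterns are assumed to be formed by appending one character, and the support function is a deliberately simple stand-in that counts disjoint contiguous occurrences with `str.count`. It does not implement the Nettree structure or the gap-constrained non-overlapping support used in the paper; a faithful support function could be passed in through the `support` parameter.

```python
from typing import Callable, Dict, List, Tuple


def contiguous_support(pattern: str, sequence: str) -> int:
    """Stand-in support (assumption): disjoint contiguous occurrences.

    This is NOT the paper's Nettree-based non-overlapping support.
    """
    return sequence.count(pattern)


def top_k_per_length(
    sequence: str,
    alphabet: str,
    k: int,
    max_len: int,
    support: Callable[[str, str], int] = contiguous_support,
) -> Dict[int, List[Tuple[str, int]]]:
    """Return the k best-supported patterns for each length 1..max_len."""
    result: Dict[int, List[Tuple[str, int]]] = {}
    # Length-1 candidates are the alphabet symbols themselves.
    candidates = list(alphabet)
    for length in range(1, max_len + 1):
        scored = [(p, support(p, sequence)) for p in candidates]
        # Drop patterns that never occur.
        scored = [(p, s) for p, s in scored if s > 0]
        # Highest support first; ties broken alphabetically for determinism.
        scored.sort(key=lambda ps: (-ps[1], ps[0]))
        top = scored[:k]
        if not top:
            break
        result[length] = top
        # Extend each of the k survivors by every symbol:
        # at most k * |alphabet| candidates for the next length.
        candidates = [p + c for p, _ in top for c in alphabet]
    return result


if __name__ == "__main__":
    found = top_k_per_length("abcabcabab", alphabet="abc", k=2, max_len=3)
    for length, patterns in found.items():
        print(length, patterns)
```

Only the k survivors of each length are extended, so the number of support computations per length stays bounded by k × |Σ| rather than growing with the full candidate space.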

