OLAP on Multidimensional Text Databases: Topic Network Cube and its Applications

zhiyuan zhang, Hong Wang, Xingjie Feng

Abstract


Multidimensional text data contains both structured attributes and unstructured text. Unlike the traditional numerical data, it is not straightforward to apply online analytical processing on multidimensional text data. Although some OLAP methods such as topic cube have been proposed in order to effectively utilize its structured information and valuable text data, these methods can’t tell the relations of topic words. Considering that topics usually consist of several subtopics and each subtopic usually contains some topic words, we here use a topic network manner, in which related topic words are connected, to express the complex relations of topics. This paper introduces a new concept of topic network cube to perform OLAP analysis on multidimensional text databases. Firstly, we propose a method called GL-LDA based on Gibbs sampling outputs of Labeled LDA to measure the relations between topic words. Secondly, we give a storage model of topic network cube which can efficiently generate topic network using GL-LDA. Thirdly, we show how to perform OLAP analysis on topic network cube. Experimental results show that we can analyze multidimensional text databases in different granularities easily and effectively using just a few simple SQL statements, and the output network provides rich and useful information of topics.

Full Text:

PDF

Refbacks

  • There are currently no refbacks.