Discovering Dynamic Classification Hierarchies in OLAP Dimensions

The standard approach to OLAP requires the measures and dimensions of a cube to be known at the design stage. Dimensions must also be non-volatile, balanced, and normalized. These constraints appear too rigid for many data sets, especially semi-structured ones such as user-generated content in social networks and other web applications. We enrich the multidimensional analysis of such data via content-driven discovery of dimensions and classification hierarchies. The discovered elements are dynamic by nature and evolve along with the underlying data set.

We demonstrate the benefits of our approach by building a data warehouse for the public stream of the popular social network and microblogging service Twitter. Our approach makes it possible to classify users by their activity, popularity, and behavior, and to organize messages by topic, impact, origin, method of generation, etc. Capturing these dynamic characteristics of the data adds more intelligence to the analysis and extends the limits of OLAP.

Weiler, Andreas; Rehman, Nafees Ur; Mansmann, Svetlana; Scholl, Marc H. (2012)

