Person: Grossniklaus, Michael
Cardinality Estimation using Label Probability Propagation for Subgraph Matching in Property Graph Databases
2022, Wörteler, Leonard, Renftle, Moritz, Chondrogiannis, Theodoros, Grossniklaus, Michael
Estimating query result cardinality is a central task of cost-based database query optimizers, enabling them to identify and avoid excessively large intermediate results. While cardinality estimation has been studied extensively in relational databases, research in the setting of graph databases has been more limited. In this paper, we address the problem of cardinality estimation for subgraph matching on property graph databases. Our novel cardinality estimation technique starts from a small amount of statistical information about node labels and relationship types, which is propagated along the graph query pattern in terms of label probabilities. Additionally, estimation quality can be improved by providing information about labels or properties to our technique, if available. In our experimental evaluation, we compare our approach to state-of-the-art cardinality estimation techniques for subgraph matching for property graph, RDF, and relational databases, and we demonstrate that our technique offers the best trade-off between accuracy and efficiency.
SAHARA : Memory Footprint Reduction of Cloud Databases with Automated Table Partitioning
2022, Brendle, Michael, Weber, Nick, Valiyev, Mahammad, May, Norman, Schulze, Robert, Böhm, Alexander, Moerkotte, Guido, Grossniklaus, Michael
Enterprises increasingly move their databases into the cloud. As a result, database-as-a-service providers are challenged to meet the performance guarantees assured in service-level agreements (SLAs) while keeping hardware costs as low as possible. Being cost-effective is particularly crucial for cloud databases where the provisioned amount of DRAM dominates the hardware costs. A way to decrease the memory footprint is to leverage access skew in the workload by moving rarely accessed cold data to cheaper storage layers and retaining only frequently accessed hot data in main memory. In this paper, we present SAHARA, an advisor that proposes a table partitioning for column stores with minimal memory footprint while still adhering to all performance SLAs. SAHARA collects lightweight workload statistics, classifies data as hot and cold, and calculates optimal or near-optimal range partitioning layouts with low optimization time using a novel cost model. We integrated SAHARA into a commercial cloud database and show in our experiments for real-world and synthetic benchmarks a memory footprint reduction of 2.5× while still fulfilling all performance SLAs provided by the customer or advertised by the DBaaS provider.
Where to go : Computational and visual what-if analyses in soccer
2019-12-17, Stein, Manuel, Seebacher, Daniel, Marcelino, Rui, Schreck, Tobias, Grossniklaus, Michael, Keim, Daniel A., Janetzko, Halldor
To prepare their teams for upcoming matches, analysts in professional soccer watch and manually annotate up to three matches a day. When annotating matches, domain experts try to identify and improve suboptimal movements based on intuition and professional experience. The high amount of matches needing to be analysed manually result in a tedious and time-consuming process, and results may be subjective. We propose an automatic approach for the realisation of effective region-based what-if analyses in soccer. Our system covers the automatic detection of region-based faulty movement behaviour, as well as the automatic suggestion of possible improved alternative movements. As we show, our approach effectively supports analysts and coaches investigating matches by speeding up previously time-consuming work. We enable domain experts to include their domain knowledge in the analysis process by allowing to interactively adjust suggested improved movement, as well as its implications on region control. We demonstrate the usefulness of our proposed approach via an expert study with three invited domain experts, one being head coach from the first Austrian soccer league. As our results show that experts most often agree with the suggested player movement (83%), our proposed approach enhances the analytical capabilities in soccer and supports a more efficient analysis.
From Movement to Events : Improving Soccer Match Annotations
2019, Stein, Manuel, Seebacher, Daniel, Karge, Tassilo, Polk, Tom, Grossniklaus, Michael, Keim, Daniel A.
Match analysis has become an important task in everyday work at professional soccer clubs in order to improve team performance. Video analysts regularly spend up to several days analyzing and summarizing matches based on tracked and annotated match data. Although there already exists extensive capabilities to track the movement of players and the ball from multimedia data sources such as video recordings, there is no capability to sufficiently detect dynamic and complex events within these data. As a consequence, analysts have to rely on manually created annotations, which are very time-consuming and expensive to create. We propose a novel method for the semi-automatic definition and detection of events based entirely on movement data of players and the ball. Incorporating Allen’s interval algebra into a visual analytics system, we enable analysts to visually define as well as search for complex, hierarchical events. We demonstrate the usefulness of our approach by quantitatively comparing our automatically detected events with manually annotated events from a professional data provider as well as several expert interviews. The results of our evaluation show that the required annotation time for complete matches by using our system can be reduced to a few seconds while achieving a similar level of performance.
Workload aware data partitioning
2022, May, Norman, Boehm, Alexander, Moerkotte, Guido, Brendle, Michael, Valiyev, Mahammad, Weber, Nick, Schulze, Robert, Grossniklaus, Michael
Techniques and solutions are described for partitioning data among different types of computer-readable storage media, such as between RAM and disk-based storage. A measured workload can be used to estimate data access for one or more possible partition arrangements. The partitions arrangements can be automatically enumerated. Scores for the partition arrangements can be calculated, where a score can indicate how efficiently a partition arrangement places frequently accessed data into storage specified for frequently-accessed data and placed infrequently accessed data into storage specified for infrequently accessed data.
Implementation of Data Access Metrics for Automated Physical Database Design
2022, Brendle, Michael, May, Norman, Schulze, Robert, Boehm, Alexander, Moerkotte, Guido, Grossniklaus, Michael
Towards Reproducible Research of Event Detection Techniques for Twitter
2019-06, Weiler, Andreas, Schilling, Harry, Kircher, Lukas, Grossniklaus, Michael
A major challenge in many research areas is reproducibility of implementations, experiments, or evaluations. New data sources and research directions complicate the reproducibility even more. For example, Twitter continues to gain popularity as a source of up-to-date news and information. As a result, numerous event detection techniques have been proposed to cope with the steadily increasing rate and volume of social media data streams. Although some of these works provide their implementation or conduct an evaluation of the proposed technique, it is almost impossible to reproduce their experiments. The main drawback is that Twitter prohibits the release of crawled datasets that are used by researchers in their experiments. In this work, we present a survey of the vast landscape of implementations, experiments, and evaluations presented by the different research works. Furthermore, we propose a reproducibility toolkit including Twistor (Twitter Stream Simulator), which can be used to simulate an artificial Twitter data stream (including events) as input for the experiments or evaluations of event detection techniques. We further present the experimental application of the reproducibility toolkit to state-of-the-art event detection techniques.
History oblivious route recovery on road networks
2022, Chondrogiannis, Theodoros, Bornholdt, Johann, Bouros, Panagiotis, Grossniklaus, Michael
The availability of GPS sensors in vehicles has enabled the collection of trajectory data that can be utilized to improve the quality of location-based services. However, mostly due to privacy concerns, many data sets are published without containing entire trajectories but only the source location, the target location and the duration of recorded trips. In this paper, we study the problem of route recovery from trip data. In contrast to recent works that assume the availability of entire trajectories for past trips, we investigate methods for route recovery in the absence of such historical data, and we present methods for recovering the single most likely route that a vehicle has travelled. Furthermore, we introduce the region recovery problem that aims at determining a small region that is very likely to contain the traveled route. We also introduce region recovery methods for both single trips and trip groups. In a comprehensive experimental evaluation, we study the efficacy of our solutions for both the route and the region recovery problem. For the region recovery problem in particular, we demonstrate the pros and cons of each method along with the trade-off they offer between the size of the recovered region and the likelihood that the region contains the actual route.
Online Landmark-Based Batch Processing of Shortest Path Queries
2021, Hotz, Manuel, Chondrogiannis, Theodoros, Wörteler, Leonard, Grossniklaus, Michael
Processing shortest path queries is a basic operation in many graph problems. Both preprocessing-based and batch processing techniques have been proposed to speed up the computation of a single shortest path by amortizing its costs. However, both of these approaches suffer from limitations. The former techniques are prohibitively expensive in situations where the precomputed information needs to be updated frequently due to changes in the graph, while the latter require coordinates and cannot be used on non-spatial graphs. In this paper, we address both limitations and propose novel techniques for batch processing shortest paths queries using landmarks. We show how preprocessing can be avoided entirely by integrating the computation of landmark distances into query processing. Our experimental results demonstrate that our techniques outperform the state of the art on both spatial and non-spatial graphs with a maximum speedup of 3.61 × in online scenarios.
Experiences with Implementing Landmark Embedding in Neo4j
2019, Hotz, Manuel, Chondrogiannis, Theodoros, Wörteler, Leonard, Grossniklaus, Michael
Reachability, distance, and shortest path queries are fundamental operations in the field of graph data management with various applications in research and industry. However, while various preprocessing-based methods have been proposed to optimize the computation of such queries, the integration of existing methods into graph database management systems and processing frameworks has been limited. In this paper, we present an implementation of a static graph index that employs landmark embedding for Neo4j, to enable the index-based computation of reachability, distance, and shortest path queries on the database. We explore different strategies for selecting landmarks and different schemes for storing the precomputed landmark distances. To evaluate the efficiency of each landmark selection strategy and each storage scheme, we conduct an experimental evaluation using four real-world network datasets. We measure the preprocessing cost, the query processing time, and the accuracy of the distance estimation of different configurations of our index structure.