Highway Systems : How Good are They, Really?
2023, Chondrogiannis, Theodoros, Grossniklaus, Michael
Highways play a crucial role in transportation services as they facilitate long-distance traveling and allow driving at an almost constant speed, thus resulting in lower fuel consumption and emissions. Many existing highway systems were designed before practical computational tools had been developed. Furthermore, most existing approaches to evaluating highways focus on analyzing mobility data rather than studying the design of the highway system. To address this gap in existing research, in this paper, we study the problem of evaluating the efficacy of the design of real-world highway systems. To this end, we propose two novel measures for the efficacy of highway systems, along with algorithms to compute them. In addition, we present a first-cut heuristic algorithm that aims at computing a highway system that optimizes our proposed measures. In our experiments, we demonstrate the potential of our methods in measuring the efficacy of real-world highway systems. We also evaluate the performance of our heuristic algorithm in computing a rough design of an efficient highway system.
SAHARA : Memory Footprint Reduction of Cloud Databases with Automated Table Partitioning
2022, Brendle, Michael, Weber, Nick, Valiyev, Mahammad, May, Norman, Schulze, Robert, Böhm, Alexander, Moerkotte, Guido, Grossniklaus, Michael
Enterprises increasingly move their databases into the cloud. As a result, database-as-a-service providers are challenged to meet the performance guarantees assured in service-level agreements (SLAs) while keeping hardware costs as low as possible. Being cost-effective is particularly crucial for cloud databases where the provisioned amount of DRAM dominates the hardware costs. A way to decrease the memory footprint is to leverage access skew in the workload by moving rarely accessed cold data to cheaper storage layers and retaining only frequently accessed hot data in main memory. In this paper, we present SAHARA, an advisor that proposes a table partitioning for column stores with minimal memory footprint while still adhering to all performance SLAs. SAHARA collects lightweight workload statistics, classifies data as hot and cold, and calculates optimal or near-optimal range partitioning layouts with low optimization time using a novel cost model. We integrated SAHARA into a commercial cloud database and show in our experiments for real-world and synthetic benchmarks a memory footprint reduction of 2.5× while still fulfilling all performance SLAs provided by the customer or advertised by the DBaaS provider.
From Movement to Events : Improving Soccer Match Annotations
2019, Stein, Manuel, Seebacher, Daniel, Karge, Tassilo, Polk, Tom, Grossniklaus, Michael, Keim, Daniel A.
Match analysis has become an important task in everyday work at professional soccer clubs in order to improve team performance. Video analysts regularly spend up to several days analyzing and summarizing matches based on tracked and annotated match data. Although there already exists extensive capabilities to track the movement of players and the ball from multimedia data sources such as video recordings, there is no capability to sufficiently detect dynamic and complex events within these data. As a consequence, analysts have to rely on manually created annotations, which are very time-consuming and expensive to create. We propose a novel method for the semi-automatic definition and detection of events based entirely on movement data of players and the ball. Incorporating Allen’s interval algebra into a visual analytics system, we enable analysts to visually define as well as search for complex, hierarchical events. We demonstrate the usefulness of our approach by quantitatively comparing our automatically detected events with manually annotated events from a professional data provider as well as several expert interviews. The results of our evaluation show that the required annotation time for complete matches by using our system can be reduced to a few seconds while achieving a similar level of performance.
On the Performance of Analytical and Pattern Matching Graph Queries in Neo4j and a Relational Database
2017, Hölsch, Jürgen, Schmidt, Tobias, Grossniklaus, Michael
Graph databases with a custom non-relational backend promote themselves to outperform relational databases in answering queries on large graphs. Recent empirical studies show that this claim is not always true. However, these studies focus only on pattern matching queries and neglect analytical queries used in practice such as shortest path, diameter, degree centrality or closeness centrality. In addition, there is no distinction between different types of pattern matching queries. In this paper, we introduce a set of analytical and pattern matching queries, and evaluate them in Neo4j and a market-leading commercial relational database system. We show that the relational database system outperforms Neo4j for our analytical queries and that Neo4j is faster for queries that do not filter on specific edge types.
History oblivious route recovery on road networks
2022, Chondrogiannis, Theodoros, Bornholdt, Johann, Bouros, Panagiotis, Grossniklaus, Michael
The availability of GPS sensors in vehicles has enabled the collection of trajectory data that can be utilized to improve the quality of location-based services. However, mostly due to privacy concerns, many data sets are published without containing entire trajectories but only the source location, the target location and the duration of recorded trips. In this paper, we study the problem of route recovery from trip data. In contrast to recent works that assume the availability of entire trajectories for past trips, we investigate methods for route recovery in the absence of such historical data, and we present methods for recovering the single most likely route that a vehicle has travelled. Furthermore, we introduce the region recovery problem that aims at determining a small region that is very likely to contain the traveled route. We also introduce region recovery methods for both single trips and trip groups. In a comprehensive experimental evaluation, we study the efficacy of our solutions for both the route and the region recovery problem. For the region recovery problem in particular, we demonstrate the pros and cons of each method along with the trade-off they offer between the size of the recovered region and the likelihood that the region contains the actual route.
Online Landmark-Based Batch Processing of Shortest Path Queries
2021, Hotz, Manuel, Chondrogiannis, Theodoros, Wörteler, Leonard, Grossniklaus, Michael
Processing shortest path queries is a basic operation in many graph problems. Both preprocessing-based and batch processing techniques have been proposed to speed up the computation of a single shortest path by amortizing its costs. However, both of these approaches suffer from limitations. The former techniques are prohibitively expensive in situations where the precomputed information needs to be updated frequently due to changes in the graph, while the latter require coordinates and cannot be used on non-spatial graphs. In this paper, we address both limitations and propose novel techniques for batch processing shortest paths queries using landmarks. We show how preprocessing can be avoided entirely by integrating the computation of landmark distances into query processing. Our experimental results demonstrate that our techniques outperform the state of the art on both spatial and non-spatial graphs with a maximum speedup of 3.61 × in online scenarios.
Revealing the Invisible : Visual Analytics and Explanatory Storytelling for Advanced Team Sport Analysis
2018-10, Stein, Manuel, Breitkreutz, Thorsten, Häußler, Johannes, Seebacher, Daniel, Niederberger, Christoph, Schreck, Tobias, Grossniklaus, Michael, Keim, Daniel A., Janetzko, Halldor
The analysis of invasive team sports often concentrates on cooperative and competitive aspects of collective movement behavior. A main goal is the identification and explanation of strategies, and eventually the development of new strategies. In visual sports analytics, a range of different visual-interactive analysis techniques have been proposed, e.g., based on visualization using for example trajectories, graphs, heatmaps, and animations. Identifying suitable visualizations for a specific situation is key to a successful analysis. Existing systems enable the interactive selection of different visualization facets to support the analysis process. However, an interactive selection of appropriate visualizations is a difficult, complex, and time-consuming task. In this paper, we propose a four-step analytics conceptual workflow for an automatic selection of appropriate views for key situations in soccer games. Our concept covers classification, specification, explanation, and alteration of match situations, effectively enabling the analysts to focus on important game situations and the determination of alternative moves. Combining abstract visualizations with real world video recordings by Immersive Visual Analytics and descriptive storylines, we support domain experts in understanding key situations. We demonstrate the usefulness of our proposed conceptual workflow via two proofs of concept and evaluate our system by comparing our results to manual video annotations by domain experts. Initial expert feedback shows that our proposed concept improves the understanding of competitive sports and leads to a more efficient data analysis.
Cardinality Estimation using Label Probability Propagation for Subgraph Matching in Property Graph Databases
2022, Wörteler, Leonard, Renftle, Moritz, Chondrogiannis, Theodoros, Grossniklaus, Michael
Estimating query result cardinality is a central task of cost-based database query optimizers, enabling them to identify and avoid excessively large intermediate results. While cardinality estimation has been studied extensively in relational databases, research in the setting of graph databases has been more limited. In this paper, we address the problem of cardinality estimation for subgraph matching on property graph databases. Our novel cardinality estimation technique starts from a small amount of statistical information about node labels and relationship types, which is propagated along the graph query pattern in terms of label probabilities. Additionally, estimation quality can be improved by providing information about labels or properties to our technique, if available. In our experimental evaluation, we compare our approach to state-of-the-art cardinality estimation techniques for subgraph matching for property graph, RDF, and relational databases, and we demonstrate that our technique offers the best trade-off between accuracy and efficiency.
Experiences with Implementing Landmark Embedding in Neo4j
2019, Hotz, Manuel, Chondrogiannis, Theodoros, Wörteler, Leonard, Grossniklaus, Michael
Reachability, distance, and shortest path queries are fundamental operations in the field of graph data management with various applications in research and industry. However, while various preprocessing-based methods have been proposed to optimize the computation of such queries, the integration of existing methods into graph database management systems and processing frameworks has been limited. In this paper, we present an implementation of a static graph index that employs landmark embedding for Neo4j, to enable the index-based computation of reachability, distance, and shortest path queries on the database. We explore different strategies for selecting landmarks and different schemes for storing the precomputed landmark distances. To evaluate the efficiency of each landmark selection strategy and each storage scheme, we conduct an experimental evaluation using four real-world network datasets. We measure the preprocessing cost, the query processing time, and the accuracy of the distance estimation of different configurations of our index structure.
Bucket Selection : A Model-Independent Diverse Selection Strategy for Widening
2017-10-04, Fillbrunn, Alexander, Wörteler, Leonard, Grossniklaus, Michael, Berthold, Michael R.
When using a greedy algorithm for finding a model, as is the case in many data mining algorithms, there is a risk of getting caught in local extrema, i.e., suboptimal solutions. Widening is a technique for enhancing greedy algorithms by using parallel resources to broaden the search in the model space. The most important component of widening is the selector, a function that chooses the next models to refine. This selector ideally enforces diversity within the selected set of models in order to ensure that parallel workers explore sufficiently different parts of the model space and do not end up mimicking a simple beam search. Previous publications have shown that this works well for problems with a suitable distance measure for the models, but if no such measure is available, applying widening is challenging. In addition these approaches require extensive, sequential computations for diverse subset selection, making the entire process much slower than the original greedy algorithm. In this paper we propose the bucket selector, a model-independent randomized selection strategy. We find that (a) the bucket selector is a lot faster and not significantly worse when a diversity measure exists and (b) it performs better than existing selection strategies in cases without a diversity measure.