History oblivious route recovery on road networks
2022, Chondrogiannis, Theodoros, Bornholdt, Johann, Bouros, Panagiotis, Grossniklaus, Michael
The availability of GPS sensors in vehicles has enabled the collection of trajectory data that can be utilized to improve the quality of location-based services. However, mostly due to privacy concerns, many data sets are published without containing entire trajectories but only the source location, the target location and the duration of recorded trips. In this paper, we study the problem of route recovery from trip data. In contrast to recent works that assume the availability of entire trajectories for past trips, we investigate methods for route recovery in the absence of such historical data, and we present methods for recovering the single most likely route that a vehicle has travelled. Furthermore, we introduce the region recovery problem that aims at determining a small region that is very likely to contain the traveled route. We also introduce region recovery methods for both single trips and trip groups. In a comprehensive experimental evaluation, we study the efficacy of our solutions for both the route and the region recovery problem. For the region recovery problem in particular, we demonstrate the pros and cons of each method along with the trade-off they offer between the size of the recovered region and the likelihood that the region contains the actual route.
SAHARA: Memory Footprint Reduction of Cloud Databases with Automated Table Partitioning
2022, Brendle, Michael, Weber, Nick, Valiyev, Mahammad, May, Norman, Schulze, Robert, Böhm, Alexander, Moerkotte, Guido, Grossniklaus, Michael
Enterprises increasingly move their databases into the cloud. As a result, database-as-a-service providers are challenged to meet the performance guarantees assured in service-level agreements (SLAs) while keeping hardware costs as low as possible. Being cost-effective is particularly crucial for cloud databases where the provisioned amount of DRAM dominates the hardware costs. A way to decrease the memory footprint is to leverage access skew in the workload by moving rarely accessed cold data to cheaper storage layers and retaining only frequently accessed hot data in main memory. In this paper, we present SAHARA, an advisor that proposes a table partitioning for column stores with minimal memory footprint while still adhering to all performance SLAs. SAHARA collects lightweight workload statistics, classifies data as hot and cold, and calculates optimal or near-optimal range partitioning layouts with low optimization time using a novel cost model. We integrated SAHARA into a commercial cloud database and show in our experiments for real-world and synthetic benchmarks a memory footprint reduction of 2.5× while still fulfilling all performance SLAs provided by the customer or advertised by the DBaaS provider.
Cardinality Estimation using Label Probability Propagation for Subgraph Matching in Property Graph Databases
2022, Wörteler, Leonard, Renftle, Moritz, Chondrogiannis, Theodoros, Grossniklaus, Michael
Estimating query result cardinality is a central task of cost-based database query optimizers, enabling them to identify and avoid excessively large intermediate results. While cardinality estimation has been studied extensively in relational databases, research in the setting of graph databases has been more limited. In this paper, we address the problem of cardinality estimation for subgraph matching on property graph databases. Our novel cardinality estimation technique starts from a small amount of statistical information about node labels and relationship types, which is propagated along the graph query pattern in terms of label probabilities. Additionally, estimation quality can be improved by providing information about labels or properties to our technique, if available. In our experimental evaluation, we compare our approach to state-of-the-art cardinality estimation techniques for subgraph matching for property graph, RDF, and relational databases, and we demonstrate that our technique offers the best trade-off between accuracy and efficiency.
Workload aware data partitioning
2022, May, Norman, Boehm, Alexander, Moerkotte, Guido, Brendle, Michael, Valiyev, Mahammad, Weber, Nick, Schulze, Robert, Grossniklaus, Michael
Techniques and solutions are described for partitioning data among different types of computer-readable storage media, such as between RAM and disk-based storage. A measured workload can be used to estimate data access for one or more possible partition arrangements. The partition arrangements can be automatically enumerated. Scores for the partition arrangements can be calculated, where a score can indicate how efficiently a partition arrangement places frequently accessed data into storage specified for frequently accessed data and places infrequently accessed data into storage specified for infrequently accessed data.
Online Landmark-Based Batch Processing of Shortest Path Queries
2021, Hotz, Manuel, Chondrogiannis, Theodoros, Wörteler, Leonard, Grossniklaus, Michael
Processing shortest path queries is a basic operation in many graph problems. Both preprocessing-based and batch processing techniques have been proposed to speed up the computation of a single shortest path by amortizing its costs. However, both of these approaches suffer from limitations. The former techniques are prohibitively expensive in situations where the precomputed information needs to be updated frequently due to changes in the graph, while the latter require coordinates and cannot be used on non-spatial graphs. In this paper, we address both limitations and propose novel techniques for batch processing shortest path queries using landmarks. We show how preprocessing can be avoided entirely by integrating the computation of landmark distances into query processing. Our experimental results demonstrate that our techniques outperform the state of the art on both spatial and non-spatial graphs with a maximum speedup of 3.61× in online scenarios.
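As background for the landmark idea mentioned in this abstract, the following is a minimal sketch of the standard triangle-inequality bound that landmark distances provide. It is not the paper's algorithm. It assumes an undirected graph with non-negative edge weights, and all names (`dijkstra`, `landmark_lower_bound`, `make_undirected`, the toy graph) are hypothetical. Because the distances are computed with plain Dijkstra and need no coordinates, the bound applies to non-spatial graphs as well, and one set of landmark distances can serve every query in a batch.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def landmark_lower_bound(landmark_dists, s, t):
    """Lower bound on d(s, t) from the triangle inequality.

    For a landmark l in an undirected graph, |d(l, s) - d(l, t)| <= d(s, t).
    The tightest bound over all landmarks is returned.
    """
    best = 0.0
    for dist in landmark_dists:
        if s in dist and t in dist:
            best = max(best, abs(dist[s] - dist[t]))
    return best


def make_undirected(edges):
    """Build an adjacency list that stores each edge in both directions."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


# Toy graph (assumed for illustration only).
EDGES = [("a", "b", 2.0), ("b", "c", 2.0), ("a", "d", 1.0),
         ("d", "c", 5.0), ("c", "e", 3.0)]
GRAPH = make_undirected(EDGES)

# Landmark distances are computed once and reused by all queries in a batch.
LANDMARK_DISTS = [dijkstra(GRAPH, "a"), dijkstra(GRAPH, "e")]
```

A lower bound of this kind is typically used as a search heuristic or to prune candidates; how the paper interleaves the landmark computation with query processing is not described in the abstract and is not modeled here.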