Workload aware data partitioning
2022, May, Norman, Boehm, Alexander, Moerkotte, Guido, Brendle, Michael, Valiyev, Mahammad, Weber, Nick, Schulze, Robert, Grossniklaus, Michael
Techniques and solutions are described for partitioning data among different types of computer-readable storage media, such as between RAM and disk-based storage. A measured workload can be used to estimate data access for one or more possible partition arrangements. The partitions arrangements can be automatically enumerated. Scores for the partition arrangements can be calculated, where a score can indicate how efficiently a partition arrangement places frequently accessed data into storage specified for frequently-accessed data and placed infrequently accessed data into storage specified for infrequently accessed data.
SAHARA : Memory Footprint Reduction of Cloud Databases with Automated Table Partitioning
2022, Brendle, Michael, Weber, Nick, Valiyev, Mahammad, May, Norman, Schulze, Robert, Böhm, Alexander, Moerkotte, Guido, Grossniklaus, Michael
Enterprises increasingly move their databases into the cloud. As a result, database-as-a-service providers are challenged to meet the performance guarantees assured in service-level agreements (SLAs) while keeping hardware costs as low as possible. Being cost-effective is particularly crucial for cloud databases where the provisioned amount of DRAM dominates the hardware costs. A way to decrease the memory footprint is to leverage access skew in the workload by moving rarely accessed cold data to cheaper storage layers and retaining only frequently accessed hot data in main memory. In this paper, we present SAHARA, an advisor that proposes a table partitioning for column stores with minimal memory footprint while still adhering to all performance SLAs. SAHARA collects lightweight workload statistics, classifies data as hot and cold, and calculates optimal or near-optimal range partitioning layouts with low optimization time using a novel cost model. We integrated SAHARA into a commercial cloud database and show in our experiments for real-world and synthetic benchmarks a memory footprint reduction of 2.5× while still fulfilling all performance SLAs provided by the customer or advertised by the DBaaS provider.