Memory Footprint Reduction of Cloud Databases with Automated Physical Database Design

dc.contributor.author: Brendle, Michael
dc.date.accessioned: 2023-02-23T08:02:04Z
dc.date.available: 2023-02-23T08:02:04Z
dc.date.issued: 2022 [eng]
dc.description.abstract: Enterprises increasingly move their application data into the cloud by employing data management offerings of database-as-a-service providers, who specialize in hosting and managing database instances. While being obligated to certain performance commitments stipulated in service-level agreements (SLAs) with the customer, database-as-a-service providers are incentivized to minimize internal costs to enhance profitability. Since database workloads are often skewed and DRAM constitutes the primary driver of hardware costs, internal costs can be reduced by evicting rarely accessed (cold) data to cheaper storage layers. Hence, only frequently accessed (hot) data should remain in DRAM. As most database management systems employ a buffer manager to load and evict data at page granularity from secondary storage to DRAM and vice versa, keeping only disk pages with hot data in DRAM can lead to substantial cost savings. The physical database schema, however, is usually not defined according to the data access pattern. Therefore, cold data may be stored on the same disk page as hot data. As we will elaborate, polluting the buffer pool with cold data wastes DRAM capacities, which offers a largely untapped cost reduction potential to database-as-a-service providers. In this dissertation, we aim at utilizing the buffer manager more efficiently by recommending a physical database schema in which hot disk pages contain mainly hot data. Our approach is based on the observation that, in most cases, rows of a table are accessed either frequently or rarely according to a value range of a specific column of that table. In order to identify hot and cold data, we introduce a statistics collector that gathers accurate workload statistics with low memory and runtime overhead. By exploiting the collected statistics, our table partitioning advisor proposes a range partitioning layout by grouping rows into partitions belonging to hot- or cold-classified value ranges. As a result, hot range partitions with a high density of hot data stay in DRAM, whereas cold range partitions are evicted to cheaper storage layers. This prevents the pollution of the buffer pool with cold data. In addition, we periodically optimize the physical schema in light of workload changes over time. We propose a forward-looking approach by developing a workload predictor, which forecasts the approximate future workload. The predicted workload is then fed into our table partitioning advisor. Finally, we implement our ideas into a prototype of a commercial database and showcase its applicability by incorporating a real-world database workload. Our experimental evaluation demonstrates a buffer pool size reduction of up to 3.2x compared to related approaches while still adhering to SLAs. [eng]
dc.description.version: published [de]
dc.identifier.ppn: 1837351279
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/66195
dc.language.iso: eng [eng]
dc.rights: terms-of-use
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/
dc.subject: automated physical database design, cloud databases, memory footprint reduction, statistics collector, table partitioning advisor, workload predictor [eng]
dc.subject.ddc: 004
dc.title: Memory Footprint Reduction of Cloud Databases with Automated Physical Database Design [eng]
dc.type: DOCTORAL_THESIS [de]
dspace.entity.type: Publication
kops.citation.bibtex:
@phdthesis{Brendle2022Memor-66195,
  year={2022},
  title={Memory Footprint Reduction of Cloud Databases with Automated Physical Database Design},
  author={Brendle, Michael},
  address={Konstanz},
  school={Universität Konstanz}
}
kops.citation.iso690: BRENDLE, Michael, 2022. Memory Footprint Reduction of Cloud Databases with Automated Physical Database Design [Dissertation]. Konstanz: University of Konstanz [deu]
kops.citation.iso690: BRENDLE, Michael, 2022. Memory Footprint Reduction of Cloud Databases with Automated Physical Database Design [Dissertation]. Konstanz: University of Konstanz [eng]
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/66195">
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-02-23T08:02:04Z</dc:date>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/66195/3/Brendle_2-274v8wucp6uu2.pdf"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/66195/3/Brendle_2-274v8wucp6uu2.pdf"/>
    <dc:creator>Brendle, Michael</dc:creator>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:contributor>Brendle, Michael</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:abstract xml:lang="eng">Enterprises increasingly move their application data into the cloud by employing data management offerings of database-as-a-service providers, who specialize in hosting and managing database instances. While being obligated to certain performance commitments stipulated in service-level agreements (SLAs) with the customer, database-as-a-service providers are incentivized to minimize internal costs to enhance profitability. Since database workloads are often skewed and DRAM constitutes the primary driver of hardware costs, internal costs can be reduced by evicting rarely accessed (cold) data to cheaper storage layers. Hence, only frequently accessed (hot) data should remain in DRAM. As most database management systems employ a buffer manager to load and evict data at page granularity from secondary storage to DRAM and vice versa, keeping only disk pages with hot data in DRAM can lead to substantial cost savings. The physical database schema, however, is usually not defined according to the data access pattern. Therefore, cold data may be stored on the same disk page as hot data. As we will elaborate, polluting the buffer pool with cold data wastes DRAM capacities, which offers a largely untapped cost reduction potential to database-as-a-service providers. In this dissertation, we aim at utilizing the buffer manager more efficiently by recommending a physical database schema in which hot disk pages contain mainly hot data. Our approach is based on the observation that, in most cases, rows of a table are accessed either frequently or rarely according to a value range of a specific column of that table. In order to identify hot and cold data, we introduce a statistics collector that gathers accurate workload statistics with low memory and runtime overhead. 
By exploiting the collected statistics, our table partitioning advisor proposes a range partitioning layout by grouping rows into partitions belonging to hot- or cold-classified value ranges. As a result, hot range partitions with a high density of hot data stay in DRAM, whereas cold range partitions are evicted to cheaper storage layers. This prevents the pollution of the buffer pool with cold data. In addition, we periodically optimize the physical schema in light of workload changes over time. We propose a forward-looking approach by developing a workload predictor, which forecasts the approximate future workload. The predicted workload is then fed into our table partitioning advisor. Finally, we implement our ideas into a prototype of a commercial database and showcase its applicability by incorporating a real-world database workload. Our experimental evaluation demonstrates a buffer pool size reduction of up to 3.2x compared to related approaches while still adhering to SLAs.</dcterms:abstract>
    <dc:language>eng</dc:language>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-02-23T08:02:04Z</dcterms:available>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:title>Memory Footprint Reduction of Cloud Databases with Automated Physical Database Design</dcterms:title>
    <dcterms:issued>2022</dcterms:issued>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/66195"/>
    <dc:rights>terms-of-use</dc:rights>
  </rdf:Description>
</rdf:RDF>
kops.date.examination: 2022-07-29 [eng]
kops.date.yearDegreeGranted: 2022
kops.description.openAccess: openaccessgreen
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-2-274v8wucp6uu2
relation.isAuthorOfPublication: 28fbf58e-0d8b-47be-8c52-376b2d4d918c
relation.isAuthorOfPublication.latestForDiscovery: 28fbf58e-0d8b-47be-8c52-376b2d4d918c

Files

Original bundle

Name: Brendle_2-274v8wucp6uu2.pdf
Size: 4.78 MB
Format: Adobe Portable Document Format
Downloads: 382

License bundle

Name: license.txt
Size: 3.96 KB
Format: Item-specific license agreed upon to submission
Description: license.txt
Downloads: 0