Communication-less Strategies for Widening
Communication-less Strategies for Widening
Loading...
Date
2019
Authors
Editors
Journal ISSN
Electronic ISSN
ISBN
Bibliographical data
Publisher
Series
URI (citable link)
International patent number
Link to the license
EU project number
Project
Open Access publication
Collections
Title in another language
Publication type
Dissertation
Publication status
Published
Published in
Abstract
We live in the age of ever-increasing parallel computing resources, and as a consequence, for a decade there has been intense research into parallel data mining algorithms. Most of this research is focused on improving the running time of existing algorithms. In this dissertation we want to look at parallel data mining from a different perspective: instead of using the parallel versions of algorithms to obtain the solution with the same quality, only faster, we want to know how to invest parallel compute resources in a way which improves the quality of the solution. Because the space of potential solutions if typically enormous, and it cannot be explored through an exhaustive search to find the optimal solution, often the data mining algorithms use a heuristic (for example a greedy search) in order to find a sufficiently good solution in a reasonable time. Due to this limited exploration, finding the optimal solution is not guaranteed. The focus of this work is to develop strategies for investing parallel compute resources to improve the result obtained by standard greedy data mining heuristics by investing parallel resources into a better exploration of the (model) search space. We call this approach Widening of search heuristics. Additionally, we want to keep the running time of the algorithm as close to the running time of the original algorithm, as possible.
The most trivial implementation of Widening is the simple beam search, Top-k Widening. This approach, however, requires continuous synchronization between the parallel workers. In order to avoid the consequent communication overhead, we are focused on Widening strategies, which explore the search space in a structured fashion, without the need for communication.We describe Ideal Widening, a theoretical framework for Widening, defined as an explicit partition of the search space with desired properties. Each partition is assigned for exploration to a given parallel worker. However, Ideal Widening is difficult to achieve in practice for a general type of search space. This is why we look at different communication-less Widening approaches, achievable in practice. These are typically based on the implicit partitioning of the search space via individualized modification of the selection operator for each parallel worker. Another problem of the Widening approaches to consider is the danger of converging to a local optimum. This is a well-known problem for the beam-search-based approaches and is solved by enforcing diversity. Achieving diversity of exploration without communication and without prior knowledge of the search space is not trivial. We introduce a simple approach, which assigns individual model preferences to each parallel worker, prior to the beginning of the search. This approach is parameter dependent and does not allow for a structured exploration of the search space, which is the goal of Widening. We define a neighborhood-based approach, Widening via neighborhoods, which aims at a structured exploration of the search space. Widening via neighborhoods defines neighborhoods of the locally optimal model with different properties and partitions them among the parallel workers, using labels assigned prior to the search. Three types of neighborhoods are investigated: optimality neighborhoods, for which the metric is based on the model quality measure; similarity neighborhoods, for which the metric is some type of model-based or data- based distance measure, and diversity neighborhoods, which contain diverse-and-good k models. Different approaches to diverse neighborhoods are described some are based on simple diversity thresholds, others use more sophisticated strategies such as nicheing. We demonstrate the theoretical properties of the neighborhood-based Widening approach, under fixed assumptions about the search space.
The most trivial implementation of Widening is the simple beam search, Top-k Widening. This approach, however, requires continuous synchronization between the parallel workers. In order to avoid the consequent communication overhead, we are focused on Widening strategies, which explore the search space in a structured fashion, without the need for communication.We describe Ideal Widening, a theoretical framework for Widening, defined as an explicit partition of the search space with desired properties. Each partition is assigned for exploration to a given parallel worker. However, Ideal Widening is difficult to achieve in practice for a general type of search space. This is why we look at different communication-less Widening approaches, achievable in practice. These are typically based on the implicit partitioning of the search space via individualized modification of the selection operator for each parallel worker. Another problem of the Widening approaches to consider is the danger of converging to a local optimum. This is a well-known problem for the beam-search-based approaches and is solved by enforcing diversity. Achieving diversity of exploration without communication and without prior knowledge of the search space is not trivial. We introduce a simple approach, which assigns individual model preferences to each parallel worker, prior to the beginning of the search. This approach is parameter dependent and does not allow for a structured exploration of the search space, which is the goal of Widening. We define a neighborhood-based approach, Widening via neighborhoods, which aims at a structured exploration of the search space. Widening via neighborhoods defines neighborhoods of the locally optimal model with different properties and partitions them among the parallel workers, using labels assigned prior to the search. Three types of neighborhoods are investigated: optimality neighborhoods, for which the metric is based on the model quality measure; similarity neighborhoods, for which the metric is some type of model-based or data- based distance measure, and diversity neighborhoods, which contain diverse-and-good k models. Different approaches to diverse neighborhoods are described some are based on simple diversity thresholds, others use more sophisticated strategies such as nicheing. We demonstrate the theoretical properties of the neighborhood-based Widening approach, under fixed assumptions about the search space.
Summary in another language
Subject (DDC)
004 Computer Science
Keywords
Machine Learning, Parallel Machine Learning, Data Mining, improving model quality
Conference
Review
undefined / . - undefined, undefined. - (undefined; undefined)
Cite This
ISO 690
IVANOVA, Violeta, 2019. Communication-less Strategies for Widening [Dissertation]. Konstanz: University of KonstanzBibTex
@phdthesis{Ivanova2019Commu-46105, year={2019}, title={Communication-less Strategies for Widening}, author={Ivanova, Violeta}, address={Konstanz}, school={Universität Konstanz} }
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/46105"> <dc:rights>terms-of-use</dc:rights> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/46105/3/Ivanova-Rohling_2-kw8k2dxtevg26.pdf"/> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/46105/3/Ivanova-Rohling_2-kw8k2dxtevg26.pdf"/> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dc:creator>Ivanova, Violeta</dc:creator> <dc:language>eng</dc:language> <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-06-26T05:49:23Z</dcterms:available> <dcterms:abstract xml:lang="eng">We live in the age of ever-increasing parallel computing resources, and as a consequence, for a decade there has been intense research into parallel data mining algorithms. Most of this research is focused on improving the running time of existing algorithms. In this dissertation we want to look at parallel data mining from a different perspective: instead of using the parallel versions of algorithms to obtain the solution with the same quality, only faster, we want to know how to invest parallel compute resources in a way which improves the quality of the solution. Because the space of potential solutions if typically enormous, and it cannot be explored through an exhaustive search to find the optimal solution, often the data mining algorithms use a heuristic (for example a greedy search) in order to find a sufficiently good solution in a reasonable time. Due to this limited exploration, finding the optimal solution is not guaranteed. The focus of this work is to develop strategies for investing parallel compute resources to improve the result obtained by standard greedy data mining heuristics by investing parallel resources into a better exploration of the (model) search space. We call this approach Widening of search heuristics. Additionally, we want to keep the running time of the algorithm as close to the running time of the original algorithm, as possible.<br />The most trivial implementation of Widening is the simple beam search, Top-k Widening. This approach, however, requires continuous synchronization between the parallel workers. In order to avoid the consequent communication overhead, we are focused on Widening strategies, which explore the search space in a structured fashion, without the need for communication.We describe Ideal Widening, a theoretical framework for Widening, defined as an explicit partition of the search space with desired properties. Each partition is assigned for exploration to a given parallel worker. However, Ideal Widening is difficult to achieve in practice for a general type of search space. This is why we look at different communication-less Widening approaches, achievable in practice. These are typically based on the implicit partitioning of the search space via individualized modification of the selection operator for each parallel worker. Another problem of the Widening approaches to consider is the danger of converging to a local optimum. This is a well-known problem for the beam-search-based approaches and is solved by enforcing diversity. Achieving diversity of exploration without communication and without prior knowledge of the search space is not trivial. We introduce a simple approach, which assigns individual model preferences to each parallel worker, prior to the beginning of the search. This approach is parameter dependent and does not allow for a structured exploration of the search space, which is the goal of Widening. We define a neighborhood-based approach, Widening via neighborhoods, which aims at a structured exploration of the search space. Widening via neighborhoods defines neighborhoods of the locally optimal model with different properties and partitions them among the parallel workers, using labels assigned prior to the search. Three types of neighborhoods are investigated: optimality neighborhoods, for which the metric is based on the model quality measure; similarity neighborhoods, for which the metric is some type of model-based or data- based distance measure, and diversity neighborhoods, which contain diverse-and-good k models. Different approaches to diverse neighborhoods are described some are based on simple diversity thresholds, others use more sophisticated strategies such as nicheing. We demonstrate the theoretical properties of the neighborhood-based Widening approach, under fixed assumptions about the search space.</dcterms:abstract> <dcterms:issued>2019</dcterms:issued> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/46105"/> <dcterms:title>Communication-less Strategies for Widening</dcterms:title> <dc:contributor>Ivanova, Violeta</dc:contributor> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-06-26T05:49:23Z</dc:date> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> </rdf:Description> </rdf:RDF>
Internal note
xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter
Examination date of dissertation
May 3, 2019
University note
Konstanz, Univ., Doctoral dissertation, 2019
Method of financing
Comment on publication
Alliance license
Corresponding Authors der Uni Konstanz vorhanden
International Co-Authors
Bibliography of Konstanz
No