Publikation:

Communication-less Strategies for Widening

Lade...
Vorschaubild

Dateien

Ivanova-Rohling_2-kw8k2dxtevg26.pdf
Ivanova-Rohling_2-kw8k2dxtevg26.pdfGröße: 2.67 MBDownloads: 575

Datum

2019

Herausgeber:innen

Kontakt

ISSN der Zeitschrift

Electronic ISSN

ISBN

Bibliografische Daten

Verlag

Schriftenreihe

Auflagebezeichnung

DOI (zitierfähiger Link)
ArXiv-ID

Internationale Patentnummer

Angaben zur Forschungsförderung

Projekt

Open Access-Veröffentlichung
Open Access Green
Core Facility der Universität Konstanz

Gesperrt bis

Titel in einer weiteren Sprache

Publikationstyp
Dissertation
Publikationsstatus
Published

Erschienen in

Zusammenfassung

We live in the age of ever-increasing parallel computing resources, and as a consequence, for a decade there has been intense research into parallel data mining algorithms. Most of this research is focused on improving the running time of existing algorithms. In this dissertation we want to look at parallel data mining from a different perspective: instead of using the parallel versions of algorithms to obtain the solution with the same quality, only faster, we want to know how to invest parallel compute resources in a way which improves the quality of the solution. Because the space of potential solutions if typically enormous, and it cannot be explored through an exhaustive search to find the optimal solution, often the data mining algorithms use a heuristic (for example a greedy search) in order to find a sufficiently good solution in a reasonable time. Due to this limited exploration, finding the optimal solution is not guaranteed. The focus of this work is to develop strategies for investing parallel compute resources to improve the result obtained by standard greedy data mining heuristics by investing parallel resources into a better exploration of the (model) search space. We call this approach Widening of search heuristics. Additionally, we want to keep the running time of the algorithm as close to the running time of the original algorithm, as possible.
The most trivial implementation of Widening is the simple beam search, Top-k Widening. This approach, however, requires continuous synchronization between the parallel workers. In order to avoid the consequent communication overhead, we are focused on Widening strategies, which explore the search space in a structured fashion, without the need for communication.We describe Ideal Widening, a theoretical framework for Widening, defined as an explicit partition of the search space with desired properties. Each partition is assigned for exploration to a given parallel worker. However, Ideal Widening is difficult to achieve in practice for a general type of search space. This is why we look at different communication-less Widening approaches, achievable in practice. These are typically based on the implicit partitioning of the search space via individualized modification of the selection operator for each parallel worker. Another problem of the Widening approaches to consider is the danger of converging to a local optimum. This is a well-known problem for the beam-search-based approaches and is solved by enforcing diversity. Achieving diversity of exploration without communication and without prior knowledge of the search space is not trivial. We introduce a simple approach, which assigns individual model preferences to each parallel worker, prior to the beginning of the search. This approach is parameter dependent and does not allow for a structured exploration of the search space, which is the goal of Widening. We define a neighborhood-based approach, Widening via neighborhoods, which aims at a structured exploration of the search space. Widening via neighborhoods defines neighborhoods of the locally optimal model with different properties and partitions them among the parallel workers, using labels assigned prior to the search. Three types of neighborhoods are investigated: optimality neighborhoods, for which the metric is based on the model quality measure; similarity neighborhoods, for which the metric is some type of model-based or data- based distance measure, and diversity neighborhoods, which contain diverse-and-good k models. Different approaches to diverse neighborhoods are described some are based on simple diversity thresholds, others use more sophisticated strategies such as nicheing. We demonstrate the theoretical properties of the neighborhood-based Widening approach, under fixed assumptions about the search space.

Zusammenfassung in einer weiteren Sprache

Fachgebiet (DDC)
004 Informatik

Schlagwörter

Machine Learning, Parallel Machine Learning, Data Mining, improving model quality

Konferenz

Rezension
undefined / . - undefined, undefined

Forschungsvorhaben

Organisationseinheiten

Zeitschriftenheft

Zugehörige Datensätze in KOPS

Zitieren

ISO 690IVANOVA-ROHLING, Violeta, 2019. Communication-less Strategies for Widening [Dissertation]. Konstanz: University of Konstanz
BibTex
@phdthesis{IvanovaRohling2019Commu-46105,
  year={2019},
  title={Communication-less Strategies for Widening},
  author={Ivanova-Rohling, Violeta},
  address={Konstanz},
  school={Universität Konstanz}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/46105">
    <dc:rights>terms-of-use</dc:rights>
    <dc:contributor>Ivanova-Rohling, Violeta</dc:contributor>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/46105/3/Ivanova-Rohling_2-kw8k2dxtevg26.pdf"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/46105/3/Ivanova-Rohling_2-kw8k2dxtevg26.pdf"/>
    <dc:creator>Ivanova-Rohling, Violeta</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:language>eng</dc:language>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-06-26T05:49:23Z</dcterms:available>
    <dcterms:abstract xml:lang="eng">We live in the age of ever-increasing parallel computing resources, and as a consequence, for a decade there has been intense research into parallel data mining algorithms. Most of this research is focused on improving the running time of existing algorithms. In this dissertation we want to look at parallel data mining from a different perspective: instead of using the parallel versions of algorithms to obtain the solution with the same quality, only faster, we want to know how to invest parallel compute resources in a way which improves the quality of the solution. Because the space of potential solutions if typically enormous, and it cannot be explored through an exhaustive search to find the optimal solution, often the data mining algorithms use a heuristic (for example a greedy search) in order to find a sufficiently good solution in a reasonable time. Due to this limited exploration, finding the optimal solution is not guaranteed. The focus of this work is to develop strategies for investing parallel compute resources to improve the result obtained by standard greedy data mining heuristics by investing parallel resources into a better exploration of the (model) search space. We call this approach Widening of search heuristics. Additionally, we want to keep the running time of the algorithm as close to the running time of the original algorithm, as possible.&lt;br /&gt;The most trivial implementation of Widening is the simple beam search, Top-k Widening. This approach, however, requires continuous synchronization between the parallel workers. In order to avoid the consequent communication overhead, we are focused on Widening strategies, which explore the search space in a structured fashion, without the need for communication.We describe Ideal Widening, a theoretical framework for Widening, defined as an explicit partition of the search space with desired properties. Each partition is assigned for exploration to a given parallel worker. However, Ideal Widening is difficult to achieve in practice for a general type of search space. This is why we look at different communication-less Widening approaches, achievable in practice. These are typically based on the implicit partitioning of the search space via individualized modification of the selection operator for each parallel worker. Another problem of the Widening approaches to consider is the danger of converging to a local optimum. This is a well-known problem for the beam-search-based approaches and is solved by enforcing diversity. Achieving diversity of exploration without communication and without prior knowledge of the search space is not trivial. We introduce a simple approach, which assigns individual model preferences to each parallel worker, prior to the beginning of the search. This approach is parameter dependent and does not allow for a structured exploration of the search space, which is the goal of Widening. We define a neighborhood-based approach, Widening via neighborhoods, which aims at a structured exploration of the search space. Widening via neighborhoods defines neighborhoods of the locally optimal model with different properties and partitions them among the parallel workers, using labels assigned prior to the search. Three types of neighborhoods are investigated: optimality neighborhoods, for which the metric is based on the model quality measure; similarity neighborhoods, for which the metric is some type of model-based or data- based distance measure, and diversity neighborhoods, which contain diverse-and-good k models. Different approaches to diverse neighborhoods are described some are based on simple diversity thresholds, others use more sophisticated strategies such as nicheing. We demonstrate the theoretical properties of the neighborhood-based Widening approach, under fixed assumptions about the search space.</dcterms:abstract>
    <dcterms:issued>2019</dcterms:issued>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/46105"/>
    <dcterms:title>Communication-less Strategies for Widening</dcterms:title>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2019-06-26T05:49:23Z</dc:date>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
  </rdf:Description>
</rdf:RDF>

Interner Vermerk

xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter

Kontakt
URL der Originalveröffentl.

Prüfdatum der URL

Prüfungsdatum der Dissertation

May 3, 2019
Hochschulschriftenvermerk
Konstanz, Univ., Diss., 2019
Finanzierungsart

Kommentar zur Publikation

Allianzlizenz
Corresponding Authors der Uni Konstanz vorhanden
Internationale Co-Autor:innen
Universitätsbibliographie
Nein
Begutachtet
Diese Publikation teilen