Publication:

Visual Integration of Model and Data Spaces in Classification Problems

Files

Schneider_Bruno-2-75pdiwhvyvqz6.pdf (Size: 9.43 MB, Downloads: 65)

Date

2023

Open Access publication
Open Access Green

Publication type
Dissertation
Publication status
Published

Abstract

In supervised learning, a sub-area of machine learning (ML), models are built from the ground truth extracted from a set of training examples, which creates an implicit dependence between models and data. Among the supervised learning techniques, this thesis focuses on classification problems. Given a set of known categories (classes), classification is the process of identifying the class to which a new observation belongs.

The iterative nature of ML processes makes it challenging to track, for instance, to what extent a classification model still works when the data change over time. It also makes it non-trivial to follow how the classification of the data changes during the model-building process, given the multitude of possible model parameterizations and the option of combining models in ensembles. In both cases, data visualization and interaction can complement what numerical methods alone offer to model developers and ML practitioners.

The unifying theme of this doctoral thesis is the visual integration of model and data spaces in classification problems. The thesis proposes categorizing model-data (M:N) relationships based on the number of models and data subsets on each side of the relationship. The proposed model-data relationships support the analysis of particular application scenarios.

The thesis has two main parts. The first part covers visual model comparison and visual model building of classifier ensembles. These application cases fit the M:1 relationship, in which there are several model candidates (M) in the model space and one data subset in the data space. In visual model building, the research goal was to investigate how to integrate data and model space to enable visual analysis of classification errors in ensemble learning. In visual model comparison, a customized data projection algorithm lets a specialist select anchor points in the data, which facilitates the comparison of classification landscapes produced by distinct models.

The second part covers the visualization of the dataset shift problem in classification. This application case fits the 1:N relationship, in which a single model classifies several (N) data subsets. Statistics on data change alone are not enough to foresee the impact of those changes on model performance while data labels are not yet available. Inspired by Anscombe's quartet, in which visualization reveals very different data distributions in four sets with almost identical descriptive statistics, two experiments with linearly separable data are presented. For similar data-change statistics, visualization reveals opposite impacts on the model's ability to classify the data correctly.
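The point about data-change statistics can be illustrated with a minimal sketch. It is not taken from the thesis: the fixed linear model, the six sample points, and the helper names (`classify`, `accuracy`, `shift`, `mean_displacement`) are all assumptions made for illustration. Two shifts with the same mean displacement are applied to linearly separable data, one parallel to the decision boundary and one perpendicular to it.

```python
# Hypothetical illustration only; not the experiments from the thesis.

def classify(point):
    """Assumed fixed linear model: class 1 if x > 0, otherwise class 0."""
    x, _ = point
    return 1 if x > 0 else 0

def accuracy(points, labels):
    """Fraction of points whose predicted class matches the true label."""
    hits = sum(classify(p) == y for p, y in zip(points, labels))
    return hits / len(points)

def shift(points, dx, dy):
    """Translate every point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]

def mean_displacement(before, after):
    """Length of the mean displacement vector between two point sets."""
    n = len(before)
    mx = sum(a[0] - b[0] for b, a in zip(before, after)) / n
    my = sum(a[1] - b[1] for b, a in zip(before, after)) / n
    return (mx ** 2 + my ** 2) ** 0.5

# Linearly separable toy data: class 0 on the left, class 1 on the right.
points = [(-3.0, 0.0), (-1.0, 1.0), (-1.5, -1.0),
          (1.0, 0.0), (3.0, 1.0), (1.5, -1.0)]
labels = [0, 0, 0, 1, 1, 1]

# Same shift magnitude, different direction relative to the boundary x = 0.
parallel = shift(points, 0.0, 2.0)       # moves along the boundary
perpendicular = shift(points, 2.0, 0.0)  # moves across the boundary

for name, shifted in [("parallel", parallel), ("perpendicular", perpendicular)]:
    print(name,
          "mean displacement:", mean_displacement(points, shifted),
          "accuracy:", round(accuracy(shifted, labels), 3))
```

In this toy setup both shifts report the same mean displacement, yet only the perpendicular one moves class-0 points across the boundary and lowers accuracy. Computing accuracy here requires the true labels, which is exactly the information that is missing when shift has to be assessed in practice.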

Lastly, this thesis draws on the experiments and related work to highlight gaps in dataset-shift research. It proposes how to achieve persistent visual monitoring of model and data compatibility in supervised learning.

Subject area (DDC)
004 Computer science

Cite

ISO 690
SCHNEIDER, Bruno, 2023. Visual Integration of Model and Data Spaces in Classification Problems [Dissertation]. Konstanz: University of Konstanz
BibTex
@phdthesis{Schneider2023-10-20Visua-67959,
  year={2023},
  title={Visual Integration of Model and Data Spaces in Classification Problems},
  url={https://bib.dbvis.de/publications/view/1031},
  author={Schneider, Bruno},
  address={Konstanz},
  school={Universität Konstanz}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/67959">
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-10-20T11:08:51Z</dc:date>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/67959/4/Schneider_Bruno-2-75pdiwhvyvqz6.pdf"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Schneider, Bruno</dc:creator>
    <dcterms:issued>2023-10-20</dcterms:issued>
    <dc:language>eng</dc:language>
    <dcterms:abstract>In supervised learning, a sub-area of the ubiquitous machine learning (ML) techniques, building models based on the ground truth extracted from a set of training examples defines an implicit dependence between models and data. Among the supervised learning techniques, this thesis focuses on classification problems. Given a set of known categories (classes), classification is the process of identifying to which class a new observation belongs. 

The iterative nature of ML processes poses challenges in tracking, for instance, to which extent a classification model still works when the data change over time. It also makes non-trivial the task of following how data-classification changes during the model-building process, given the multitude of possible model parameterizations or even the combination of models in ensembles. In both examples, data visualization and interaction have the power to complement what numerical methods in isolation offer to model developers and ML practitioners. 

In this doctoral thesis, the connecting point aggregating all the content is the visual integration of model and data spaces in classification problems. This thesis proposes categorizing model-data (M:N) relationships based on the number of models and data subsets at each side of that relationship. The proposed model-data relationships support the analysis of particular application scenarios. 

The thesis has two main parts. The first part is about visual model comparison and visual model building of classifier ensembles. These application cases fit in the M:1 relationship, in which there are several model candidates (M) in the model space and one data subset in the data space. In visual model building, the research goal was to investigate how to integrate data and model space to enable visual analysis of classification results in terms of errors in ensemble learning. In visual model comparison, the customization of a data projection algorithm enabled a specialist's involvement in facilitating the comparison of classification landscapes produced by distinct models through anchor-points selection in data. 

The second part is about visualizing the dataset shift problem in classification. This application case fits the 1:N relationship, in which there is one single model to classify several (N) data subsets. Statistics on data change, in isolation, are not enough to foresee the impacts of those changes in model performance, while data labels are not available yet. Inspired by Anscombe's quartet, in which visualization reveals very different data distributions of four sets with almost identical descriptive statistics, two experiments with linearly separable data are presented. For similar data changing stats, visualization reveals opposite impacts on the model's ability to classify data correctly. 

Lastly, this thesis highlights the gaps in dataset-shift research from the experiments and related work. It proposes how to accomplish the persistent visual monitoring of model and data compatibility in supervised learning.</dcterms:abstract>
    <dc:contributor>Schneider, Bruno</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/67959/4/Schneider_Bruno-2-75pdiwhvyvqz6.pdf"/>
    <dcterms:rights rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
    <dc:rights>CC0 1.0 Universal</dc:rights>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-10-20T11:08:51Z</dcterms:available>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/67959"/>
    <dcterms:title>Visual Integration of Model and Data Spaces in Classification Problems</dcterms:title>
  </rdf:Description>
</rdf:RDF>

URL check date

2023-10-20

Date of dissertation examination

April 27, 2023
Thesis note
Konstanz, Univ., Diss., 2023