Publication: Visual Integration of Model and Data Spaces in Classification Problems
Abstract
In supervised learning, a sub-area of machine learning (ML), models are built from the ground truth extracted from a set of training examples, which creates an implicit dependence between models and data. Among supervised learning techniques, this thesis focuses on classification problems. Given a set of known categories (classes), classification is the process of identifying the class to which a new observation belongs.
The iterative nature of ML processes makes it challenging to track, for instance, to what extent a classification model still works when the data change over time. It also makes it non-trivial to follow how the classification of data changes during the model-building process, given the multitude of possible model parameterizations and the option of combining models in ensembles. In both cases, data visualization and interaction can complement what numerical methods alone offer to model developers and ML practitioners.
The unifying theme of this doctoral thesis is the visual integration of model and data spaces in classification problems. The thesis proposes categorizing model-data (M:N) relationships by the number of models and data subsets on each side of the relationship. The proposed relationships support the analysis of particular application scenarios.
The thesis has two main parts. The first part covers visual model comparison and visual model building of classifier ensembles. These application cases fit the M:1 relationship, in which there are several model candidates (M) in the model space and one data subset in the data space. In visual model building, the research goal was to investigate how to integrate data and model spaces to enable visual analysis of classification errors in ensemble learning. In visual model comparison, a customized data projection algorithm lets a specialist select anchor points in the data, which facilitates the comparison of classification landscapes produced by distinct models.
The second part covers visualizing the dataset shift problem in classification. This application case fits the 1:N relationship, in which a single model classifies several (N) data subsets. While data labels are not yet available, statistics on data change alone are not enough to foresee the impact of those changes on model performance. Inspired by Anscombe's quartet, in which visualization reveals very different distributions in four datasets with almost identical descriptive statistics, the thesis presents two experiments with linearly separable data. For similar data-change statistics, visualization reveals opposite impacts on the model's ability to classify data correctly.
Lastly, drawing on the experiments and related work, this thesis highlights gaps in dataset-shift research and proposes how to accomplish persistent visual monitoring of model and data compatibility in supervised learning.
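The dataset-shift observation in the abstract (similar data-change statistics, opposite impact on the classifier) can be illustrated with a small sketch. This is not the thesis's experimental setup: the data generator, the fixed decision boundary x = 0, the shift vectors, and all function names below are assumptions chosen for illustration only. Two shifts of identical size move the data either parallel or perpendicular to the boundary of a fixed linear model.

```python
import math
import random


def make_separable_data(n_per_class=200, seed=0):
    """Two 2D classes separable by the vertical line x = 0 (hypothetical data)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center_x in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            x = rng.uniform(center_x - 1.0, center_x + 1.0)
            y = rng.uniform(-1.0, 1.0)
            points.append((x, y))
            labels.append(label)
    return points, labels


def predict(point):
    """Fixed linear model standing in for a trained classifier: class 1 if x > 0."""
    return 1 if point[0] > 0.0 else 0


def accuracy(points, labels):
    """Fraction of points whose predicted class matches the true label."""
    hits = sum(predict(p) == t for p, t in zip(points, labels))
    return hits / len(labels)


def shift(points, dx, dy):
    """Translate every point by the vector (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def mean_shift_size(before, after):
    """Euclidean distance between the means of two equally sized point sets."""
    n = len(before)
    dx = sum(p[0] for p in after) / n - sum(p[0] for p in before) / n
    dy = sum(p[1] for p in after) / n - sum(p[1] for p in before) / n
    return math.hypot(dx, dy)


if __name__ == "__main__":
    points, labels = make_separable_data()
    scenarios = {
        "parallel to boundary": shift(points, 0.0, 3.0),
        "perpendicular to boundary": shift(points, 3.0, 0.0),
    }
    print(f"original accuracy: {accuracy(points, labels):.2f}")
    for name, shifted in scenarios.items():
        # True labels are used here only to reveal the impact; in the
        # monitoring scenario described above they would not yet be known.
        print(
            f"{name}: mean shift = {mean_shift_size(points, shifted):.2f}, "
            f"accuracy = {accuracy(shifted, labels):.2f}"
        )
```

Both scenarios report the same mean-shift size, yet the parallel shift leaves every point on the correct side of the boundary, while the perpendicular shift pushes class 0 across it. A single summary statistic cannot tell the two cases apart, whereas a scatter plot of the shifted data against the boundary would.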
Cite
ISO 690
SCHNEIDER, Bruno, 2023. Visual Integration of Model and Data Spaces in Classification Problems [Dissertation]. Konstanz: University of Konstanz
BibTeX
@phdthesis{Schneider2023-10-20Visua-67959,
  year={2023},
  title={Visual Integration of Model and Data Spaces in Classification Problems},
  url={https://bib.dbvis.de/publications/view/1031},
  author={Schneider, Bruno},
  address={Konstanz},
  school={Universität Konstanz}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/67959">
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-10-20T11:08:51Z</dc:date>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/67959/4/Schneider_Bruno-2-75pdiwhvyvqz6.pdf"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Schneider, Bruno</dc:creator>
    <dcterms:issued>2023-10-20</dcterms:issued>
    <dc:language>eng</dc:language>
    <dcterms:abstract>In supervised learning, a sub-area of the ubiquitous machine learning (ML) techniques, building models based on the ground truth extracted from a set of training examples defines an implicit dependence between models and data. Among the supervised learning techniques, this thesis focuses on classification problems. Given a set of known categories (classes), classification is the process of identifying to which class a new observation belongs. The iterative nature of ML processes poses challenges in tracking, for instance, to which extent a classification model still works when the data change over time. It also makes non-trivial the task of following how data-classification changes during the model-building process, given the multitude of possible model parameterizations or even the combination of models in ensembles. 
In both examples, data visualization and interaction have the power to complement what numerical methods in isolation offer to model developers and ML practitioners. In this doctoral thesis, the connecting point aggregating all the content is the visual integration of model and data spaces in classification problems. This thesis proposes categorizing model-data (M:N) relationships based on the number of models and data subsets at each side of that relationship. The proposed model-data relationships support the analysis of particular application scenarios. The thesis has two main parts. The first part is about visual model comparison and visual model building of classifier ensembles. These application cases fit in the M:1 relationship, in which there are several model candidates (M) in the model space and one data subset in the data space. In visual model building, the research goal was to investigate how to integrate data and model space to enable visual analysis of classification results in terms of errors in ensemble learning. In visual model comparison, the customization of a data projection algorithm enabled a specialist's involvement in facilitating the comparison of classification landscapes produced by distinct models through anchor-points selection in data. The second part is about visualizing the dataset shift problem in classification. This application case fits the 1:N relationship, in which there is one single model to classify several (N) data subsets. Statistics on data change, in isolation, are not enough to foresee the impacts of those changes in model performance, while data labels are not available yet. Inspired by Anscombe's quartet, in which visualization reveals very different data distributions of four sets with almost identical descriptive statistics, two experiments with linearly separable data are presented. For similar data changing stats, visualization reveals opposite impacts on the model's ability to classify data correctly. 
Lastly, this thesis highlights the gaps in dataset-shift research from the experiments and related work. It proposes how to accomplish the persistent visual monitoring of model and data compatibility in supervised learning.</dcterms:abstract>
    <dc:contributor>Schneider, Bruno</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/67959/4/Schneider_Bruno-2-75pdiwhvyvqz6.pdf"/>
    <dcterms:rights rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
    <dc:rights>CC0 1.0 Universal</dc:rights>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-10-20T11:08:51Z</dcterms:available>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/67959"/>
    <dcterms:title>Visual Integration of Model and Data Spaces in Classification Problems</dcterms:title>
  </rdf:Description>
</rdf:RDF>