Publication:

The Categorical Data Map : A Multidimensional Scaling-Based Approach

Files

There are no files associated with this document.

Date

2024

Authors

Dennig, Frederik L.
Joos, Lucas
Paetzold, Patrick
Blumberg, Daniela
Deussen, Oliver
Keim, Daniel A.
Fischer, Maximilian T.

Publication type
Contribution to conference proceedings
Publication status
Published

Published in

2024 IEEE Visualization in Data Science, VDS 2024, Proceedings. Piscataway, NJ: IEEE, 2024, pp. 25-34. ISBN 979-8-3315-2843-0. Available at: doi: 10.1109/vds63897.2024.00008

Abstract

Categorical data does not have an intrinsic definition of distance or order, and therefore, established visualization techniques for categorical data only allow for a set-based or frequency-based analysis, e.g., through Euler diagrams or Parallel Sets, and do not support a similarity-based analysis. We present a novel dimensionality reduction-based visualization for categorical data, which is based on defining the distance of two data items as the number of varying attributes. Our technique enables users to pre-attentively detect groups of similar data items and observe the properties of the projection, such as attributes strongly influencing the embedding. Our prototype visually encodes data properties in an enhanced scatterplot-like visualization, visualizing attributes in the background to show the distribution of categories. In addition, we propose two graph-based measures to quantify the plot’s visual quality, which rank attributes according to their contribution to cluster cohesion. To demonstrate the capabilities of our similarity-based projection method, we compare it to Euler diagrams and Parallel Sets regarding visual scalability and evaluate it quantitatively on seven real-world datasets using a range of common quality measures. Further, we validate the benefits of our approach through an expert study with five data scientists analyzing the Titanic and Mushroom dataset with up to 23 attributes and 8124 category combinations. Our results indicate that our Categorical Data Map offers an effective analysis method for large datasets with a high number of category combinations.
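
The distance definition in the abstract (the number of attributes on which two data items differ) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the publication: the toy records are hypothetical, the function names are made up for illustration, and classical (Torgerson) MDS computed with NumPy stands in for whichever MDS variant the authors actually use.

```python
import numpy as np

def hamming_distance_matrix(items):
    """Pairwise distance = number of attributes on which two items differ."""
    n = len(items)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(a != b for a, b in zip(items[i], items[j]))
            dist[i, j] = dist[j, i] = d
    return dist

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS; an assumed stand-in for the paper's MDS step."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip negatives caused by non-Euclidean input.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical toy records (class, sex, survived); not the paper's data.
items = [
    ("first", "female", "yes"),
    ("first", "female", "no"),
    ("third", "male", "no"),
    ("third", "male", "yes"),
]
dist = hamming_distance_matrix(items)
coords = classical_mds(dist)  # one 2D point per record
```

Records that differ in fewer attributes end up closer together in `coords`, which is the property a scatterplot-like view of the projection relies on.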

Subject (DDC)
004 Computer science

Conference

VDS 2024: Visualization in Data Science, 14 Oct. 2024, St. Pete Beach, FL, USA

Related datasets in KOPS

Dataset
The Categorical Data Map : Replication Data
(2024) Dennig, Frederik L.; Joos, Lucas; Paetzold, Patrick; Blumberg, Daniela; Deussen, Oliver; Keim, Daniel A.; Fischer, Maximilian T.

Cite

ISO 690
DENNIG, Frederik L., Lucas JOOS, Patrick PAETZOLD, Daniela BLUMBERG, Oliver DEUSSEN, Daniel A. KEIM, Maximilian T. FISCHER, 2024. The Categorical Data Map : A Multidimensional Scaling-Based Approach. VDS 2024: Visualization in Data Science. St. Pete Beach, FL, USA, 14 Oct. 2024. In: 2024 IEEE Visualization in Data Science, VDS 2024, Proceedings. Piscataway, NJ: IEEE, 2024, pp. 25-34. ISBN 979-8-3315-2843-0. Available at: doi: 10.1109/vds63897.2024.00008
BibTeX
@inproceedings{Dennig2024-10-14Categ-71937,
  title={The Categorical Data Map : A Multidimensional Scaling-Based Approach},
  year={2024},
  doi={10.1109/vds63897.2024.00008},
  isbn={979-8-3315-2843-0},
  address={Piscataway, NJ},
  publisher={IEEE},
  booktitle={2024 IEEE Visualization in Data Science, VDS 2024, Proceedings},
  pages={25--34},
  author={Dennig, Frederik L. and Joos, Lucas and Paetzold, Patrick and Blumberg, Daniela and Deussen, Oliver and Keim, Daniel A. and Fischer, Maximilian T.}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/71937">
    <dc:contributor>Blumberg, Daniela</dc:contributor>
    <dc:creator>Keim, Daniel A.</dc:creator>
    <dc:contributor>Keim, Daniel A.</dc:contributor>
    <dc:creator>Dennig, Frederik L.</dc:creator>
    <dc:contributor>Joos, Lucas</dc:contributor>
    <dcterms:title>The Categorical Data Map : A Multidimensional Scaling-Based Approach</dcterms:title>
    <dc:contributor>Paetzold, Patrick</dc:contributor>
    <dcterms:issued>2024-10-14</dcterms:issued>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Fischer, Maximilian T.</dc:creator>
    <dc:contributor>Fischer, Maximilian T.</dc:contributor>
    <dc:creator>Paetzold, Patrick</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/71937"/>
    <dc:creator>Deussen, Oliver</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-16T14:15:47Z</dcterms:available>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:abstract>Categorical data does not have an intrinsic definition of distance or order, and therefore, established visualization techniques for categorical data only allow for a set-based or frequency-based analysis, e.g., through Euler diagrams or Parallel Sets, and do not support a similarity-based analysis. We present a novel dimensionality reduction-based visualization for categorical data, which is based on defining the distance of two data items as the number of varying attributes. Our technique enables users to pre-attentively detect groups of similar data items and observe the properties of the projection, such as attributes strongly influencing the embedding. Our prototype visually encodes data properties in an enhanced scatterplot-like visualization, visualizing attributes in the background to show the distribution of categories. In addition, we propose two graph-based measures to quantify the plot’s visual quality, which rank attributes according to their contribution to cluster cohesion. To demonstrate the capabilities of our similarity-based projection method, we compare it to Euler diagrams and Parallel Sets regarding visual scalability and evaluate it quantitatively on seven real-world datasets using a range of common quality measures. Further, we validate the benefits of our approach through an expert study with five data scientists analyzing the Titanic and Mushroom dataset with up to 23 attributes and 8124 category combinations. Our results indicate that our Categorical Data Map offers an effective analysis method for large datasets with a high number of category combinations.</dcterms:abstract>
    <dc:language>eng</dc:language>
    <dc:contributor>Dennig, Frederik L.</dc:contributor>
    <dc:contributor>Deussen, Oliver</dc:contributor>
    <dc:creator>Joos, Lucas</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-16T14:15:47Z</dc:date>
    <dc:creator>Blumberg, Daniela</dc:creator>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes