Publication:

A Versatile Hypergraph Model for Document Collections


Files

There are no files associated with this document.

Date

2020

Authors

Spitz, Andreas
Aumiller, Dennis
Soproni, Bálint
Gertz, Michael

Publication type
Contribution to conference proceedings
Publication status
Published

Published in

POURABBAS, Elaheh, ed., Dimitris SACHARIDIS, ed., Kurt STOCKINGER, ed. and others. Scientific and Statistical Database Management 32nd International Conference, SSDBM 2020 : Proceedings. New York, New York: ACM, 2020, 7. ISBN 978-1-4503-8814-6. Available under: doi: 10.1145/3400903.3400919

Abstract

Efficiently and effectively representing large collections of text is of central importance to information retrieval tasks such as summarization and search. Since models for these tasks frequently rely on an implicit graph structure of the documents or their contents, graph-based document representations are naturally appealing. For tasks that consider the joint occurrence of words or entities, however, existing document representations often fall short in capturing cooccurrences of higher order, higher multiplicity, or at varying proximity levels. Furthermore, while numerous applications benefit from structured knowledge sources, external data sources are rarely considered as integral parts of existing document models. To address these shortcomings, we introduce heterogeneous hypergraphs as a versatile model for representing annotated document collections. We integrate external metadata, document content, entity and term annotations, and document segmentation at different granularity levels in a joint model that bridges the gap between structured and unstructured data. We discuss selection and transformation operations on the set of hyperedges, which can be chained to support a wide range of query scenarios. To ensure compatibility with established information retrieval methods, we discuss projection operations that transform hyperedges to traditional dyadic cooccurrence graph representations. Using PostgreSQL and Neo4j, we investigate the suitability of existing database systems for implementing the hypergraph document model, and explore the impact of utilizing implicit and materialized hyperedge representations on storage space requirements and query performance.
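
As an illustration of the selection and projection operations described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that a hyperedge is simply a set of annotation nodes tagged with a granularity level and a document ID, and that projection is a plain clique expansion: every pair of nodes sharing a hyperedge becomes a dyadic edge whose weight counts the shared hyperedges. The names `Hyperedge`, `select` and `project` are hypothetical; the paper's operators may weight or filter edges differently.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical hyperedge: a set of annotated nodes plus metadata."""
    nodes: frozenset   # e.g. entity or term annotations (strings here)
    granularity: str   # e.g. "sentence", "paragraph", "document"
    doc_id: str


def select(edges, predicate):
    """Selection: keep only the hyperedges that satisfy a predicate.

    Returns a list of hyperedges, so selections can be chained.
    """
    return [e for e in edges if predicate(e)]


def project(edges):
    """Clique-expansion projection (an assumed, simple variant).

    Each pair of nodes co-occurring in a hyperedge becomes an undirected
    dyadic edge; sorting the nodes gives each pair a canonical key. The
    weight counts how many hyperedges contain both nodes.
    """
    weights = Counter()
    for e in edges:
        for u, v in combinations(sorted(e.nodes), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical toy collection: two sentence-level hyperedges and one
# document-level hyperedge.
edges = [
    Hyperedge(frozenset({"hypergraph", "Neo4j", "PostgreSQL"}), "sentence", "d1"),
    Hyperedge(frozenset({"hypergraph", "Neo4j"}), "sentence", "d2"),
    Hyperedge(frozenset({"hypergraph", "Neo4j", "PostgreSQL", "SQL"}), "document", "d1"),
]

sentence_edges = select(edges, lambda e: e.granularity == "sentence")
graph = project(sentence_edges)
print(graph)
# {('Neo4j', 'PostgreSQL'): 1, ('Neo4j', 'hypergraph'): 2, ('PostgreSQL', 'hypergraph'): 1}
```

Because `select` returns a list of hyperedges, selections can be chained before the final projection, for example filtering by granularity and then by document ID with `select(select(edges, p1), p2)`.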

Subject area (DDC)
004 Computer science

Conference

SSDBM 2020: 32nd International Conference on Scientific and Statistical Database Management, July 7, 2020 - July 9, 2020, Vienna, Austria

Cite

ISO 690
SPITZ, Andreas, Dennis AUMILLER, Bálint SOPRONI, Michael GERTZ, 2020. A Versatile Hypergraph Model for Document Collections. SSDBM 2020: 32nd International Conference on Scientific and Statistical Database Management. Vienna, Austria, 7 July 2020 - 9 July 2020. In: POURABBAS, Elaheh, ed., Dimitris SACHARIDIS, ed., Kurt STOCKINGER, ed. and others. Scientific and Statistical Database Management 32nd International Conference, SSDBM 2020 : Proceedings. New York, New York: ACM, 2020, 7. ISBN 978-1-4503-8814-6. Available under: doi: 10.1145/3400903.3400919
BibTeX
@inproceedings{Spitz2020Versa-53900,
  year={2020},
  doi={10.1145/3400903.3400919},
  title={A Versatile Hypergraph Model for Document Collections},
  isbn={978-1-4503-8814-6},
  publisher={ACM},
  address={New York, New York},
  booktitle={Scientific and Statistical Database Management 32nd International Conference, SSDBM 2020 : Proceedings},
  editor={Pourabbas, Elaheh and Sacharidis, Dimitris and Stockinger, Kurt},
  author={Spitz, Andreas and Aumiller, Dennis and Soproni, Bálint and Gertz, Michael},
  note={Article Number: 7}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/53900">
    <dc:creator>Gertz, Michael</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Spitz, Andreas</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Soproni, Bálint</dc:contributor>
    <dc:rights>terms-of-use</dc:rights>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-07T12:57:05Z</dc:date>
    <dc:contributor>Aumiller, Dennis</dc:contributor>
    <dc:contributor>Gertz, Michael</dc:contributor>
    <dc:language>eng</dc:language>
    <dcterms:title>A Versatile Hypergraph Model for Document Collections</dcterms:title>
    <dc:creator>Spitz, Andreas</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-07T12:57:05Z</dcterms:available>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/53900"/>
    <dcterms:abstract xml:lang="eng">Efficiently and effectively representing large collections of text is of central importance to information retrieval tasks such as summarization and search. Since models for these tasks frequently rely on an implicit graph structure of the documents or their contents, graph-based document representations are naturally appealing. For tasks that consider the joint occurrence of words or entities, however, existing document representations often fall short in capturing cooccurrences of higher order, higher multiplicity, or at varying proximity levels. Furthermore, while numerous applications benefit from structured knowledge sources, external data sources are rarely considered as integral parts of existing document models. To address these shortcomings, we introduce heterogeneous hypergraphs as a versatile model for representing annotated document collections. We integrate external metadata, document content, entity and term annotations, and document segmentation at different granularity levels in a joint model that bridges the gap between structured and unstructured data. We discuss selection and transformation operations on the set of hyperedges, which can be chained to support a wide range of query scenarios. To ensure compatibility with established information retrieval methods, we discuss projection operations that transform hyperedges to traditional dyadic cooccurrence graph representations. Using PostgreSQL and Neo4j, we investigate the suitability of existing database systems for implementing the hypergraph document model, and explore the impact of utilizing implicit and materialized hyperedge representations on storage space requirements and query performance.</dcterms:abstract>
    <dc:creator>Aumiller, Dennis</dc:creator>
    <dc:creator>Soproni, Bálint</dc:creator>
    <dcterms:issued>2020</dcterms:issued>
  </rdf:Description>
</rdf:RDF>

Alliance license
Corresponding authors at the University of Konstanz
International co-authors
University bibliography
No
Peer reviewed