Using Map and Reduce for Querying Distributed XML Data

Lewandowski, Lukas

Publication:
Using Map and Reduce for Querying Distributed XML Data

Files

Master_Lewandowski.pdf (size: 2.01 MB, downloads: 431)

Date

2012

Authors

Lewandowski, Lukas

URI (citable link)

http://nbn-resolving.de/urn:nbn:de:bsz:352-188823

License link

Protected by copyright

Open Access publication

Open Access Green

Collections

Computer and Information Science: Publications

Publication type

Master's thesis / Diploma thesis

Publication status

Published

Abstract

Semi-structured information is often represented in the XML format. Although a vast number of suitable databases exist for efficiently storing semi-structured data, rapidly growing data volumes demand ever larger databases. Even when secondary storage can hold the large amount of data, the execution time of complex queries increases significantly if no suitable indexes are applicable. This situation is particularly problematic when short response times are an essential requirement, as in most real-life database systems. Moreover, when storage limits are reached, the data has to be distributed to ensure availability of the complete data set. To meet this challenge, this thesis presents two approaches to improve query evaluation on large semi-structured data through parallelization. First, we analyze Hadoop and its MapReduce framework as a candidate for our distributed computations; second, we present an alternative implementation to cope with these requirements. We introduce three distribution algorithms usable for XML collections, which serve as the basis for our distribution to a cluster. Furthermore, we present a prototype implementation using a current open-source database named BaseX, which serves as the basis for our comprehensive query results.

Subject area (DDC)

004 Computer science

Keywords

Querying, BaseX, Hadoop, MapReduce

Cite

ISO 690

LEWANDOWSKI, Lukas, 2012. Using Map and Reduce for Querying Distributed XML Data [Master thesis]

BibTeX

@mastersthesis{Lewandowski2012Using-18882,
  year={2012},
  title={Using Map and Reduce for Querying Distributed XML Data},
  author={Lewandowski, Lukas}
}

RDF

<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/18882">
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dcterms:abstract xml:lang="eng">Semi-structured information is often represented in the XML format. Although a vast number of suitable databases exist for efficiently storing semi-structured data, rapidly growing data volumes demand ever larger databases. Even when secondary storage can hold the large amount of data, the execution time of complex queries increases significantly if no suitable indexes are applicable. This situation is particularly problematic when short response times are an essential requirement, as in most real-life database systems. Moreover, when storage limits are reached, the data has to be distributed to ensure availability of the complete data set. To meet this challenge, this thesis presents two approaches to improve query evaluation on large semi-structured data through parallelization. First, we analyze Hadoop and its MapReduce framework as a candidate for our distributed computations; second, we present an alternative implementation to cope with these requirements. We introduce three distribution algorithms usable for XML collections, which serve as the basis for our distribution to a cluster. Furthermore, we present a prototype implementation using a current open-source database named BaseX, which serves as the basis for our comprehensive query results.</dcterms:abstract>
    <dcterms:issued>2012</dcterms:issued>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/18882/1/Master_Lewandowski.pdf"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/18882/1/Master_Lewandowski.pdf"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-04T07:04:06Z</dcterms:available>
    <dc:language>eng</dc:language>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-04T07:04:06Z</dc:date>
    <dcterms:title>Using Map and Reduce for Querying Distributed XML Data</dcterms:title>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/18882"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Lewandowski, Lukas</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Lewandowski, Lukas</dc:contributor>
  </rdf:Description>
</rdf:RDF>
