Using Map and Reduce for Querying Distributed XML Data

dc.contributor.author: Lewandowski, Lukas
dc.date.accessioned: 2012-04-04T07:04:06Z [deu]
dc.date.available: 2012-04-04T07:04:06Z [deu]
dc.date.issued: 2012 [deu]
dc.description.abstract: Semi-structured information is often represented in the XML format. Although a large number of suitable databases exist for efficiently storing semi-structured data, the rapidly growing volume of data demands ever larger databases. Even when secondary storage is able to hold this amount of data, the execution time of complex queries increases significantly if no suitable indexes are applicable. This is critical when short response times are an essential requirement, as in most real-life database systems. Moreover, when storage limits are reached, the data has to be distributed to ensure the availability of the complete data set. To meet this challenge, this thesis presents two approaches to improving query evaluation on large, semi-structured data through parallelization. First, we analyze Hadoop and its MapReduce framework as a candidate for our distributed computations; second, we present an alternative implementation that copes with these requirements. We introduce three distribution algorithms for XML collections, which serve as the basis for distributing the data across a cluster. Furthermore, we present a prototype implementation built on BaseX, a current open-source database, which serves as the basis for our comprehensive query results. [eng]
dc.description.version: published
dc.identifier.ppn: 363185097 [deu]
dc.identifier.uri: http://kops.uni-konstanz.de/handle/123456789/18882
dc.language.iso: eng [deu]
dc.legacy.dateIssued: 2012-04-04 [deu]
dc.rights: terms-of-use [deu]
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/ [deu]
dc.subject: Querying [deu]
dc.subject: BaseX [deu]
dc.subject: Hadoop [deu]
dc.subject: MapReduce [deu]
dc.subject.ddc: 004 [deu]
dc.subject.gnd: XML [deu]
dc.subject.gnd: Distribution <Informatik> (distribution, computer science) [deu]
dc.subject.gnd: Abfrage (query) [deu]
dc.subject.gnd: XQuery [deu]
dc.title: Using Map and Reduce for Querying Distributed XML Data [eng]
dc.type: MSC_THESIS [deu]
dspace.entity.type: Publication
kops.citation.bibtex:
@mastersthesis{Lewandowski2012Using-18882,
  year={2012},
  title={Using Map and Reduce for Querying Distributed XML Data},
  author={Lewandowski, Lukas}
}
kops.citation.iso690: LEWANDOWSKI, Lukas, 2012. Using Map and Reduce for Querying Distributed XML Data [Master thesis] [deu]
kops.citation.iso690: LEWANDOWSKI, Lukas, 2012. Using Map and Reduce for Querying Distributed XML Data [Master thesis] [eng]
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/18882">
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dcterms:abstract xml:lang="eng">Semi-structured information is often represented in the XML format. Although a large number of suitable databases exist for efficiently storing semi-structured data, the rapidly growing volume of data demands ever larger databases. Even when secondary storage is able to hold this amount of data, the execution time of complex queries increases significantly if no suitable indexes are applicable. This is critical when short response times are an essential requirement, as in most real-life database systems. Moreover, when storage limits are reached, the data has to be distributed to ensure the availability of the complete data set. To meet this challenge, this thesis presents two approaches to improving query evaluation on large, semi-structured data through parallelization. First, we analyze Hadoop and its MapReduce framework as a candidate for our distributed computations; second, we present an alternative implementation that copes with these requirements. We introduce three distribution algorithms for XML collections, which serve as the basis for distributing the data across a cluster. Furthermore, we present a prototype implementation built on BaseX, a current open-source database, which serves as the basis for our comprehensive query results.</dcterms:abstract>
    <dcterms:issued>2012</dcterms:issued>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/18882/1/Master_Lewandowski.pdf"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/18882/1/Master_Lewandowski.pdf"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-04T07:04:06Z</dcterms:available>
    <dc:language>eng</dc:language>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-04T07:04:06Z</dc:date>
    <dcterms:title>Using Map and Reduce for Querying Distributed XML Data</dcterms:title>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/18882"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Lewandowski, Lukas</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Lewandowski, Lukas</dc:contributor>
  </rdf:Description>
</rdf:RDF>
kops.description.openAccess: openaccessgreen
kops.identifier.nbn: urn:nbn:de:bsz:352-188823 [deu]
kops.submitter.email: lukas.lewandowski@uni-konstanz.de [deu]
relation.isAuthorOfPublication: ad058314-8a22-43a4-ab70-fe83a4e18a84
relation.isAuthorOfPublication.latestForDiscovery: ad058314-8a22-43a4-ab70-fe83a4e18a84

Files

Original bundle

Name: Master_Lewandowski.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format
Downloads: 431

License bundle

Name: license.txt
Size: 1.92 KB
Format: Plain Text
Description: license.txt
Downloads: 0