Using Map and Reduce for Querying Distributed XML Data

Lewandowski, Lukas

dc.contributor.author	Lewandowski, Lukas
dc.date.accessioned	2012-04-04T07:04:06Z	deu
dc.date.available	2012-04-04T07:04:06Z	deu
dc.date.issued	2012	deu
dc.description.abstract	Semi-structured information is often represented in the XML format. Although a vast number of databases exist for efficiently storing semi-structured data, rapidly growing data volumes demand ever larger databases. Even when secondary storage can hold the data, the execution time of complex queries increases significantly if no suitable indexes are applicable. This is critical when short response times are an essential requirement, as in most real-life database systems. Moreover, when storage limits are reached, the data has to be distributed to ensure availability of the complete data set. To meet this challenge, this thesis presents two approaches to improve query evaluation on large semi-structured data through parallelization. First, we analyze Hadoop and its MapReduce framework as a candidate for our distributed computations; second, we present an alternative implementation to cope with these requirements. We introduce three distribution algorithms usable for XML collections, which serve as the basis for our distribution to a cluster. Furthermore, we present a prototype implementation using a current open-source database, BaseX, which serves as the basis for our comprehensive query results.	eng
dc.description.version	published
dc.identifier.ppn	363185097	deu
dc.identifier.uri	http://kops.uni-konstanz.de/handle/123456789/18882
dc.language.iso	eng	deu
dc.legacy.dateIssued	2012-04-04	deu
dc.rights	terms-of-use	deu
dc.rights.uri	https://rightsstatements.org/page/InC/1.0/	deu
dc.subject	Querying	deu
dc.subject	BaseX	deu
dc.subject	Hadoop	deu
dc.subject	MapReduce	deu
dc.subject.ddc	004	deu
dc.subject.gnd	XML	deu
dc.subject.gnd	Distribution <Informatik>	deu
dc.subject.gnd	Abfrage	deu
dc.subject.gnd	XQuery	deu
dc.title	Using Map and Reduce for Querying Distributed XML Data	eng
dc.type	MSC_THESIS	deu
dspace.entity.type	Publication
kops.citation.bibtex	@mastersthesis{Lewandowski2012Using-18882, year={2012}, title={Using Map and Reduce for Querying Distributed XML Data}, author={Lewandowski, Lukas} }
kops.citation.iso690	LEWANDOWSKI, Lukas, 2012. Using Map and Reduce for Querying Distributed XML Data [Master thesis]	deu
kops.citation.iso690	LEWANDOWSKI, Lukas, 2012. Using Map and Reduce for Querying Distributed XML Data [Master thesis]	eng
kops.citation.rdf	<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/18882"> <dc:rights>terms-of-use</dc:rights> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/> <dcterms:abstract xml:lang="eng">Semi-structured information is often represented in the XML format. Although a vast number of databases exist for efficiently storing semi-structured data, rapidly growing data volumes demand ever larger databases. Even when secondary storage can hold the data, the execution time of complex queries increases significantly if no suitable indexes are applicable. This is critical when short response times are an essential requirement, as in most real-life database systems. Moreover, when storage limits are reached, the data has to be distributed to ensure availability of the complete data set. To meet this challenge, this thesis presents two approaches to improve query evaluation on large semi-structured data through parallelization. First, we analyze Hadoop and its MapReduce framework as a candidate for our distributed computations; second, we present an alternative implementation to cope with these requirements. We introduce three distribution algorithms usable for XML collections, which serve as the basis for our distribution to a cluster. Furthermore, we present a prototype implementation using a current open-source database, BaseX, which serves as the basis for our comprehensive query results.</dcterms:abstract> <dcterms:issued>2012</dcterms:issued> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/18882/1/Master_Lewandowski.pdf"/> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/18882/1/Master_Lewandowski.pdf"/> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-04T07:04:06Z</dcterms:available> <dc:language>eng</dc:language> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-04-04T07:04:06Z</dc:date> <dcterms:title>Using Map and Reduce for Querying Distributed XML Data</dcterms:title> <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/18882"/> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dc:creator>Lewandowski, Lukas</dc:creator> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dc:contributor>Lewandowski, Lukas</dc:contributor> </rdf:Description> </rdf:RDF>
kops.description.openAccess	openaccessgreen
kops.identifier.nbn	urn:nbn:de:bsz:352-188823	deu
kops.submitter.email	lukas.lewandowski@uni-konstanz.de	deu
relation.isAuthorOfPublication	ad058314-8a22-43a4-ab70-fe83a4e18a84
relation.isAuthorOfPublication.latestForDiscovery	ad058314-8a22-43a4-ab70-fe83a4e18a84

Files

Original bundle

Now showing 1 - 1 of 1

Name: Master_Lewandowski.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format
Downloads: 431

License bundle

Now showing 1 - 1 of 1

Name: license.txt
Size: 1.92 KB
Format: Plain Text
Downloads: 0

Collections

Computer and Information Science: Publications