Verteilungsansätze von großen Datenmengen (Distribution Approaches for Large Data Sets)
2008, Graf, Sebastian
The era of single-core processors is coming to an end. Few modern computer systems have fewer than two cores. To make optimal use of these newly available parallel resources, the way data is handled must be adapted. This adaptation includes the distribution of the data. This thesis addresses that aspect with respect to the evaluation of text-based data formats. More precisely, distributed queries are presented on Comma-Separated Values (CSV) data, on Extensible Markup Language (XML) data treated as a string representation, and on XML data treated with respect to its structure. Multiple variants for partitioning the data are presented for each approach. In particular, the structure-aware fragmentation of XML data reveals the dependency between the structure itself and the different partitioning approaches. Therefore, a way to generate a consistent fragmentation that is independent of the structure is presented. Distributed queries on well-known, fragmented XML databases such as Wikipedia, Treebank, XMark and DBLP show the benefits of these approaches. Depending on the fragmentation and the available resources, distributed XPath queries need less than half the time of a non-distributed query. Based on these results, further optimizations are possible. In particular, query evaluation could be improved by pipelining XPath.
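The general idea of querying partitioned CSV data can be illustrated with a minimal sketch. It is not the implementation from the thesis: the round-robin partitioning, the function names (partition_rows, query_fragment, distributed_query) and the sample data are assumptions chosen for illustration, and a thread pool stands in for whatever distribution mechanism the thesis actually uses.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, n_fragments):
    """Split rows round-robin into n_fragments lists (hypothetical scheme)."""
    fragments = [[] for _ in range(n_fragments)]
    for i, row in enumerate(rows):
        fragments[i % n_fragments].append(row)
    return fragments


def query_fragment(fragment, column, value):
    """Evaluate a simple equality selection on a single fragment."""
    return [row for row in fragment if row[column] == value]


def distributed_query(csv_text, column, value, n_fragments=2):
    """Partition the CSV rows, query each fragment concurrently, merge results."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fragments = partition_rows(rows, n_fragments)
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        # One task per fragment; results come back in fragment order.
        partials = list(
            pool.map(lambda f: query_fragment(f, column, value), fragments)
        )
    # Merge the partial results; row order follows the fragments,
    # not necessarily the order in the original file.
    return [row for part in partials for row in part]


if __name__ == "__main__":
    sample = "id,lang\n1,de\n2,en\n3,de\n4,en\n"
    for row in distributed_query(sample, "lang", "de"):
        print(row["id"], row["lang"])
```

Threads are used here only to keep the sketch self-contained. In CPython they do not provide CPU parallelism for this kind of work, so a real evaluation on multiple cores would more likely use separate processes or machines.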