TreeTank, a native XML storage
2009, Graf, Sebastian
TreeTank is an easy-to-use framework which allows users to store, modify and retrieve tree-structured data. Modifications are encapsulated in revisions, which support navigation not only within the data structure but also along the time axis. Despite being an active research project, the software is covered by automated test cases and serves as a base for multiple student projects and their theses. TreeTank comes as a lightweight Java program which allows easy handling in any Java environment.
This report gives a short overview of the motivation, the technical foundation and the projects based on TreeTank, with cited papers and an outlook on further extensions.
Distributing XML with focus on parallel evaluation
2008, Graf, Sebastian, Kramis, Marc, Waldvogel, Marcel
In contrast to relational databases, the distribution of document-centric XML is not well researched. While there are some suggestions on how to split and distribute large XML documents, these approaches do not consider parallel query evaluation. In this paper, we present and compare five different algorithms that search for suitable split nodes in a large XML document. We then describe how to distribute extractable sub-structures over a fixed number of peers and how to query these peers in parallel to retrieve the final result. In addition, we analyse the impact of our splitting algorithms with respect to scalability for two different XPath expression classes on three well-known XML data sets. We conclude this paper with an outlook on future work, including result ordering during parallel query execution and dynamic re-distribution of XML fragments to new peers due to updates.
Evolutionäres Lernen von Regelsystemen
2005, Graf, Sebastian
In data mining, classification plays a major role. In classification, rules are formed with which unknown data are categorized according to a labelling attribute. This bachelor's thesis presents an approach to learning rule systems based on evolutionary algorithms, with particular attention to genetic algorithms. The two best-known genetic-algorithm approaches to rule learning, the Michigan approach and the Pittsburgh approach, are then presented in more detail and compared with each other. Parameters and extensions of the Pittsburgh approach are presented and discussed. The basis of this thesis is an implementation of a Pittsburgh approach in the HADES framework of the Bioinformatics and Information Mining working group at the Department of Computer and Information Science of the University of Konstanz.
jSCSI 2.0 : Multithreaded Low-Level Distributed Block Access
2009, Graf, Sebastian, Brend'amour, Patrice, Waldvogel, Marcel
In 2007 we introduced jSCSI 1.0 to the public. The use case was to access block-patterns directly from Java without any third-party, JNI-invoked software. Over the last two years we explored how to integrate multithreading into jSCSI. The goal was to leverage the features of the Java multithreading extensions introduced with Java 5 and Java 6 and incorporate them into our proven block-level access framework. Today, we present the successor to jSCSI 1.0, jSCSI 2.0, which yields significant performance improvements by utilizing Java's advanced multithreading capabilities. We show that our Java-based implementation of a low-level architecture is a promising alternative to common JNI-invoked system calls, not only in terms of performance but also in ease of use. Therefore, we argue that jSCSI 2.0 is not only a platform-independent implementation of the iSCSI protocol, but also a fast and robust proof that low-level applications can be implemented in Java.
jSCSI - A Java iSCSI Initiator
2007, Kramis, Marc, Wildi, Volker, Lemke, Bastian, Graf, Sebastian, Janetzko, Halldor, Waldvogel, Marcel
jSCSI represents an initiator implementation of the iSCSI standard. This short paper describes the current work-in-progress of the jSCSI 1.0 release and gives first benchmarks as well as an outlook for upcoming releases.
Verteilungsansätze von großen Datenmengen
2008, Graf, Sebastian
The era of single-core processors is coming to an end. Only a few modern computer systems have fewer than two cores nowadays. To use these newly available parallel resources optimally, the way data is used must be adapted. This adaptation covers the distribution of the data. This thesis addresses that aspect with respect to the evaluation of text-based data formats. More precisely, distributed queries are presented on Comma Separated Values (CSV), on Extensible Markup Language (XML) data treated as a string representation, and on XML data with respect to its structure. Multiple variants for partitioning the data are presented for each approach. In particular, the structure-aware fragmentation of XML data shows the dependency between the structure itself and the different partitioning approaches. Therefore, a way to generate a consistent fragmentation that is independent of the structure is presented. Distributed queries on well-known, fragmented XML data sets such as wikipedia, treebank, xmark and dblp show the benefits of these approaches. Depending on the fragmentation and the available resources, distributed XPath queries need less than half the time of a non-distributed query. Based on these results, further optimizations are possible. In particular, query evaluation could be improved by pipelining XPath.
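The idea of partitioning text-based data and querying the partitions in parallel can be illustrated with a minimal Java sketch. It is not taken from the thesis: the class name `PartitionedCsvQuery`, the round-robin partitioning and the naive comma split (no quoted fields) are all assumptions made for illustration, and a local thread pool stands in for distributed peers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not the partitioning scheme used in the thesis.
public class PartitionedCsvQuery {

    // Round-robin partitioning: row i is assigned to partition i % parts.
    static List<List<String>> partition(List<String> rows, int parts) {
        List<List<String>> result = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            result.get(i % parts).add(rows.get(i));
        }
        return result;
    }

    // Counts rows whose field at index `column` equals `value`.
    // Each partition is scanned by its own task; partial counts are summed.
    static long countMatches(List<String> rows, int parts, int column, String value)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<String> fragment : partition(rows, parts)) {
                partials.add(pool.submit(() -> {
                    long hits = 0;
                    for (String row : fragment) {
                        // Naive split: assumes no quoted or escaped commas.
                        String[] fields = row.split(",", -1);
                        if (column < fields.length && fields[column].equals(value)) {
                            hits++;
                        }
                    }
                    return hits;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = Arrays.asList(
                "1,xml,2008", "2,csv,2008", "3,xml,2009", "4,xml,2007");
        System.out.println(countMatches(rows, 2, 1, "xml"));
    }
}
```

Because a count is order-independent, the partial results can simply be summed. Queries whose results must preserve document order would need an additional merge step.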
PERFIDIX : a Generic Java Benchmarking Tool
2007, Kramis, Marc, Onea, Alexander, Graf, Sebastian
PERFIDIX offers an easy way to benchmark your code. Unlike heavyweight profilers, PERFIDIX is easy to use because it builds on the proven usage of JUnit test cases. This short paper describes the current work in progress of the PERFIDIX 1.0 release.