Chapter 10: Join Methods and Query Optimization
2010, Braga, Daniele, Ceri, Stefano, Grossniklaus, Michael
Joins between data sources are an essential ingredient of multi-domain queries, as they exploit connection patterns defined between service marts or between service interfaces. This chapter starts from the definition of a query language over service interfaces, sketching how queries can be expressed directly over service marts and how these can be translated into queries over service interfaces. The fundamental operation discussed in this chapter is the binary join between two sources, which is influenced by the type of the services (search vs. exact) and by the management of service calls (parallel vs. sequential). The chapter then presents an optimization framework for queries over several service interfaces; the framework considers several cost metrics for mapping queries into query plans, which consist of specific operations over services, and includes a branch-and-bound approach to exploring the combinatorial search space of all possible query plans.
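The difference between parallel and sequential management of service calls in a binary join can be illustrated with a minimal sketch. It is not taken from the chapter: the function names, the dictionary-based tuples, and the `call_right` callback are hypothetical, and ranking, chunking, and the search vs. exact distinction are left out.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]  # hypothetical tuple representation


def parallel_join(left: Iterable[Row], right: Iterable[Row],
                  key: str) -> List[Tuple[Row, Row]]:
    """Both services are called independently; results are matched afterwards."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


def sequential_join(left: Iterable[Row],
                    call_right: Callable[[object], Iterable[Row]],
                    key: str) -> List[Tuple[Row, Row]]:
    """The second service is called once per left tuple, fed with the join value."""
    return [(l, r) for l in left for r in call_right(l[key])]
```

In the sequential variant the number of calls to the second service grows with the number of tuples returned by the first one, which is one reason why the choice between the two strategies matters for the cost of a plan.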
Efficient Computation of Search Computing Queries
2011, Braga, Daniele, Grossniklaus, Michael, Corcoglioniti, Francesco, Vadacca, Salvatore
This chapter gives a high-level overview of how query processing is carried out in SeCo. At the highest level of abstraction, queries are expressed in a conjunctive declarative query language over service interfaces, named SeCoQL, chosen to be a compact and readable formulation that serves both expert users and system developers. Queries are then expressed at a logical level in the form of acyclic invocation workflows, after a compile-time analysis that decides a cost-driven scheduling of service invocations. At a lower, physical level, queries are then translated into executable specifications that distinguish between the data flow and the control flow, support parallelism, account for stateless and stateful computation tasks, and support backward and forward control. The query engine is implemented as an interpreter of these physical plans. A workbench and testing environment is also available in the form of a tool to monitor the processing of complex queries by inspecting all phases of their analysis and execution, at all levels of abstraction.
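As a rough illustration of the logical level, the sketch below derives one admissible invocation order from input/output dependencies between services. It is an assumption-laden toy: the abstract shows neither SeCoQL syntax nor the planner, the `schedule` function and the attribute sets are hypothetical, and the cost-driven choice among admissible orders is not modeled. It relies on `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical description of a service: (input attributes, output attributes).
Service = Tuple[Set[str], Set[str]]


def schedule(services: Dict[str, Service]) -> List[str]:
    """Return one invocation order in which every service runs after the
    services producing its inputs. Inputs produced by no service are
    assumed to be supplied by the user."""
    producers: Dict[str, Set[str]] = {}
    for name, (_, outputs) in services.items():
        for attr in outputs:
            producers.setdefault(attr, set()).add(name)
    deps: Dict[str, Set[str]] = {}
    for name, (inputs, _) in services.items():
        deps[name] = set()
        for attr in inputs:
            deps[name] |= producers.get(attr, set()) - {name}
    # Raises graphlib.CycleError if the dependencies are not acyclic.
    return list(TopologicalSorter(deps).static_order())
```

Services without mutual dependencies may appear in either order in the result; they are the candidates for parallel invocation.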
Chapter 12: Panta Rhei: Flexible Execution Engine for Search Computing Queries
2010, Braga, Daniele, Ceri, Stefano, Corcoglioniti, Francesco, Grossniklaus, Michael
The efficient execution of data-intensive computations over services is a challenging task: data are retrieved from remote sources and are therefore not available in the query engine until the corresponding service calls have completed, but the system must be inherently efficient thereafter, guaranteeing that data are immediately cached and processed according to the best query plan. In this chapter, we present a flexible execution model for search computing queries, named Panta Rhei. The proposed execution engine paradigm adopts the producer/consumer model and supports both data-driven and event-driven synchronization, and their interplay. Query plans are modeled as directed graphs, whose nodes are processing units and whose edges are either control or data flows. While control flows synchronize service calls and unit execution, data flows transfer data between units that process data flows to produce query results. We present the specification of Panta Rhei by formally defining the units for data production, consumption, manipulation, and caching, as well as the control and data flows. Finally, we discuss how a query plan is expressed in terms of a query execution plan.
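The producer/consumer model with separate data and control flows can be sketched in a few lines. This is only an illustration under stated assumptions, not the Panta Rhei specification: the unit functions, the `fetch_chunk` stand-in for a remote service, and the use of a queue for the data flow and an event for the control flow are all hypothetical choices.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker on the data flow


def producer_unit(fetch_chunk, n_chunks, data_out, go):
    """Service-invocation unit: waits for a control signal, then emits tuples."""
    for i in range(n_chunks):
        go.wait()                      # control flow: permission to call
        for tup in fetch_chunk(i):     # stand-in for a remote service call
            data_out.put(tup)          # data flow: pass tuples downstream
    data_out.put(_DONE)


def consumer_unit(data_in, results, predicate):
    """Manipulation unit: filters tuples as they arrive (data-driven)."""
    while True:
        item = data_in.get()
        if item is _DONE:
            break
        if predicate(item):
            results.append(item)


def run_plan():
    data = queue.Queue()
    go = threading.Event()
    results = []
    fake_service = lambda i: [i * 10 + k for k in range(3)]
    p = threading.Thread(target=producer_unit,
                         args=(fake_service, 2, data, go))
    c = threading.Thread(target=consumer_unit,
                         args=(data, results, lambda x: x % 2 == 0))
    p.start()
    c.start()
    go.set()   # event-driven synchronization: allow the service calls
    p.join()
    c.join()
    return results
```

The consumer starts before any data exist and simply blocks on the queue, which mirrors the observation that data become available only after the service calls return.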