Stream Reasoning : Where We Got So Far
2010, Barbieri, Davide, Braga, Daniele, Ceri, Stefano, Della Valle, Emanuele, Grossniklaus, Michael
Data Streams - unbounded sequences of time-varying data elements - are pervasive. They occur in a variety of modern applications including the Web where blogs, feeds, and microblogs are increasingly adopted to distribute and present information in real-time streams. We foresee the need for languages, tools and methodologies for representing, managing and reasoning on data streams for the Semantic Web. We collectively name those research chapters Stream Reasoning. In this extended abstract, we motivate the need for investigating Stream Reasoning; we characterize the notion of Stream Reasoning; we report the results obtained by Politecnico di Milano in studying Stream Reasoning from 2008 to 2010; and we close the paper with a short review of the related works and some outlooks.
Incremental Reasoning on Streams and Rich Background Knowledge
2010, Barbieri, Davide Francesco, Braga, Daniele, Ceri, Stefano, Della Valle, Emanuele, Grossniklaus, Michael
This article presents a technique for Stream Reasoning, consisting in incremental maintenance of materializations of ontological entailments in the presence of streaming information. Previous work, delivered in the context of deductive databases, describes the use of logic programming for the incremental maintenance of such entailments. Our contribution is a new technique that exploits the nature of streaming data in order to efficiently maintain materialized views of RDF triples, which can be used by a reasoner.
By adding expiration time information to each RDF triple, we show that it is possible to compute a new complete and correct materialization whenever a new window of streaming data arrives, by dropping explicit statements and entailments that are no longer valid, and then computing when the RDF triples inserted within the window will expire. We provide experimental evidence that our approach significantly reduces the time required to compute a new materialization at each window change, and opens up for several further optimizations.
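The expiration-time idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the abstract mentions logic programming); it assumes a single hypothetical entailment rule (type propagation along `subClassOf`), a dictionary from triple to expiration time, and hypothetical function names `expire`, `derive`, and `slide`. Background knowledge is assumed never to expire.

```python
import math

INF = math.inf  # assumption: background knowledge never expires


def expire(store, now):
    """Drop every triple, explicit or entailed, whose expiration time has passed."""
    return {t: exp for t, exp in store.items() if exp > now}


def derive(store):
    """Apply one illustrative rule to fixpoint:
    (x type C), (C subClassOf D) => (x type D).
    An entailed triple expires with its earliest-expiring premise; if it can be
    derived in several ways, the latest of those expirations is kept."""
    changed = True
    while changed:
        changed = False
        for (s, p, o), exp1 in list(store.items()):
            if p != "type":
                continue
            for (s2, p2, o2), exp2 in list(store.items()):
                if p2 == "subClassOf" and s2 == o:
                    new = (s, "type", o2)
                    exp = min(exp1, exp2)
                    if store.get(new, -INF) < exp:
                        store[new] = exp
                        changed = True
    return store


def slide(store, now, arrivals, window):
    """Advance the window: drop expired triples, stamp new ones, re-derive."""
    store = expire(store, now)
    for t in arrivals:
        store[t] = max(store.get(t, -INF), now + window)
    return derive(store)


background = {("Car", "subClassOf", "Vehicle"): INF}
state = slide(background, 0, [("a", "type", "Car")], 10)
state = slide(state, 5, [("b", "type", "Car")], 10)
state = slide(state, 10, [], 10)  # everything stamped at time 0 has now expired
```

Because every entailment carries its own expiration time, a window change only needs to discard expired entries and derive from what remains, rather than recompute the whole materialization from scratch.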
Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL
2009, Barbieri, Davide Francesco, Braga, Daniele, Ceri, Stefano, Della Valle, Emanuele, Grossniklaus, Michael
Social semantic data are becoming a reality, but apparently their streaming nature has been ignored so far. Streams, being unbounded sequences of time-varying data elements, should not be treated as persistent data to be stored “forever” and queried on demand, but rather as transient data to be consumed on the fly by queries which are registered once and for all and keep analyzing such streams, producing answers triggered by the streaming data and not by explicit invocation. In this paper, we propose an approach to continuous queries and real-time analysis of social semantic data with C-SPARQL, an extension of SPARQL for querying RDF streams.
An execution environment for C-SPARQL queries
2010, Barbieri, Davide Francesco, Braga, Daniele, Ceri, Stefano, Grossniklaus, Michael
Continuous SPARQL (C-SPARQL) is proposed as a new language for continuous queries over streams of RDF data. It covers a gap in the Semantic Web abstractions which is needed for many emerging applications, including our focus on Urban Computing. In this domain, sensor-based information on roads must be processed to deduce localized traffic conditions and then produce traffic management strategies. Executing C-SPARQL queries requires the effective integration of SPARQL and streaming technologies, which capitalize on over a decade of research and development; such integration poses several nontrivial challenges.
In this paper we (a) show the syntax and semantics of the C-SPARQL language together with some examples; (b) introduce a query graph model which is an intermediate representation of queries devoted to optimization; (c) discuss the features of an execution environment that leverages existing technologies; (d) introduce optimizations in terms of rewriting rules applied to the query graph model, so as to efficiently exploit the execution environment; and (e) show evidence of the effectiveness of our optimizations on a prototype of execution environment.
Panta Rhei : Optimized and Ranked Data Processing over Heterogeneous Sources
2010, Braga, Daniele, Corcoglioniti, Francesco, Grossniklaus, Michael, Vadacca, Salvatore
In the era of digital information, the value of data resides not only in its volume and quality, but also in the additional information that can be inferred from the combination (aggregation, comparison and join) of such data. There is a concrete need for data processing solutions that combine distributed and heterogeneous data sources, such as Web services, relational databases, and even search engines, that can all be modeled as services. In this demonstration, we show how our Panta Rhei model addresses the challenge of processing data over heterogeneous sources to provide feasible and ranked combinations of these services.
Data-driven Optimization of Search Service Composition for Answering Multi-domain Queries
2009, Barbieri, Davide Francesco, Bozzon, Alessandro, Braga, Daniele, Brambilla, Marco, Campi, Alessandro, Ceri, Stefano, Della Valle, Emanuele, Fraternali, Piero, Grossniklaus, Michael, Martinenghi, Davide
Answering multi-domain queries requires the combination of knowledge from various domains. Such queries are inadequately answered by general-purpose search engines, because domain-specific systems typically exhibit sophisticated knowledge about their own fields of expertise. Moreover, multi-domain queries typically require combining in the result domain knowledge possibly coming from multiple web resources, therefore conventional crawling and indexing techniques, based on individual pages, are not adequate. In this paper we present a conceptual framework for addressing the composition of search services for solving multi-domain queries. The approach consists in building an infrastructure for search service composition that leaves within each search system the responsibility of maintaining and improving its domain knowledge, and whose main challenge is to provide the “glue” between them; such glue is expressed in the format of joins upon search service results, and for this feature we regard our approach as “data-driven”. We present an overall architecture, and the work that has been done so far in the development of some of the main modules.
Search Computing Systems (Extended Abstract)
2010, Ceri, Stefano, Abid, Adnan, Helou, Mamoun Abu, Bozzon, Alessandro, Braga, Daniele, Brambilla, Marco, Campi, Alessandro, Corcoglioniti, Francesco, Della Valle, Emanuele, Eynard, Davide, Fraternali, Piero, Grossniklaus, Michael, Martinenghi, Davide, Ronchi, Stefania, Tagliasacchi, Marco, Vadacca, Salvatore
Search Computing defines a new class of applications, which enable end users to perform exploratory search processes over multi-domain data sources available on the Web. These applications exploit suitable software frameworks and models that make it possible for expert users to configure the data sources to be searched and the interfaces for query submission and result visualization. We describe some usage scenarios and the reference architecture for Search Computing systems.
Search Computing Challenges and Directions
2010, Ceri, Stefano, Braga, Daniele, Corcoglioniti, Francesco, Grossniklaus, Michael, Vadacca, Salvatore
Search Computing (SeCo) is a project funded by the European Research Council (ERC). It focuses on building the answers to complex search queries like “Where can I attend an interesting conference in my field close to a sunny beach?” by interacting with a constellation of cooperating search services, using ranking and joining of results as the dominant factors for service composition. SeCo started in November 2008 and will last 5 years. This paper will give a general introduction to the Search Computing approach and then focus on its query optimization and execution engine, the aspect of the project which is most tightly related to “objects and databases” technologies.
C-SPARQL : SPARQL for continuous querying
2009, Barbieri, Davide Francesco, Braga, Daniele, Ceri, Stefano, Della Valle, Emanuele, Grossniklaus, Michael
C-SPARQL is an extension of SPARQL to support continuous queries, registered and continuously executed over RDF data streams, considering windows of such streams. Supporting streams in RDF format guarantees interoperability and opens up important applications, in which reasoners can deal with knowledge that evolves over time. We present C-SPARQL by means of examples in Urban Computing.
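The notion of a query that is registered once and then repeatedly evaluated over a window of an RDF stream can be sketched in Python as follows. This is only an illustration of time-based window semantics, not C-SPARQL syntax or the authors' engine; the class `WindowedQuery`, its methods, and the example triples are hypothetical, and stream elements are assumed to arrive in timestamp order.

```python
from collections import deque


class WindowedQuery:
    """A query registered once and evaluated over a sliding time window."""

    def __init__(self, range_, predicate):
        self.range = range_          # window width, in the stream's time unit
        self.predicate = predicate   # stands in for the query's pattern
        self.window = deque()        # (timestamp, triple) pairs, oldest first

    def push(self, timestamp, triple):
        """Consume one timestamped RDF triple from the stream."""
        self.window.append((timestamp, triple))

    def evaluate(self, now):
        """Evict triples that fell out of the window, then answer the query."""
        while self.window and self.window[0][0] <= now - self.range:
            self.window.popleft()
        return [t for _, t in self.window if self.predicate(t)]


# Hypothetical Urban Computing example: congestion reports in the last 10 ticks.
query = WindowedQuery(10, lambda t: t[1] == "hasStatus" and t[2] == "congested")
query.push(1, ("road1", "hasStatus", "congested"))
query.push(5, ("road2", "hasStatus", "congested"))
query.push(12, ("road3", "hasStatus", "free"))
answers = query.evaluate(now=12)
```

Answers are produced by calling `evaluate` as time advances, so older triples are consumed and discarded rather than stored indefinitely.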