Sparkwave is a solution for continuous pattern matching over RDF data streams. It is based on the Rete algorithm, which allows efficient and truly continuous processing of streamed data. Sparkwave leverages the RDF schema information associated with data streams to compute entailments, so that implicit knowledge is taken into account during pattern matching. In addition, it supports time-based sliding windows to cope with the streaming nature of the processed data.
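To make the idea of a time-based sliding window concrete, here is a minimal Python sketch. It is not Sparkwave's implementation (which is built on a Rete network); the class name `SlidingWindow`, its methods, and the use of plain numeric timestamps are assumptions made purely for illustration. It only shows the general principle: a triple stays available for matching until it is older than the window width.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical helper: keeps triples seen within the last `width` time units."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, triple) pairs, oldest first

    def add(self, timestamp, triple):
        # Assumption: timestamps arrive in non-decreasing order.
        self.items.append((timestamp, triple))
        self.expire(timestamp)

    def expire(self, now):
        # Evict every triple that is `width` time units old or older.
        while self.items and self.items[0][0] <= now - self.width:
            self.items.popleft()

    def contents(self):
        return [triple for _, triple in self.items]


window = SlidingWindow(width=10)
window.add(0, ("offer1", "bsbm-voc:price", "10"))
window.add(5, ("offer2", "bsbm-voc:price", "20"))
window.add(12, ("offer3", "bsbm-voc:price", "30"))  # offer1 falls out of the window
```

In this sketch a wider window keeps more triples alive at once, which is the general reason window size is an interesting parameter for memory and throughput measurements.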

In order to evaluate the performance of Sparkwave, we defined some tests reusing part of the dataset from the Berlin SPARQL Benchmark (BSBM). The BSBM defines a suite of benchmarks to compare the performance of RDF stores and, in general, of storage systems that expose SPARQL endpoints. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors, and consumers have posted reviews about the products.

The BSBM is not intended for measuring stream processing systems and does not itself provide data streams, so we had to adapt it to our needs. In particular, we reused the data describing product types and offers. The BSBM generates a single-rooted hierarchy of product types, whose depth and number of classes are determined by a scaling factor; each product is then assigned to a product type at the leaf level of the hierarchy. For our tests, the hierarchy of product types constitutes the schema. The BSBM also generates a number of offers related to products, which constitute the data stream.
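The role of the product type hierarchy as schema can be illustrated with a small Python sketch of subclass entailment. The type names, the `subclass_of` mapping, and the `superclasses` function are hypothetical and not taken from the BSBM data or from Sparkwave; the sketch also assumes that each type has exactly one parent and that the hierarchy is acyclic. It shows how a product assigned to a leaf type is implicitly also an instance of every ancestor type, so a pattern asking for a higher-level type can still match it.

```python
def superclasses(cls, subclass_of):
    """Return `cls` followed by all of its ancestors, nearest first."""
    result = [cls]
    # Walk up the hierarchy until the root (a class with no parent) is reached.
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result


# Hypothetical three-level hierarchy: child type -> parent type
subclass_of = {
    "ProductType3": "ProductType2",
    "ProductType2": "ProductType1",
}

# A product typed with the leaf class is entailed to belong to all ancestors.
entailed = superclasses("ProductType3", subclass_of)
matches_target = "ProductType1" in entailed
```

A deeper hierarchy means more entailed types per product, which is one way the amount of background knowledge can influence matching cost.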

Given this schema and stream from the BSBM, we defined three graph patterns:

	-- Pattern 1: ------------------------------------------------------------------------
	
	REGISTER QUERY BSBSOfferDetection AS
	PREFIX bsbm-voc:<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
	PREFIX bsbm-inst:<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
	PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
	PREFIX dc:<http://purl.org/dc/elements/1.1/>
	SELECT ?offer ?product ?vendor ?price ?from ?to ?delivery ?webpage
		   ?publisher ?date
	FROM STREAM <http://bsbm.org/stream>
	FROM <http://members.sti2.at/~srdjank/ProductTypeHierarchy.rdf>
	WHERE {
		?offer rdf:type bsbm-voc:Offer.
		?offer bsbm-voc:product ?product.
		?offer bsbm-voc:vendor ?vendor.
		?offer bsbm-voc:price ?price.
		?offer bsbm-voc:validFrom ?from.
		?offer bsbm-voc:validTo ?to.
		?offer bsbm-voc:deliveryDays ?delivery.
		?offer bsbm-voc:offerWebpage ?webpage.
		?offer dc:publisher ?publisher.
		?offer dc:date ?date.
		?product rdf:type bsbm-inst:TargetProductType
	}
	
	-- Pattern 2: in addition to Pattern 1, in the WHERE clause: --------------------------
		?vendor bsbm-voc:country <http://downlode.org/rdf/iso-3166/countries#GB>
	
	-- Pattern 3: in addition to Pattern 1, in the WHERE clause: --------------------------
		?vendor bsbm-voc:country <http://downlode.org/rdf/iso-3166/countries#GB>.
		?product bsbm-voc:producer ?producer.
		?producer bsbm-voc:country <http://downlode.org/rdf/iso-3166/countries#DE>
			  

Memory consumption over time window size

[Figure: Memory consumption]

Throughput over time window size

[Figure: Throughput over time window]

Throughput over background knowledge size

[Figure: Throughput over background knowledge]