What is materialisation?

Materialisation is one way for a reasoning engine to perform its reasoning.

When given some data and some rules, the reasoning engine creates new data and puts it in the datastore, adding information to the dataset itself. This approach differs from backward chaining, where no data is created and queries are instead rewritten internally, so the inferred information is provided at a level removed from the data layer. There are also hybrid approaches.

Let’s say we have this data:

:article_1 :hasTag :italian_cooking .
:article_2 :hasTag :italian_restaurants .
:italian_cooking :hasSuperCategory :cooking , :italian_food .
:italian_restaurants :hasSuperCategory :restaurants , :italian_food .
:alice :liked :article_1 .

We want to find other articles to recommend to Alice, given that she liked the first.
She will probably like the second article too, as it is also about Italian food, albeit indirectly. So we can add rules like these:


?article1 :hasTag ?tag1 , ?tag1 :hasSuperCategory ?superTag ,
?article2 :hasTag ?tag2 , ?tag2 :hasSuperCategory ?superTag .

-->

?article1 :relatedTo ?article2 .

And

?person :liked ?article ,
?article :relatedTo ?otherArticle .

-->

?person :mightLike ?otherArticle .

Now, instead of running a complex query, we can run a one-line query:


SELECT ?person ?article WHERE { ?person :mightLike ?article }
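
To make the mechanics concrete, here is a minimal Python sketch of what materialisation does with the data and the two rules above. It is an illustration only, not how RDFox or any particular engine is implemented: triples are plain tuples, each rule is a hand-written function, and the names `related_rule`, `might_like_rule` and `materialise` are hypothetical.

```python
# Explicit triples from the example, as (subject, predicate, object) tuples.
EXPLICIT = {
    (":article_1", ":hasTag", ":italian_cooking"),
    (":article_2", ":hasTag", ":italian_restaurants"),
    (":italian_cooking", ":hasSuperCategory", ":cooking"),
    (":italian_cooking", ":hasSuperCategory", ":italian_food"),
    (":italian_restaurants", ":hasSuperCategory", ":restaurants"),
    (":italian_restaurants", ":hasSuperCategory", ":italian_food"),
    (":alice", ":liked", ":article_1"),
}


def related_rule(facts):
    """Hypothetical encoding of the first rule: shared super-category."""
    supers = {}  # article -> super-categories reachable through its tags
    for article, p, tag in facts:
        if p != ":hasTag":
            continue
        for s, p2, sup in facts:
            if p2 == ":hasSuperCategory" and s == tag:
                supers.setdefault(article, set()).add(sup)
    return {
        (a1, ":relatedTo", a2)
        for a1 in supers
        for a2 in supers
        if supers[a1] & supers[a2]
    }


def might_like_rule(facts):
    """Hypothetical encoding of the second rule: liked + relatedTo."""
    liked = [(s, o) for s, p, o in facts if p == ":liked"]
    related = [(s, o) for s, p, o in facts if p == ":relatedTo"]
    return {
        (person, ":mightLike", other)
        for person, article in liked
        for a, other in related
        if a == article
    }


def materialise(explicit, rules):
    """Apply every rule repeatedly until no new triple is derived."""
    facts = set(explicit)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new


store = materialise(EXPLICIT, [related_rule, might_like_rule])

# The "one-line query": a plain lookup over stored triples.
might_like = sorted((s, o) for s, p, o in store if p == ":mightLike")
```

Once the store has been materialised, answering the query is a simple filter over stored triples. Note that, as written, the first rule also relates each article to itself, so `:article_1` shows up among Alice's suggestions alongside `:article_2`; a real rule would probably exclude that case.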

Advantages of materialisation:

  • Query performance is better and more predictable, because the inferred data is already stored and does not have to be derived at query time.
  • Queries are easier to write.

Disadvantages of materialisation:

  • There may be an upfront computational cost to pay when importing rules or data.
  • Complex algorithms need to be developed to manage data/rule updates. RDFox uses these, but many materialised stores don’t.

Some may ask: what’s to stop me from using a SPARQL update (i.e. a write query) to ‘materialise the data’ instead of a rule?

There are two main reasons:

1. When using a write query, there is no difference between explicit data and inferred data. So what do you do if you realise that your logic was slightly off? What do you do if the explicit data later changes, e.g. Alice unlikes the article because she goes on a diet and no longer wants to see anything about food?
Using materialisation, the reasoning engine keeps track of which data was created by which rule and based on which data.
If the logic or the data changes, the consequences can be updated automatically.

2. Materialisation is recursive, i.e. sometimes we want to derive data based on inferred triples (i.e. triples that were created by the reasoning engine), as in the example above, where :mightLike is derived from the inferred :relatedTo triples. We could of course keep track of ‘an order of evaluation’ ourselves, but this adds an extra layer to manage.

More importantly, sometimes the reasoning engine will materialise data that is then fed back into the same rule that materialised it.

Thus, a write query would have to be executed an arbitrary number of times until no new triples can be derived. But the reasoner can do this automatically, removing all the complexity of managing the reasoning from the user.
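
Both reasons can be illustrated with a small Python sketch. It is a deliberately naive model, not a description of RDFox: the hypothetical `MaterialisedStore` class keeps explicit triples apart from inferred ones and simply re-derives everything from scratch after each change, whereas real engines rely on incremental algorithms. The rule functions and the `rounds` counter are likewise assumptions made for illustration.

```python
def related_rule(facts):
    """Hypothetical encoding of the :relatedTo rule from the example."""
    tags = [(s, o) for s, p, o in facts if p == ":hasTag"]
    supers = [(s, o) for s, p, o in facts if p == ":hasSuperCategory"]
    return {
        (a1, ":relatedTo", a2)
        for a1, t1 in tags for x1, c1 in supers if x1 == t1
        for a2, t2 in tags for x2, c2 in supers if x2 == t2 and c1 == c2
    }


def might_like_rule(facts):
    """Hypothetical encoding of the :mightLike rule from the example."""
    liked = [(s, o) for s, p, o in facts if p == ":liked"]
    related = [(s, o) for s, p, o in facts if p == ":relatedTo"]
    return {
        (person, ":mightLike", other)
        for person, article in liked
        for a, other in related if a == article
    }


class MaterialisedStore:
    """Keeps explicit triples separate from inferred ones (illustrative)."""

    def __init__(self, rules):
        self.rules = rules
        self.explicit = set()
        self.inferred = set()
        self.rounds = 0  # passes that produced at least one new triple

    def _rematerialise(self):
        # Naive strategy: throw away all inferences and derive them again.
        facts = set(self.explicit)
        self.rounds = 0
        while True:
            new = set()
            for rule in self.rules:
                new |= rule(facts)  # every rule sees the same snapshot
            new -= facts
            if not new:
                break
            facts |= new
            self.rounds += 1
        self.inferred = facts - self.explicit

    def add(self, *triples):
        self.explicit.update(triples)
        self._rematerialise()

    def remove(self, triple):
        self.explicit.discard(triple)
        self._rematerialise()


store = MaterialisedStore([might_like_rule, related_rule])
store.add(
    (":article_1", ":hasTag", ":italian_cooking"),
    (":article_2", ":hasTag", ":italian_restaurants"),
    (":italian_cooking", ":hasSuperCategory", ":cooking"),
    (":italian_cooking", ":hasSuperCategory", ":italian_food"),
    (":italian_restaurants", ":hasSuperCategory", ":restaurants"),
    (":italian_restaurants", ":hasSuperCategory", ":italian_food"),
    (":alice", ":liked", ":article_1"),
)

suggestion = (":alice", ":mightLike", ":article_2")
rounds_needed = store.rounds                  # more than one pass is needed
suggested_before = suggestion in store.inferred

# Alice unlikes the article: only the explicit triple is removed by hand.
store.remove((":alice", ":liked", ":article_1"))
suggested_after = suggestion in store.inferred
still_related = (":article_1", ":relatedTo", ":article_2") in store.inferred
```

In this sketch the first pass can only derive the :relatedTo triples, and the :mightLike triples appear in the second pass, which is the kind of ordering a hand-run write query would have to get right. After Alice's like is removed, the suggestion disappears while the :relatedTo triples, which never depended on it, remain.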
