RDFox Datalog Constraints

A new feature of RDFox v3.1

Photo by Jessica Lee on Unsplash

RDFox v3.1.0 was released on 7 July 2020. Along with SWRL support and a slew of small improvements and fixes, the new version introduces an exciting feature that improves support for applications requiring reasoning under the closed-world assumption: Datalog Constraints.

Update: the examples in this article were modified to be compatible with RDFox v4.1.0 (release notes here).

Datalog Constraints leverage RDFox’s unique incremental reasoning capabilities to bring the expressiveness of RDFox’s rule language to the problem of constraining data store content. In so doing, they give application developers a way to ensure that their data stores remain valid and focused on their application domain, without having to write any external code.

High-level Description

RDFox exploits incremental reasoning algorithms to ensure that materialisations are up-to-date before a transaction is committed; that is, the implicit facts in each data store are exactly those which logically follow from applying the store’s rules to its explicit facts. Using incremental and materialised reasoning in this way ensures that implicit facts can be queried with the same jaw-dropping speed as explicit facts.

Building on this, RDFox’s new constraint validation feature is implemented as a commit-time check, performed after the incremental reasoning step, that the transaction will not introduce any instances of a special constraint violation class into the data store’s default graph. Any transaction which fails this check is rejected with an explanatory message that can be determined by the author of the constraint.
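To make the flow concrete, here is a minimal Python sketch of a commit-time check. It is a toy model of the behaviour described above, not RDFox's API: the names commit, ConstraintViolationError and negative_age are hypothetical, rules are plain functions over sets of triples, and the reasoning step is recomputed from scratch rather than incrementally.

```python
VIOLATION = "rdfox:ConstraintViolation"


class ConstraintViolationError(Exception):
    """Raised when a commit would introduce constraint violations."""


def commit(store, additions, rules):
    """Return the new explicit facts, or raise if a violation is derived.

    store and additions are sets of (subject, property, object) triples;
    rules is a list of functions mapping a set of facts to derived facts.
    """
    explicit = store | additions
    facts = set(explicit)
    # Reasoning step: apply every rule until nothing new is derived.
    # (Assumption: recomputed from scratch here for simplicity.)
    while True:
        derived = set().union(*(rule(facts) for rule in rules)) - facts
        if not derived:
            break
        facts |= derived
    # Commit-time check: reject if any violation instance was derived.
    violations = sorted(
        s for (s, p, o) in facts if p == "rdf:type" and o == VIOLATION
    )
    if violations:
        raise ConstraintViolationError(violations)
    return explicit


def negative_age(facts):
    # Hypothetical constraint: flag any :age value below zero.
    return {
        (f"violation:{s}", "rdf:type", VIOLATION)
        for (s, p, o) in facts
        if p == ":age" and o < 0
    }


store = commit(set(), {(":alice", ":age", 30)}, [negative_age])
try:
    commit(store, {(":bob", ":age", -1)}, [negative_age])
except ConstraintViolationError as err:
    print("rejected:", err.args[0])
```

The rejected transaction leaves the store untouched, which is the point of doing the check before the commit rather than after it.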

The good news for anyone who already knows RDFox Datalog is that they already know how to write Datalog Constraints. The good news for everyone else is that RDFox Datalog is easy to learn, as we’ll now see.

RDFox Datalog

Before we examine Datalog Constraints more closely it is useful to quickly describe RDFox Datalog. A general Datalog rule has the structure:

<HEAD> :- <BODY> .

where <HEAD> and <BODY> are lists of triple patterns separated by commas. A triple pattern is a triple in which variables may occur in any (or all) of the subject, property or object positions. For example, the triple pattern [?person, a, foaf:Person] matches all triples that have rdf:type in the property position and foaf:Person in the object position. The ?person variable will be bound to whichever name appears in the subject position of the matching triple.
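As a rough illustration of how a triple pattern binds variables, the following Python sketch matches one pattern against a handful of triples. The match function and the sample facts are our own assumptions for illustration; RDFox's actual matching works over indexed data and is far more sophisticated.

```python
def match(pattern, triple, bindings=None):
    """Match one triple pattern against one triple.

    Terms starting with "?" are variables. Returns the extended bindings
    dict on success, or None if the triple does not match.
    """
    result = dict(bindings or {})
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant in the pattern does not match
    return result


# Hypothetical sample data; "a" in the pattern is shorthand for rdf:type.
facts = [
    (":alice", "rdf:type", "foaf:Person"),
    (":alice", "foaf:name", "Alice"),
    (":acme", "rdf:type", "foaf:Organization"),
]
pattern = ("?person", "rdf:type", "foaf:Person")
matches = [b for b in (match(pattern, t) for t in facts) if b is not None]
print(matches)  # [{'?person': ':alice'}]
```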

When all of the triple patterns in a rule’s body match a subset of the data in a data store, RDFox adds in the facts specified by substituting the values bound by the rule body into the triple patterns in the rule head. For example, the following simple Datalog rule makes the relationship :marriedTo symmetric by ensuring that, wherever we have a statement to say that person A is married to person B, we should also have a statement that person B is married to person A.

[?personB, :marriedTo, ?personA] :-
   [?personA, :marriedTo, ?personB] .
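The effect of this rule can be sketched in a few lines of Python. This is only a model of the semantics under our own assumptions (facts as tuples, a hypothetical married_to_symmetric function); it is not how RDFox evaluates rules internally.

```python
def married_to_symmetric(facts):
    """Derive [?b, :marriedTo, ?a] for every match of [?a, :marriedTo, ?b]."""
    return {(b, ":marriedTo", a) for (a, prop, b) in facts if prop == ":marriedTo"}


explicit = {(":alice", ":marriedTo", ":bob")}
implicit = married_to_symmetric(explicit) - explicit
materialised = explicit | implicit

# Applying the rule again derives nothing new: the materialisation is complete.
assert married_to_symmetric(materialised) <= materialised
print(sorted(implicit))  # [(':bob', ':marriedTo', ':alice')]
```

The explicit fact stays as it was; the reversed statement is an implicit fact that exists only because the rule derives it.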

RDFox Datalog includes several extensions, such as filtering, negation and aggregation, that provide additional data analysis capabilities. A full description of these extensions is beyond the scope of this article; however, the examples later will show these features in action.

So what’s a Datalog Constraint?

A Datalog Constraint is nothing more than an RDFox Datalog rule that derives instances of the constraint violation class <http://oxfordsemantic.tech/RDFox#ConstraintViolation> (or just rdfox:ConstraintViolation where the rdfox: prefix has been defined) into the default graph. Deriving instances of this class into the default graph requires including a triple pattern of the form:

[??, a, rdfox:ConstraintViolation]

in the rule head where ?? indicates that it does not matter what appears in the subject position. We’ll discuss the choice of what to substitute for ?? in the first example below.

Example 1 — Numeric Ranges

Let’s imagine we’re setting up our data store to hold test scores for a class of students. The maximum score for the test is 100 and we have decided to use the relation :testScore to record each student’s score. To enforce the constraint, we need a rule body that matches scores greater than 100. The following does exactly that using a single triple pattern and a filter literal:

[?student, :testScore, ?score],
FILTER ( ?score > 100 ) .

We now need to combine this with a rule head containing our template triple pattern from above giving:

[??, a, rdfox:ConstraintViolation] :-
   [?student, :testScore, ?score],
   FILTER ( ?score > 100 ) .

This is not yet a valid RDFox rule because of the ?? near the start of the first line which we must now replace. No matter what we replace it with, be it one of the variables from our rule body or even a constant, the resulting constraint will prevent test scores of more than 100 from being committed to the data store. So why not take one of those easy options? The answer is to do with the usefulness of the error message users will see when they try to add scores greater than 100.

When a transaction commit fails due to the presence of constraint violations, RDFox will include details of up to ten of those violations in the error message it returns. By ensuring that we use a different individual to represent each separate violation, we gain the benefit of more helpful messages when the constraint is violated.

With this in mind, we’ll use the rdfox:SKOLEM built-in tuple table to create a new individual to represent the constraint violation and bind it to the variable ?v. We can then save useful values from each violation instance by associating them with the ?v variable so that they will be included in any error message.

The completed constraint is:

PREFIX : <http://tests.example#>
PREFIX rdfox: <http://oxfordsemantic.tech/RDFox#>

[?v, a, rdfox:ConstraintViolation],
[?v, :constraintDescription, "Maximum test score is 100."],
[?v, :student, ?student],
[?v, :actualScore, ?score] :-
   [?student, :testScore, ?score],
   FILTER ( ?score > 100 ),
   rdfox:SKOLEM("MaxScoreExceeded", ?student, ?score, ?v) .

With this constraint in place, importing the following Turtle:

@prefix : <http://tests.example#> .

:student1 :testScore 93 .
:student2 :testScore 77 .
:student3 :testScore 103 .
:student4 :testScore 95 .
:student5 :testScore 100000000 .

results in the error message:

The transaction could not be committed because it would have introduced 2 violations.
The violations are listed below.

_:__05TWF4U2NvcmVFeGNlZWRlZAA-_02aHR0cDovL3Rlc3RzLmV4YW1wbGUjc3R1ZGVudDMA_17ZwAAAAAAAAA- <http://tests.example#actualScore> 103; <http://tests.example#student> <http://tests.example#student3>; <http://tests.example#constraintDescription> "Maximum test score is 100." .

_:__05TWF4U2NvcmVFeGNlZWRlZAA-_02aHR0cDovL3Rlc3RzLmV4YW1wbGUjc3R1ZGVudDUA_17AOH1BQAAAAA- <http://tests.example#actualScore> 100000000; <http://tests.example#student> <http://tests.example#student5>; <http://tests.example#constraintDescription> "Maximum test score is 100." .

This clearly tells us that among the data we tried to commit were two violations of the constraint “Maximum test score is 100.”, each showing the student and outsized score they relate to. Mission accomplished!
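For readers who think more easily in imperative code, here is a hedged Python sketch of what the constraint computes for the data above. The skolem helper is a toy stand-in of our own: it only mimics the property that the same arguments always yield the same individual, and makes no claim about how rdfox:SKOLEM actually encodes its names.

```python
MAX_SCORE = 100


def skolem(*args):
    """Toy stand-in for rdfox:SKOLEM: same arguments, same individual."""
    return "_:" + "_".join(str(a) for a in args)


def max_score_violations(facts):
    """Map each violation individual to the details attached in the rule head."""
    violations = {}
    for (student, prop, score) in facts:
        if prop == ":testScore" and score > MAX_SCORE:
            v = skolem("MaxScoreExceeded", student, score)
            violations[v] = {
                ":constraintDescription": f"Maximum test score is {MAX_SCORE}.",
                ":student": student,
                ":actualScore": score,
            }
    return violations


scores = {
    (":student1", ":testScore", 93),
    (":student2", ":testScore", 77),
    (":student3", ":testScore", 103),
    (":student4", ":testScore", 95),
    (":student5", ":testScore", 100000000),
}
for v, details in sorted(max_score_violations(scores).items()):
    print(v, details)
```

Because each violation gets its own individual, the student and score travel with it, which is what makes the resulting error message useful.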

Example 2 — Mandatory Properties

In this example we’re setting up a data store to contain a mailing list using the foaf vocabulary. We want to ensure that we have at least one foaf:mbox property for every foaf:Person in the data store. The foaf:mbox property records a person’s email address. Instances of foaf:Person without such a property are just polluting our data store given its intended purpose.

Again, we need a rule body that will match subgraphs which violate the constraint. In this case we need to use negation to ensure that our rule body only matches where foaf:mbox is missing. We load the following prefixes and rule.

PREFIX : <http://example.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfox: <http://oxfordsemantic.tech/RDFox#>

[?v, a, rdfox:ConstraintViolation],
[?v, :emailMissingFrom, ?person],
[?v, :constraintDescription, "Email address required."] :-
   [?person, a, foaf:Person],
   NOT EXIST ?mbox IN [?person, foaf:mbox, ?mbox],
   rdfox:SKOLEM("NoEmailAddress", ?person, ?v) .

Now when we try to import the following Turtle:

@prefix : <http://example.com/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:alice a foaf:Person ;
       foaf:name "Alice" .

:bob a foaf:Person ;
    foaf:name "Bob" ;
    foaf:mbox "mailto:bob@example.com" .

we receive the error message:

The transaction could not be committed because it would have introduced the following constraint violation:

_:__05Tm9FbWFpbEFkZHJlc3MA_02aHR0cDovL2V4YW1wbGUuY29tL2FsaWNlAA--

<http://example.com/constraintDescription> "Email address required.";
<http://example.com/emailMissingFrom> <http://example.com/alice> .

Here we see that our Datalog constraint really is pinpointing the problematic part of the data in the transaction: there is no violation relating to Bob’s node as that has a foaf:mbox property.
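The negation in this rule behaves like a set difference under the closed-world assumption: anything not present in the store is treated as absent. A small Python sketch of that idea, using a hypothetical missing_email function over tuples rather than anything from RDFox itself:

```python
def missing_email(facts):
    """Return every foaf:Person that has no foaf:mbox property at all."""
    people = {s for (s, p, o) in facts if p == "rdf:type" and o == "foaf:Person"}
    with_mbox = {s for (s, p, o) in facts if p == "foaf:mbox"}
    # Closed world: no recorded mbox means no mbox.
    return people - with_mbox


facts = {
    (":alice", "rdf:type", "foaf:Person"),
    (":alice", "foaf:name", "Alice"),
    (":bob", "rdf:type", "foaf:Person"),
    (":bob", "foaf:name", "Bob"),
    (":bob", "foaf:mbox", "mailto:bob@example.com"),
}
print(missing_email(facts))  # {':alice'}
```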

Example 3 — Higher-level Invariants

The previous examples showed how to add constraints that apply locally, either to an individual relation or class, but it’s also possible to write constraints over entire collections using aggregation. To illustrate this, we’ll show how we ensure that a small sushi bar with 5 seats can’t be overbooked.

In the data store for our booking application, we have a different identifier for each hour-long slot. Customers make their reservations by creating a link from themselves to their desired slot using the :hasBooked relation. Our occupancy constraint is enforced by the following rule, which uses an AGGREGATE literal to count the number of bookings in each slot:

PREFIX : <http://booking.example/>
PREFIX rdfox: <http://oxfordsemantic.tech/RDFox#>

[?v, a, rdfox:ConstraintViolation] :-
   AGGREGATE([?customer, :hasBooked, ?slot]
       ON ?slot
       BIND COUNT(?customer) AS ?totalForSlot ) ,
   FILTER( ?totalForSlot > 5),
   BIND(
       CONCAT(
           "Maximum occupancy is 5 but ",
           STR(?totalForSlot),
           " people have booked slot ",
           STR(?slot),
           ".")
       AS ?v ) .

Unlike the rules in the earlier examples, this rule does not use rdfox:SKOLEM to create the violation individual. Instead, it uses CONCAT to compute a natural-language description of the violation and uses that string as the violation instance itself. Since the property common to all constraint violations ([?v, a, rdfox:ConstraintViolation]) is filtered out of the properties printed in error messages, the messages returned as a result of this constraint contain just the human-readable text.

With the above rule loaded, we’re ready to start taking bookings. First of all, a party of five book an eight o’clock slot at the restaurant in a single transaction which is accepted:

@prefix : <http://booking.example/> .
@prefix slots: <http://booking.example/slots/> .

:Alice :hasBooked slots:MatsumotoSushi_20200710_8pm .

:Bob :hasBooked slots:MatsumotoSushi_20200710_8pm .

:Charlie :hasBooked slots:MatsumotoSushi_20200710_8pm .

:Dave :hasBooked slots:MatsumotoSushi_20200710_8pm .

:Eve :hasBooked slots:MatsumotoSushi_20200710_8pm .

However, when a sixth person tries to join the party:

@prefix : <http://booking.example/> .
@prefix slots: <http://booking.example/slots/> .

:Frank :hasBooked slots:MatsumotoSushi_20200710_8pm .

the system gives a clear, human-readable message describing the problem:

The transaction could not be committed because it would have introduced the following constraint violation:

"Maximum occupancy is 5 but 6 people have booked slot http://booking.example/slots/MatsumotoSushi_20200710_8pm." .

This demonstrates the versatility of Datalog Constraints when it comes to providing helpful feedback.
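The grouping and counting performed by the AGGREGATE literal can be modelled in Python as follows. This is a sketch under our own assumptions (a hypothetical overbooked function, facts as tuples, full slot IRIs written out as strings); it mirrors the two transactions above but is not RDFox code.

```python
from collections import defaultdict

CAPACITY = 5
SLOT = "http://booking.example/slots/MatsumotoSushi_20200710_8pm"


def overbooked(facts):
    """Group bookings by slot and describe every slot over capacity."""
    customers = defaultdict(set)
    for (customer, prop, slot) in facts:
        if prop == ":hasBooked":
            customers[slot].add(customer)
    return [
        f"Maximum occupancy is {CAPACITY} but {len(c)} people have booked slot {slot}."
        for slot, c in sorted(customers.items())
        if len(c) > CAPACITY
    ]


party = {(name, ":hasBooked", SLOT) for name in
         (":Alice", ":Bob", ":Charlie", ":Dave", ":Eve")}
print(overbooked(party))  # [] so the first transaction is accepted

# The sixth booking is checked against the store plus the new transaction.
print(overbooked(party | {(":Frank", ":hasBooked", SLOT)}))
```

Note that the check always runs over the existing bookings together with the incoming ones, which is why Frank's single triple is enough to trigger the violation.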

Conclusion

Datalog Constraints bring the full power of RDFox’s best-in-class reasoning capabilities to bear on the problem of constraining data store content. We look forward to seeing what the growing community of developers working with RDFox will build with them!

To request an evaluation license click here. For further information on Datalog Constraints, see the documentation or head to our website.

...

The Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.