The Do’s and Don’ts of Rule and Query Writing

RDFox, Datalog and SPARQL

Photo by Joanna Kosinska on Unsplash

Rules are the conduit for reasoning. They offer an expressive way to process and manipulate knowledge graphs and help you to write queries by bringing the intelligence layer closer to the data. It can be difficult to know where to start when creating your own, but it really doesn’t have to be that way.

The power of rules is derived from their simplicity and structured logic, properties that lend themselves to easy use once you get over the initial teething pains. That’s what we hope to help you do here. We aim to share some best practices for rule writing that we’re sure will give you the boost in confidence you need to jumpstart your reasoning.

If that seems a little daunting even still, you can read our introduction to rules and reasoning, where we discuss how they are used to seek out patterns in data. Alternatively, if you’re looking for something more concrete, our documentation has a section about rules and reasoning that you may find helpful.

If you need somewhere to practice or follow along, you can always request a free trial of RDFox and get your hands dirty in a real environment.

...

What is a rule?

A rule is an ‘if-then’ statement. For example:

?x has uncle ?z if ?x has parent ?y and ?y has brother ?z.

We express rules using Datalog, a declarative, logic-based programming language that is syntactically a subset of Prolog. In RDFox, a Datalog rule takes the following form:

[?x, :hasUncle, ?z] :-
   [?x, :hasParent, ?y],
   [?y, :hasBrother, ?z].

The formula to the left of the :- operator is the rule head (the ‘then’ part) and the formula to the right is the rule body (the ‘if’ part).

Intuitively, the rule says “if [?x, :hasParent, ?y] and [?y, :hasBrother, ?z] both hold, then [?x, :hasUncle, ?z] holds as well”.
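To make the ‘if-then’ reading concrete, here is a minimal sketch in plain Python of what applying the uncle rule to a handful of facts amounts to. It is only a model of the rule's meaning, not how RDFox evaluates rules, and the individuals (alice, bob and so on) are made up for illustration.

```python
# Facts as (subject, predicate, object) triples. The names are hypothetical.
facts = {
    ("alice", "hasParent", "bob"),
    ("bob", "hasBrother", "carl"),
    ("dave", "hasParent", "erin"),  # erin has no brother, so nothing is derived for dave
}


def apply_uncle_rule(triples):
    """Return the hasUncle triples that the rule derives from the given facts."""
    derived = set()
    for (x, p1, y) in triples:
        if p1 != "hasParent":
            continue
        for (y2, p2, z) in triples:
            # The shared variable ?y is what joins the two body atoms.
            if p2 == "hasBrother" and y2 == y:
                derived.add((x, "hasUncle", z))
    return derived


derived = apply_uncle_rule(facts)
print(derived)
```

Only alice gains an uncle here, because hers is the only parent with a recorded brother.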

What is a query?

A query is a request for data or information from a database. Queries retrieve and model data stored within RDFox. The query language used by RDFox is SPARQL, the standard query language for RDF.

Queries can be typed or pasted directly into the shell. For example:

SELECT ?s ?p ?o WHERE { ?s ?p ?o }

This query would select all subject-predicate-object triples within RDFox.

Below are some tips and best practices for writing rules and queries with RDFox.

Do’s

Add all rules before the facts or add rules and facts in an arbitrary order but grouped in a single transaction. This will usually increase the performance of the first reasoning operation.

Make rule bodies as selective as possible to improve the performance of a rule. This helps reduce the number of matches in the body that then propagate to the head.

Use rules to materialise costly and/or frequently used sub-queries. We recommend experimenting with the trade-off between reasoning and query answering time to make queries simpler to write, maintain and answer using rules.

Store your rules in separate files by purpose and import them incrementally when they are required. Rules in RDFox materialise as soon as they are imported, which is why we recommend importing only the rules you currently need.

Restrict variables within your rule body. If types exist within your data, use them to do so.

Start small and build on what you have. In the case of rules, incremental retraction/addition means that you don’t have to worry about rebooting the whole system every time you change a rule.

Write queries before you write rules. The query will let you know how much data will be affected by the rule.

Test your rules. Test whether the query you wrote before the rule (see above point) and the rule return the same result.

This can be done in a query like:

SELECT ?person WHERE {
   {
       SELECT ?person (COUNT(?child) AS ?children) WHERE {
           ?person :hasChild ?child
       }
       GROUP BY ?person
   }
   ?person :materialisedNumberOfChildren ?number
   FILTER(?number != ?children)
}

This returns the people whose numbers don’t match, so if the rule is correct the query should return 0 results.
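The logic of this check can be sketched in plain Python as follows. This is a model of what the query compares, not RDFox code; the data and the function name `mismatches` are assumptions made for illustration, and one materialised value is deliberately wrong.

```python
from collections import Counter

# (parent, child) pairs standing in for :hasChild triples. Hypothetical data.
has_child = [("ann", "ben"), ("ann", "cat"), ("dan", "eve")]

# Values standing in for :materialisedNumberOfChildren. dan's is deliberately wrong.
materialised = {"ann": 2, "dan": 2}


def mismatches(has_child, materialised):
    """Return the people whose materialised count differs from the counted one."""
    counted = Counter(parent for parent, _child in has_child)
    # Like the query, only people present on both sides are compared.
    return sorted(
        person
        for person, number in materialised.items()
        if person in counted and counted[person] != number
    )


print(mismatches(has_child, materialised))
```

As in the query, a person who has children but no materialised value at all is never compared, so an empty result on its own does not show that the rule fired for everyone.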

Don’ts

Don’t forget to define the type of a variable used in the rules. This won’t be an issue in most cases but can slow down performance if the total number of possible relations to evaluate is large. Example:

[?customer, :referral, "true"] :-
   [?customer, :has, ?referralLink].

The types of ?customer and ?referralLink aren’t defined in the rule, so RDFox has to check every :has relation. Defining the types helps reduce the number of matches in the body that then propagate to the head. A more appropriate solution would be:

[?customer, :referral, "true"] :-
   [?customer, :has, ?referralLink],
   [?customer, a, :CustomerType],
   [?referralLink, a, :ReferralLinkType].
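The effect of the extra type atoms can be illustrated with a small plain-Python sketch. It simply counts how many body matches each version of the rule would produce on a made-up dataset. The entities and the `"type"` predicate are stand-ins chosen for illustration, and this is not how RDFox matches rule bodies internally.

```python
# Hypothetical triples. "type" stands in for the `a` (rdf:type) predicate.
triples = {
    ("c1", "has", "link1"),
    ("c1", "has", "address1"),
    ("c2", "has", "phone2"),
    ("shop", "has", "link9"),
    ("c1", "type", "CustomerType"),
    ("c2", "type", "CustomerType"),
    ("link1", "type", "ReferralLinkType"),
    ("link9", "type", "ReferralLinkType"),
}


def untyped_matches(triples):
    """Body matches of the first rule: every :has relation qualifies."""
    return {(s, o) for (s, p, o) in triples if p == "has"}


def typed_matches(triples):
    """Body matches of the second rule: both ends must have the right type."""
    return {
        (s, o)
        for (s, o) in untyped_matches(triples)
        if (s, "type", "CustomerType") in triples
        and (o, "type", "ReferralLinkType") in triples
    }


print(len(untyped_matches(triples)), len(typed_matches(triples)))
```

On this toy data the untyped body matches four :has relations, including one for a shop and two that do not involve referral links at all, while the typed body matches only the customer who actually holds a referral link.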

Don’t express joins through filters; use regular joins instead. Consider the following rule, which marks as similar all pairs of entities whose labels are identical when letter case is ignored.

[?first, :similarTo, ?second] :-
   [?first, rdfs:label, ?first_label],
   [?second, rdfs:label, ?second_label],
   FILTER(LCASE(?first_label) = LCASE(?second_label)).

While correct, the above rule is very inefficient because it forces RDFox to compare the labels of all pairs of entities, which becomes infeasible for even a moderately large number of entities (e.g. on a dataset with 1M entities this results in 1T comparisons). The issue is that, for every triple that matches the first atom, RDFox must iterate through all triples that match the second atom and apply the filter condition to each. RDFox has no way of reducing the number of compared pairs of entities.

An alternative solution is to precompute the values on which the join is performed in a separate rule and use a regular join in a second rule as illustrated next.

[?entity, :lcase_label, ?lcase_label] :-
   [?entity, rdfs:label, ?label],
   BIND(LCASE(?label) as ?lcase_label).

[?first, :similarTo, ?second] :-
   [?first, :lcase_label, ?label],
   [?second, :lcase_label, ?label].

While this program is more verbose, RDFox will evaluate it in time proportional to the number of similar pairs (as opposed to all pairs), which, depending on the data, could be the difference between terminating and not. The first rule simply computes the lower-case labels of entities, while the second rule uses a direct join on the computed labels to identify the similar entities. Because RDFox uses full indexing, it can efficiently identify, for every triple that matches the first atom, only the compatible triples that match the second atom. Note that this solution uses additional memory to store the relation :lcase_label, which could be a significant but necessary overhead for solving the problem.
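The difference between the two approaches can be sketched in plain Python. The first function compares every pair, as the FILTER-based rule does; the second groups entities by their precomputed lower-case label and only pairs entities within a group, which is roughly what an indexed join achieves. The entity names and labels are invented, and the dictionary lookup is only a stand-in for RDFox's indexes, not a description of them.

```python
from collections import defaultdict
from itertools import product

# Hypothetical entities and their rdfs:label values.
labels = {"e1": "Oxford", "e2": "OXFORD", "e3": "Cambridge", "e4": "oxford"}


def similar_by_filter(labels):
    """Compare every pair of entities, as the FILTER-based rule does."""
    comparisons = 0
    pairs = set()
    for (a, label_a), (b, label_b) in product(labels.items(), repeat=2):
        comparisons += 1
        if label_a.lower() == label_b.lower():
            pairs.add((a, b))
    return pairs, comparisons


def similar_by_join(labels):
    """Precompute lower-case labels, then pair only entities that share one."""
    by_label = defaultdict(list)
    for entity, label in labels.items():
        by_label[label.lower()].append(entity)
    pairs = set()
    for entities in by_label.values():
        # Work here is proportional to the number of similar pairs produced.
        pairs.update(product(entities, repeat=2))
    return pairs


filter_pairs, comparisons = similar_by_filter(labels)
join_pairs = similar_by_join(labels)
print(comparisons, len(join_pairs), filter_pairs == join_pairs)
```

Both versions produce the same pairs (including each entity paired with itself, just as both rules would), but the first always performs one comparison per pair of entities regardless of how few of them are similar.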

Avoid unnecessary FILTER statements (in both rules and queries): you are effectively throwing away answers that you already spent (a lot of) time computing.

If you want to say that two variables of the same type should be equal, just call them the same thing.

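Avoid cross-product blow-ups. This is a special case of making rule bodies selective, but it is worth mentioning specifically: atoms that share no variables are combined in every possible way, so the number of matches is the product of their sizes.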
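As a rough illustration of the scale involved, the plain-Python sketch below counts the matches for a body whose two atoms share no variable, and then for a body where a third atom links them. The people, products and purchases are hypothetical stand-ins for something like :Person, :Product and :bought; nothing here is RDFox-specific.

```python
from itertools import product

# Hypothetical data: 1000 people, 500 products, one purchase per person.
people = [f"person{i}" for i in range(1000)]
products = [f"product{i}" for i in range(500)]
bought = {(f"person{i}", f"product{i % 500}") for i in range(1000)}

# Two atoms with no shared variable: every person pairs with every product.
cross = sum(1 for _ in product(people, products))

# A linking atom between the two variables leaves only the actual purchases.
linked = len(bought)

print(cross, linked)
```

The unlinked body yields 500 times as many matches as the linked one on this data, and the gap grows with the size of either set.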

...

While this is far from an exhaustive list, we hope you come away feeling more comfortable approaching reasoning and rule writing. Like with any tool, mastery comes with practice so we don’t expect you to be an expert off the back of this alone, but even by taking the step to read this article, you’re on the path that will get you there.

We’ll keep this article up to date with any new tips we find particularly helpful so feel free to bookmark it and come back when you’re feeling a little stuck.

Do you have any best practices to add to our list? Or situations to avoid? Feel free to get in touch!

To request an evaluation license click here.

...

The Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.