Machine learning algorithms are now synonymous with finding patterns in data but not all patterns are suitable for statistics based data-driven techniques, for example when these patterns don’t have explicitly labelled targets to learn from.
In some cases, these patterns can be expressed precisely as a rule. Reasoning is the process of matching rule-based patterns or verifying that they don’t exist in a graph. Because these patterns are found with deductive logic they can be found more efficiently and interpreted more easily than Machine Learning patterns which are induced from the data.
This article will introduce some common patterns and how you can express them in the rule language, Datalog, using RDFox, a knowledge graph and semantic reasoner developed by Oxford Semantic Technologies. RDFox is a standards based, RDF-triplestore which we will query with SPARQL. If you are not yet familiar with knowledge graphs and reasoning, read our article The Intuitions Behind Knowledge Graphs and Reasoning to be brought up to speed. If you'd like to try RDFox out for yourself, request a free trial and see the RDFox getting started guide to get up and running.
The “located in” relation is intuitively transitive but might not be completely expressed in the graph. For example, a graph might contain the following triples:
A SPARQL query will not return that Oxford is located in England because it is missing the triple:
This triple could be added on a one-off basis, but this would soon become impractical on larger graphs where other towns located in Oxfordshire might also be missing the located in England edge.
In this case, a transitive relation rule can effortlessly draw the relevant :located_in edges automatically:
The first part is the head of the rule which will materialise ?x :located_in ?z triples if the pattern following the symbol :- is found in the graph.
In our example, the rule will bind the variable ?x to :oxford, variable ?y to :oxfordshire and variable ?z to :england, and then as a logical consequence of the rule being satisfied, create the :oxford :located_in :england triple by replacing ?x with :oxford and ?z with :england.
RDFox will materialise the rule incrementally on the fly whenever new data points are added or removed from the graph which makes it an efficient solution for dynamic data sources.
Transitive closure patterns help materialise transitive relations which don’t yet exist in the graph but could potentially. For example, some Twitter followers can be represented in the following graph:
When recommending follow suggestions to the users we might wish to compute the missing connections as :follows_closure edges using the following rules:
The first rule specifies that the new relation :follows_closure is an extension of the relation :follows. The second rule implements the closure by saying that if a person ?x directly follows ?y and ?y (directly or indirectly) follows person ?z, then ?x (indirectly) follows ?z.
The new :follows_closure relations which were not originally a :follows relation are therefore:
These simple rules can be enhanced to include user interests, geography, language, common followers etc.
An important practical use of knowledge graphs is to power Open Query Answering (Open QA) applications or chatbots, where the user asks a question in natural language, which is then automatically answered against the graph. Open QA systems often struggle to interpret questions that involve several “hops” in the graph. For instance, consider the graph consisting of the triples given next.
A user may ask in which country Douglas Adams was born. To obtain this information, the system would need to construct a query involving two hops in the graph. In particular, the SPARQL query
would return :uk as an answer.
The efficiency of the open QA system is greatly enhanced if the desired information had been available in just a single hop. RDFox rules can be used to provide a clean solution in this situation. In particular, we can use rules to define a new :country_of_birth relation that provides a “shortcut” for directly accessing the desired information.
The rule says that, if a person ?x is born in a place ?y, and that place ?y is a :country, then ?y is the country of birth of ?x.
As a result, RDFox would derive that Douglas Adams’ country of birth is the UK. The Open QA system would now only need to construct the following simpler query, which involves a single hop in the graph, to obtain the desired information.
As we have already seen rules can build on the results of other rules so this pattern combined with the the first, transitive relation, can cater for data which includes more the full detail of where Douglas Adams was born. Such as:
When the rules are applied to this data they produce the graph visualised below.
A common task in knowledge graphs is to identify cyclic relationships. For instance, partonomy relations are typically acyclic (e.g., if an engine is part of a car, we wouldn’t expect the car to be part of the engine as well!). In these cases, cycle detection may be needed to detect errors in the graph and thus provide data validation. Cycle detection is also useful for detecting fraud or insider trading by identifying for example communication relations in the network which shouldn’t exist.
Consider the following graph with “part of” relations:
The graph contains a cyclic path :piston -> :engine -> :car -> :piston. via the :part_of relation.
The relationship is naturally transitive and can be defined by the following rule:
The following SPARQL query will return the elements which are part of others (directly or indirectly)
Which gives us the following results:
A cycle manifests itself by the presence of self-loops (e.g. :piston is derived to be a part of itself).
Hence, it is possible to detect cyclic part of relations with the following SPARQL query.
Alternatively, we could define cyclic relations with the following rule:
Which tells us that if any object is determined to be a part of itself, then the partonomy relation is cyclic.
We can now easily retrieve the list of cyclic relations in the graph.
To obtain :part_of as a result.
Many relations naturally imply some sort of order, and in such cases, we might be interested in finding the first and last elements of such orders.
For instance, consider the managerial structure of a company.
We would like to recognise which individuals in the company are “top-level managers”. We can use a rule to define a top-level manager as a person who manages someone and is not managed by anyone else.
looks for the list of top level managers and gives :alice as the answer.
We can now use a rule to define “junior employees” as those who have a manager but who themselves do not manage anyone else.
The query for junior employees is then
This returns :monica and :david as answers.
This was a short introduction to a sample of rule-based patterns. More articles like this can be found on the RDFox blog, such as Datalog Basics and RDFox. Imagine what is possible with the combination of these with other rules and when run at scale over your data.
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).