Relational databases, despite their name, are not very good at expressing relationships in data because they impose a storage structure that does not favour connections.
Graph databases, despite their name, look nothing like a graph and offer greater flexibility for expressing relationships between data points. These relationships can be of any type and are represented as edges between the data points.
Storing relationships in a graph is interesting but, on its own, not very useful for extracting knowledge and insights. A knowledge graph is therefore composed of a graph database to store the data and a reasoning layer to search and materialise patterns in the data.
This article introduces the basic concepts and intuitions behind knowledge graphs and reasoning on Resource Description Framework (RDF) graphs, with examples demonstrated on RDFox, a high-performance knowledge graph and semantic reasoner.
The RDF data model requires data points to be expressed in triples in the form: subject-predicate-object. The predominant query language for RDF graphs is SPARQL.
Reasoning in RDF is the ability to calculate the set of triples that logically follow from an RDF graph and a set of rules. Such logical consequences are materialised in RDFox as new triples in the graph.
The use of rules can significantly simplify the management of RDF data as well as provide a more complete set of answers to user queries. Consider, for instance, a graph containing the following triples:
    :oxford :located_in :oxfordshire .
    :oxfordshire :located_in :england .
The relation :located_in is intuitively transitive—from the fact that Oxford is located in Oxfordshire and Oxfordshire is located in England, we can deduce that Oxford is located in England. However, the triple :oxford :located_in :england is missing from the graph, and a SPARQL query asking for all English cities will not return Oxford as an answer.
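The gap can be reproduced with a toy in-memory graph. This is only a sketch: the tuples stand in for RDF triples, and `located_directly_in` is a hypothetical helper that mimics a simple lookup of explicit :located_in edges. It is not a SPARQL engine or an RDFox API.

```python
# Toy graph: each triple is a (subject, predicate, object) tuple.
graph = {
    (":oxford", ":located_in", ":oxfordshire"),
    (":oxfordshire", ":located_in", ":england"),
}


def located_directly_in(graph, place):
    """Return every subject with an explicit :located_in edge to `place`."""
    return {s for (s, p, o) in graph if p == ":located_in" and o == place}


# Only the explicitly stored edge is found; :oxford is missing from the answer.
print(located_directly_in(graph, ":england"))
```

Without reasoning, the lookup sees only what is stored, so Oxford is left out even though its location in England follows from the data.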
We could, of course, add the missing triple by hand to the graph, thus making sure that Oxford is included in the list of English cities. However, doing so has a number of important disadvantages.
In particular, if we add to the graph the triple :england :located_in :uk, we should derive the following triples as logical consequences of the transitivity of the :located_in relation:
    :oxford :located_in :uk .
    :oxfordshire :located_in :uk .
We can use a rule to faithfully represent the transitive nature of the relation.
Such a rule would state that, if
(any) object ?x in the graph is connected by :located_in to an object ?y, and
?y is in turn connected by :located_in to an object ?z, then
?x must also be connected by :located_in to ?z.
Here, ?x, ?y, and ?z are variables which can bind to any object in the graph.
In particular, such a rule can be written in RDFox’s rule language as follows:
    [?x, :located_in, ?z] :- [?x, :located_in, ?y], [?y, :located_in, ?z] .
The rule establishes a causal relation between different data triples:
the triple :oxford :located_in :england holds because the triples :oxford :located_in :oxfordshire and :oxfordshire :located_in :england also hold.
Assume that we later find out that :oxford is not located in :oxfordshire, but rather in the state of Mississippi in the US, and we delete from the graph the following triple as a result:
    :oxford :located_in :oxfordshire .
Then, the triples :oxford :located_in :england and :oxford :located_in :uk must also be retracted, as they are no longer justified.
Such situations are very hard to handle by simply adding and/or deleting triples; in contrast, they can be automatically handled in an efficient and elegant way by using rules in RDFox.
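The effect of such a deletion can be sketched by keeping the explicitly asserted triples separate from the derived ones and recomputing the consequences after the change. The `materialise` function below is a hypothetical, deliberately naive helper written for this illustration. It makes no attempt at efficiency and is not a description of how RDFox handles updates.

```python
def materialise(explicit, predicate=":located_in"):
    """Return the explicit triples plus everything the transitivity rule derives."""
    graph = set(explicit)
    while True:
        derived = {
            (x, predicate, z)
            for (x, p1, y) in graph if p1 == predicate
            for (y2, p2, z) in graph if p2 == predicate and y2 == y
        }
        if derived <= graph:  # nothing new: a fixpoint has been reached
            return graph
        graph |= derived


explicit = {
    (":oxford", ":located_in", ":oxfordshire"),
    (":oxfordshire", ":located_in", ":england"),
    (":england", ":located_in", ":uk"),
}
before = materialise(explicit)

# Retract the explicit fact and recompute the consequences from what is left.
explicit.discard((":oxford", ":located_in", ":oxfordshire"))
after = materialise(explicit)

for triple in sorted(before - after):
    print("no longer holds:", triple)
```

The derived triples about :oxford disappear with the fact that justified them, while :oxfordshire :located_in :uk survives because its justification is untouched.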
A rule language determines which syntactic expressions are valid rules, and also provides well-defined meaning to each rule. In particular, given an arbitrary set of syntactically valid rules and an arbitrary RDF graph, the set of new triples that follow from the application of the rules to the graph must be unambiguously defined.
Rule languages have been in use since the 1980s in the fields of data management and artificial intelligence. The basic rule language is called Datalog. It is a very well understood language, which constitutes the core of a plethora of subsequent rule formalisms equipped with a wide range of extensions.
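A Datalog rule can be seen as an IF ... THEN statement. In particular, our example rule from earlier is written in Datalog:
    [?x, :located_in, ?z] :- [?x, :located_in, ?y], [?y, :located_in, ?z] .
The IF part of the rule, written to the right of the :- symbol, is called the body; the THEN part, written to the left, is called the head.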
Both the body and the head consist of a conjunction of conditions, where conjuncts are comma-separated and where each conjunct is a triple in which variables may occur.
In our example, the body is [?x, :located_in, ?y], [?y, :located_in, ?z] and the head is [?x, :located_in, ?z] .
Each Datalog rule conveys the idea that from certain combinations of triples in the input RDF graph, we can logically deduce that some other triples must also be part of the graph.
In particular, variables in the rule range over all possible nodes in the RDF graph. Whenever an assignment of values to these variables makes every triple in the rule body a triple of the graph, we propagate those values to the head of the rule and deduce that the resulting triples must also be part of the graph.
In our example, a particular rule application binds variable ?x to :oxford, variable ?y to :oxfordshire and variable ?z to :england, which then implies that the triple :oxford :located_in :england, obtained by replacing ?x with :oxford and ?z with :england in the head of the rule, holds as a logical consequence.
A different rule application would bind ?x to :oxfordshire, ?y to :england, and ?z to :uk; as a result, the triple :oxfordshire :located_in :uk can also be derived as a logical consequence.
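These two rule applications can be enumerated mechanically. The sketch below hard-codes the example rule and uses plain tuples for triples. The function name `rule_applications` is an assumption made for this illustration and has nothing to do with RDFox's internals.

```python
graph = {
    (":oxford", ":located_in", ":oxfordshire"),
    (":oxfordshire", ":located_in", ":england"),
    (":england", ":located_in", ":uk"),
}


def rule_applications(graph, predicate=":located_in"):
    """Yield (binding, head_triple) for each way the rule body matches the graph."""
    for (x, p1, y) in graph:
        for (y2, p2, z) in graph:
            # Both body triples must use the predicate and share the value of ?y.
            if p1 == p2 == predicate and y2 == y:
                yield {"?x": x, "?y": y, "?z": z}, (x, predicate, z)


for binding, head in sorted(rule_applications(graph), key=lambda pair: pair[1]):
    print(binding, "=>", head)
```

Each binding is one assignment of ?x, ?y and ?z that turns both body triples into triples of the graph; the head triple is what that assignment derives.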
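An alternative way to understand the meaning of a single Datalog rule application to an RDF graph is to look at it as the execution of an INSERT statement in SPARQL, which adds a set of triples to the graph. In particular, the statement corresponding to our example rule is:
    INSERT { ?x :located_in ?z }
    WHERE { ?x :located_in ?y . ?y :located_in ?z }
and it leads to the insertion of the triples:
    :oxford :located_in :england .
    :oxfordshire :located_in :uk .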
There is, however, a fundamental difference that makes rules more powerful than simple INSERT statements in SPARQL, namely that rules are applied recursively.
Indeed, after we have derived that Oxford is located in England, we can apply the rule again by matching ?x to :oxford , ?y to :england , and ?z to :uk , to derive :oxford :located_in :uk—a triple that is not obtained as a result of the INSERT statement above.
In this way, the logical consequences of a set of Datalog rules on a graph are captured by the iterative application of the rules until no new information can be added to the graph.
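The iterate-until-nothing-changes idea can be sketched in a few lines. This is the textbook naive evaluation strategy, shown for the single transitivity rule only. The function name `materialise_transitive` and the tuple encoding are assumptions of this sketch, and RDFox's actual algorithms are not reproduced here.

```python
def materialise_transitive(explicit, predicate=":located_in"):
    """Apply the transitivity rule repeatedly until no new triple is derived."""
    graph = set(explicit)
    rounds = 0
    while True:
        new = {
            (x, predicate, z)
            for (x, p1, y) in graph if p1 == predicate
            for (y2, p2, z) in graph if p2 == predicate and y2 == y
        } - graph
        if not new:
            return graph, rounds
        graph |= new
        rounds += 1


explicit = {
    (":oxford", ":located_in", ":oxfordshire"),
    (":oxfordshire", ":located_in", ":england"),
    (":england", ":located_in", ":uk"),
}
closure, rounds = materialise_transitive(explicit)
print(rounds, "rounds;", len(closure), "triples")
```

On this data the first round adds the two triples a single INSERT would add, and the second round adds :oxford :located_in :uk, which only becomes derivable once the first round's results are in the graph.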
It is important to notice that the set of logical consequences obtained is completely independent of the order in which rule applications are performed as well as of the order in which different elements of rule bodies are given. In particular, the following two rules are completely equivalent:
    [?x, :located_in, ?z] :- [?x, :located_in, ?y], [?y, :located_in, ?z] .
    [?x, :located_in, ?z] :- [?y, :located_in, ?z], [?x, :located_in, ?y] .
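Order independence of body elements can be checked on the running example with a small generic evaluator. The sketch below is a toy under stated assumptions: `match` and `materialise` are hypothetical helpers, rules are encoded as tuples of strings with variables starting with "?", and it needs Python 3.8 or later for the assignment expression.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return binding


def materialise(explicit, head, body):
    """Naively apply one rule (head :- body) until a fixpoint is reached."""
    graph = set(explicit)
    while True:
        bindings = [{}]
        for atom in body:  # join the body atoms in the order given
            bindings = [b2 for b in bindings for t in graph
                        if (b2 := match(atom, t, b)) is not None]
        new = {tuple(b.get(term, term) for term in head) for b in bindings} - graph
        if not new:
            return graph
        graph |= new


explicit = {
    (":oxford", ":located_in", ":oxfordshire"),
    (":oxfordshire", ":located_in", ":england"),
    (":england", ":located_in", ":uk"),
}
head = ("?x", ":located_in", "?z")
body_a = [("?x", ":located_in", "?y"), ("?y", ":located_in", "?z")]
body_b = list(reversed(body_a))  # same atoms, opposite order

print(materialise(explicit, head, body_a) == materialise(explicit, head, body_b))
```

Swapping the body atoms changes the order in which the evaluator joins them, but not the set of triples it ends up with.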
Knowledge graphs offer a variety of applications that are not always appreciated or known.
Knowledge graphs built using RDFox are particularly efficient at finding complex rule-based patterns in data on the fly or verifying that they don’t occur.
Rule-based patterns offer an intuitive way to encode domain expertise. Patterns can for example model how components should be assembled into a functioning product or the requirements a user needs to satisfy in order to move through a process.
Most responsive applications need to evaluate these rules on the fly, which is typically impractical with legacy reasoning engines at scale and within the desired response times. RDFox, for example, helped Festo reduce the time to configure complex products from hours to seconds.
Knowledge graphs can also be used to turn chatbots into truly intelligent reasoning agents by providing a more flexible and coherent approach to storing knowledge. Rules can also help improve the interpretation of poorly formulated questions.
Another key use case for knowledge graphs is detecting cyclic relations in networks that represent undesirable behaviour such as fraud or insider trading. RDFox can seamlessly establish connections in networks by efficiently navigating transitive relationships. RDFox can automatically flag or prevent connections that shouldn’t exist regardless of the complexity of the network.
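The link between transitive relationships and cycle detection can be sketched as follows: once a relation has been closed under transitivity, any node that is related to itself lies on a cycle. The account names and the edge relation below are made up for illustration, and the code is a naive closure computation, not an RDFox feature.

```python
def transitive_closure(edges):
    """Return all (a, b) pairs such that b is reachable from a via `edges`."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def nodes_on_cycles(edges):
    """A node lies on a cycle exactly when it can reach itself."""
    return {a for (a, b) in transitive_closure(edges) if a == b}


# Hypothetical network: a -> b -> c -> a forms a cycle, d hangs off the side.
edges = {
    ("acct_a", "acct_b"),
    ("acct_b", "acct_c"),
    ("acct_c", "acct_a"),
    ("acct_c", "acct_d"),
}
print(sorted(nodes_on_cycles(edges)))
```

In rule form this corresponds to a transitivity rule plus a second rule that flags any node connected to itself.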
RDFox is a high-performance knowledge graph and semantic reasoner which can evaluate complex queries and rules at scale and on the fly.
RDFox overcomes the flexibility and performance limitations of classical databases and reasoning engines by being an in-memory RDF-triplestore optimised for speed and parallel reasoning.
The novel design and concepts underpinning RDFox were developed and refined at the University of Oxford over the past decade and have been mathematically validated in peer-reviewed research.
RDFox guarantees the correctness of its rule materialisations and query results, which can be delivered at scale and on the fly on both production-grade servers and memory-constrained devices.