How Rules Impact Queries

Realising the knowledge in your data

First Published by Towards Data Science. Edited for our own readership.

An application’s logic for processing and manipulating data is typically controlled by an application or logic layer that sits between the database and the presentation layer. This layer formulates the requests, which must then comply with the database’s structure. The following diagram represents the classic three-tier architecture on which many of the world’s IT systems are built.

https://en.wikipedia.org/wiki/Multitier_architecture#Three-tier_architecture

However, knowledge graphs propose a paradigm shift to this design, blurring the barrier between logic and data. By bringing some of the knowledge of the domain into the graph through rules, a knowledge graph captures more than just the data in the system. As a result, rules can make queries and requests much simpler to write and manage, which in turn allows applications to be more flexible, less error-prone and faster.

This article will introduce a simple example to showcase the impact of rules on query design. The example will be illustrated on RDFox, a high-performance knowledge graph and semantic reasoning engine developed by Oxford Semantic Technologies.

Introducing rules

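A rule is a logical if-then statement. The engine scans the graph for data patterns that match the rule’s body (the IF part) and, wherever it finds a match, adds the fact described by the rule’s head (the THEN part).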

Consider the following graph rendered in RDFox’s console:

A fragment of the original data imported for this example.

It is easy to determine from the graph that Douglas Adams and Charlie Chaplin were born in the UK and that they are therefore UK comics. In a classical tiered approach, one might write logic in a middle tier to query the base graph directly in its original form. This query is by definition going to be more complicated than if we could query a knowledge graph directly for all UK comics.

Rules let us bring that logic into the graph instead. The first rule works on the :subClassOf relationship: for our data, it gives Douglas Adams and Charlie Chaplin a direct type relationship to :Comic.

[?x, a, ?z] :- [?x, a, ?y], [?y, :subClassOf, ?z] .
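To make the effect of this rule concrete, here is a minimal Python sketch of a single pass over a handful of triples. It is not how RDFox evaluates rules; it only mirrors the pattern matching. The intermediate class names (:Humorist, :SilentComedian) are hypothetical stand-ins, since the exact classes in the example graph are not listed in the text.

```python
# Triples are (subject, predicate, object) tuples; "a" stands for rdf:type.
# The intermediate class names are hypothetical, chosen for illustration.
triples = {
    (":douglas_adams", "a", ":Humorist"),
    (":charlie_chaplin", "a", ":SilentComedian"),
    (":Humorist", ":subClassOf", ":Comic"),
    (":SilentComedian", ":subClassOf", ":Comic"),
}


def apply_subclass_rule(facts):
    """One pass of [?x, a, ?z] :- [?x, a, ?y], [?y, :subClassOf, ?z] .

    Returns only the triples that are not already in `facts`.
    """
    derived = set()
    for (x, p1, y) in facts:
        if p1 != "a":
            continue  # first body atom must be a type triple
        for (y2, p2, z) in facts:
            # second body atom: the same ?y must be a subclass of ?z
            if p2 == ":subClassOf" and y2 == y:
                derived.add((x, "a", z))
    return derived - facts


new_facts = apply_subclass_rule(triples)
for fact in sorted(new_facts):
    print(fact)
```

Both entities end up with a direct type link to :Comic, which is what the second rule relies on.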

Next we introduce the UK comic concept to the graph. To add this concept, we can use a simple rule in Datalog to materialise the UK Comic relationships:

[?x, a, :UKComic] :- [?x, a, :Comic] , [?x, :born_in, :uk] .

Which translates to:

If ?x is a :Comic and ?x was :born_in :uk, then ?x is a :UKComic.

Materialising rules

RDFox scans the datastore for triples that satisfy the body of the rule, which is the pattern following the “:-” symbol. Whenever the body is satisfied, RDFox adds the corresponding ?x a :UKComic triple to the graph, and it stops once all the UK comics have been found.
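The idea of applying rules until nothing new can be derived can be sketched as a naive forward-chaining loop. This is an illustration only: RDFox's actual materialisation algorithm is far more sophisticated, and the data below (the intermediate classes and the non-UK entity) is hypothetical.

```python
# Hypothetical data: class names and the non-UK entity are assumptions.
base = {
    (":douglas_adams", "a", ":Humorist"),
    (":douglas_adams", ":born_in", ":uk"),
    (":charlie_chaplin", "a", ":SilentComedian"),
    (":charlie_chaplin", ":born_in", ":uk"),
    (":some_us_comic", "a", ":Humorist"),
    (":some_us_comic", ":born_in", ":usa"),
    (":Humorist", ":subClassOf", ":Comic"),
    (":SilentComedian", ":subClassOf", ":Comic"),
}


def materialise(explicit):
    """Apply both rules repeatedly until no new triple is produced."""
    facts = set(explicit)
    while True:
        new = set()
        for (x, p, y) in facts:
            if p != "a":
                continue
            # Rule 1: [?x, a, ?z] :- [?x, a, ?y], [?y, :subClassOf, ?z] .
            for (c, q, z) in facts:
                if q == ":subClassOf" and c == y:
                    new.add((x, "a", z))
            # Rule 2: [?x, a, :UKComic] :- [?x, a, :Comic], [?x, :born_in, :uk] .
            if y == ":Comic" and (x, ":born_in", ":uk") in facts:
                new.add((x, "a", ":UKComic"))
        new -= facts
        if not new:
            return facts  # fixpoint reached: nothing left to derive
        facts |= new


materialised = materialise(base)
uk_comics = sorted(x for (x, p, o) in materialised if p == "a" and o == ":UKComic")
print(uk_comics)
```

Note that the second rule can only fire after the first has produced the :Comic type triples, which is why the loop needs more than one round.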

The result is the following graph:

The same fragment as shown earlier, now with the additions made by the rules, in particular the direct connections to the new UKComic concept.

With RDFox, the materialisation of these triples happens as soon as the rules are imported into the datastore, and then incrementally whenever new data points are added. For example, a new comic born in the UK would automatically be tagged as a :UKComic when added to the graph.

How do rules help queries?

In the first graph, to fetch the UK comics, a query would first have to identify the entities born in the UK, then identify the Comic entities, and finally return the entities present in both answers. The query can be expressed in SPARQL as follows:

The original query run against the source data.

With the UK comic rule, the query to answer the same question is much simpler because it is only searching for the :UKComic entities:

A simplified query that uses the new :UKComic concept.
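The difference in shape between the two queries can be sketched in plain Python over a small set of triples. This is not the SPARQL used in the article; it only mimics the work each query has to do. The data is hypothetical, the derived triples are written out by hand, and the first function handles only one level of :subClassOf.

```python
# Hypothetical base data (class names and the non-UK entity are assumptions).
base = {
    (":douglas_adams", "a", ":Humorist"),
    (":douglas_adams", ":born_in", ":uk"),
    (":charlie_chaplin", "a", ":SilentComedian"),
    (":charlie_chaplin", ":born_in", ":uk"),
    (":some_us_comic", "a", ":Humorist"),
    (":some_us_comic", ":born_in", ":usa"),
    (":Humorist", ":subClassOf", ":Comic"),
    (":SilentComedian", ":subClassOf", ":Comic"),
}

# Triples the two rules would add, written out by hand for this sketch.
derived = {
    (":douglas_adams", "a", ":Comic"),
    (":charlie_chaplin", "a", ":Comic"),
    (":some_us_comic", "a", ":Comic"),
    (":douglas_adams", "a", ":UKComic"),
    (":charlie_chaplin", "a", ":UKComic"),
}


def uk_comics_without_rules(facts):
    """Query the source data: find both sets, then intersect them."""
    born_in_uk = {s for (s, p, o) in facts if p == ":born_in" and o == ":uk"}
    comic_classes = {":Comic"} | {
        s for (s, p, o) in facts if p == ":subClassOf" and o == ":Comic"
    }
    comics = {s for (s, p, o) in facts if p == "a" and o in comic_classes}
    return born_in_uk & comics


def uk_comics_with_rules(facts):
    """Query the materialised graph: a single pattern lookup."""
    return {s for (s, p, o) in facts if p == "a" and o == ":UKComic"}


without_rules = uk_comics_without_rules(base)
with_rules = uk_comics_with_rules(base | derived)
print(sorted(without_rules))
print(sorted(with_rules))
```

Both functions return the same entities, but the second one leaves the domain logic in the graph rather than in the query.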

The simplified query was also faster to execute, a difference that becomes more noticeable on larger datasets and with more complex queries.

Rules help by expanding the original data in a consistent and managed way: as new data is added, the same rules fire. Importantly, the reverse is also true: if data is removed, the consequences of previously fired rules are undone.
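The behaviour on addition and removal can be illustrated with a small sketch that keeps the explicit facts separate from the derived ones. As a simplifying assumption, it recomputes everything from the explicit facts after each change; RDFox maintains the materialisation incrementally instead. The entity :new_comic and the class :Humorist are hypothetical.

```python
def derive(explicit):
    """Naive fixpoint over the two rules; derived facts are never stored."""
    facts = set(explicit)
    while True:
        # [?x, a, ?z] :- [?x, a, ?y], [?y, :subClassOf, ?z] .
        typed = {
            (x, "a", z)
            for (x, p, y) in facts if p == "a"
            for (c, q, z) in facts if q == ":subClassOf" and c == y
        }
        # [?x, a, :UKComic] :- [?x, a, :Comic], [?x, :born_in, :uk] .
        uk = {
            (x, "a", ":UKComic")
            for (x, p, o) in facts
            if p == "a" and o == ":Comic" and (x, ":born_in", ":uk") in facts
        }
        new = (typed | uk) - facts
        if not new:
            return facts
        facts |= new


def uk_comics(explicit):
    return sorted(x for (x, p, o) in derive(explicit) if p == "a" and o == ":UKComic")


explicit = {
    (":Humorist", ":subClassOf", ":Comic"),
    (":douglas_adams", "a", ":Humorist"),
    (":douglas_adams", ":born_in", ":uk"),
}
before = uk_comics(explicit)

# Adding data: the same rules fire for the new (hypothetical) entity.
explicit |= {(":new_comic", "a", ":Humorist"), (":new_comic", ":born_in", ":uk")}
after_add = uk_comics(explicit)

# Removing data: the :UKComic consequence disappears with its support.
explicit.discard((":new_comic", ":born_in", ":uk"))
after_remove = uk_comics(explicit)

print(before, after_add, after_remove)
```

Because derived triples are always recomputed from the explicit ones, a removed fact can never leave a stale conclusion behind.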

Modelling commonly searched concepts with rules can therefore help answer questions faster. It also makes the middle tiers easier to maintain, because the queries are simpler, and simpler queries allow applications to be more flexible.

Most queries can be modelled by the IF part of a rule, which means that many existing applications could become faster and more flexible with a knowledge graph. If your current architecture cannot supply the query performance your users require, why not use rules to push the logic about your data into a knowledge graph? Try writing queries in combination with rules using RDFox: you can sign up for a trial today!

To learn more about knowledge graphs, read our article on the intuitions behind knowledge graphs and reasoning.

...

The Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.