Researching how human knowledge can be taught to machines

Professor Cuenca Grau — Reasoning Over Knowledge Graphs

My research over the last fifteen years has focused on Knowledge Representation and Reasoning (KRR) — an area of Artificial Intelligence and Computer Science concerned with the representation of human knowledge in a symbolic, machine-interpretable way, and the effective manipulation by computer programs of this knowledge in combination with data.

For example, KRR studies how to represent, in a format that a computer can understand, statements such as ‘every playwright is an author’ and ‘if a person is born in a town located in a given country, then this is the person’s country of birth’. Once such information has been unambiguously represented in a suitable language (usually a kind of formal logic), KRR systems can then be used to process data in a more intelligent way.

For example, if our data tells us that Douglas Adams is a playwright born in Cambridge and Cambridge is located in the UK, then a computer program would be able to automatically deduce that Douglas Adams is a UK-born author. The role of ‘reasoning’ is to algorithmically find out this implicit information from the data explicitly given and the represented domain knowledge.
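This kind of deduction can be sketched in a few lines of code. The sketch below is purely illustrative (the fact triples, rule bodies, and the `apply_rules` helper are my own names, not those of any particular reasoner): it applies the two example rules to a handful of explicit facts until no new facts emerge.

```python
# Explicit facts from the data, as (subject, relation, object) triples.
facts = {
    ("DouglasAdams", "type", "Playwright"),
    ("DouglasAdams", "bornIn", "Cambridge"),
    ("Cambridge", "locatedIn", "UK"),
}

def apply_rules(facts):
    """Apply the two example rules repeatedly until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        # Rule 1: every playwright is an author.
        for (s, p, o) in derived:
            if p == "type" and o == "Playwright":
                new.add((s, "type", "Author"))
        # Rule 2: born in a town located in a country => country of birth.
        for (s1, p1, o1) in derived:
            if p1 == "bornIn":
                for (s2, p2, o2) in derived:
                    if s2 == o1 and p2 == "locatedIn":
                        new.add((s1, "countryOfBirth", o2))
        if not new <= derived:
            derived |= new
            changed = True
    return derived

inferred = apply_rules(facts)
# The implicit information is now explicit: the set `inferred` contains
# ("DouglasAdams", "type", "Author") and ("DouglasAdams", "countryOfBirth", "UK").
```

Production reasoners are vastly more sophisticated, but the essence is the same: iterate the rules over the data until a fixed point is reached.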

In recent years, there has been an enormous interest in the development and deployment of so-called knowledge graphs — a way to store factual information (data) and knowledge as an interconnected network (known as a graph, in Computer Science jargon).

In a knowledge graph, data items are represented as nodes in the graph, whereas the relationships between data items constitute the edges of the graph. For instance, in our previous example, a knowledge graph could have a node for Douglas Adams, a node for Cambridge, and an edge labelled with the relationship ‘city of birth’ linking the former to the latter.
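A graph of this shape has a very simple representation in code: a collection of labelled edges. The node and edge labels below are illustrative, not drawn from any real knowledge graph's vocabulary.

```python
# A tiny knowledge graph: each edge is (source node, label, target node).
edges = [
    ("DouglasAdams", "cityOfBirth", "Cambridge"),
    ("Cambridge", "locatedIn", "UK"),
]

def neighbours(node, label):
    """All nodes reached from `node` by an edge with the given label."""
    return [o for (s, p, o) in edges if s == node and p == label]

print(neighbours("DouglasAdams", "cityOfBirth"))  # ['Cambridge']
```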

Graphs provide a very flexible format for representing data, well suited to a wide range of applications where more traditional approaches to data management (such as relational database technologies) are not easily applicable. A knowledge graph widely used in applications is Wikidata, which encodes the information available in Wikipedia in a graph containing over 80 million nodes and about one billion edges between them.

Major technology players are rapidly adopting knowledge graphs and using them in new and unexpected ways. Google has developed a knowledge graph with over 70 billion edges, which they use for question answering on the Web: try to type a question in Google such as ‘How tall is the Eiffel Tower?’ and you will get a direct answer, namely ‘300m, 324m to the tip’, which has been obtained by matching your question to Google’s knowledge graph.

Companies such as Google are aiming high: ultimately, all human knowledge, everything you may want to know about the world (can you imagine?), will be available in the knowledge graph at our fingertips, ready for innovative applications to exploit. And not only that: companies such as eBay are storing information about millions of products in knowledge graphs; graphs about anything you can imagine are being generated semi-automatically from websites, databases, and even text documents; and a company called DiffBot has a knowledge graph with over one trillion (yes, with a ‘t’) edges, with 150 million new edges added every day!

As one can easily imagine, managing such gigantic graphs and querying them efficiently is no easy task. And this is where Knowledge Representation and Reasoning technologies can be very useful.

Source: Keble College Review

For instance, imagine that we have about 5,000 playwrights such as Douglas Adams in our knowledge graph. If we want all of them to be authors (and we certainly do!), we would need to add explicit edges in the graph connecting the node for each individual playwright to the node representing the concept of an ‘author’ in the graph; that is 5,000 edges to be manually added.

Not only that, if suddenly we notice a mistake in our data (maybe ‘John Smith’ is not a playwright after all) then we would need to also remove all the edges that depend on that mistake (that is, the fact that ‘John Smith’ is an author, which was only true because he was believed to be a playwright).

This is almost impossible to manage via user updates, or even programmatically. A much more convenient way would be to represent a rule stating that ‘every playwright is an author’; then, a specialised piece of software (a reasoner) would be able to interpret this rule and automatically add and remove the relevant edges from the graph where appropriate.
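The idea can be illustrated with a deliberately naive sketch (the names and the `materialise` helper are my own): the rule is applied to the explicit facts to produce the derived edges, and when a mistaken fact is retracted, re-applying the rule removes every edge that depended on it. Real reasoning engines update their inferences incrementally rather than recomputing from scratch, but the observable behaviour is the same.

```python
# Explicit facts asserted by users.
explicit = {
    ("DouglasAdams", "type", "Playwright"),
    ("JohnSmith", "type", "Playwright"),
}

def materialise(explicit_facts):
    """Apply the rule 'every playwright is an author' to the explicit facts."""
    derived = set(explicit_facts)
    for (s, p, o) in explicit_facts:
        if p == "type" and o == "Playwright":
            derived.add((s, "type", "Author"))
    return derived

graph = materialise(explicit)
assert ("JohnSmith", "type", "Author") in graph  # derived automatically

# A mistake is found: John Smith is not a playwright after all.
explicit.discard(("JohnSmith", "type", "Playwright"))

# Re-materialising removes every edge that depended on the mistake,
# while the edges derived for genuine playwrights are kept.
graph = materialise(explicit)
assert ("JohnSmith", "type", "Author") not in graph
assert ("DouglasAdams", "type", "Author") in graph
```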

Reasoning automatically with thousands of rules over graphs containing billions of edges is a very challenging problem from both research and technological perspectives. In fact, it was well beyond the state of the art just 10–15 years ago, when research systems were struggling to cope with graphs containing tens of thousands of nodes.

The situation, however, has changed dramatically in recent years. We now have systems that can return results to complex queries over graphs containing billions of edges in milliseconds. We also have systems that are able to manage and reason with complex sets of rules written in powerful rule languages, and to maintain their inferences on the fly as data is updated in the graph.

One of those systems is RDFox — a high-performance knowledge graph and reasoning engine that was developed at the University of Oxford’s Department of Computer Science and which is now a commercial product developed and distributed by Oxford Semantic Technologies.

As a co-founder of Oxford Semantic Technologies, I am very proud of what has been achieved recently — to witness how a carefully thought-through system can reason and answer queries almost instantaneously when applied to sophisticated rule sets and large-scale graphs with tens of billions of connections. As a scientist, it is an incredibly gratifying feeling to see how fundamental, cutting-edge research, conducted in our Knowledge Representation and Reasoning Group at Oxford, is now being used by applications we could only dream of just a few years ago.

...

About the author

Professor Bernardo Cuenca Grau is based at the University of Oxford, within the Computer Science Department. For a full bio please read the Meet the Founders article.

About the article

This article was originally published in the Keble College Review. Permission to re-publish the article on the Oxford Semantic Technologies Medium publication was given by the College and the author.


...

The Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.