I’m Professor Ian Horrocks, leader of the Data and Knowledge Group in the Department of Computer Science at the University of Oxford. Personally, I work mainly in the area of knowledge representation and reasoning. I won the Lovelace Medal in recognition of my work in that area, particularly for the development of algorithms and reasoning systems for expressive description logics, and for the standardization of the OWL ontology language based on those description logics.
First of all, congratulations! Winning the Lovelace Medal is an incredible achievement and solidifies your place in computing history alongside those who have previously received the award: World Wide Web inventor Sir Tim Berners-Lee, Linux creator Linus Torvalds, and information retrieval pioneer Karen Spärck Jones, to name just a few.
What does it mean to you not only to have won such a prestigious award, but now to be recognised among the greats?
I don’t know how other people feel when they win awards, but I always feel a little bit of imposter syndrome. You know, really? Me? Surely there must have been someone who did more significant and important things than me. But no, it’s a great honour, and I’m very happy to have won it. I’d also like to acknowledge the fact that, and I’m sure it’s the same with all of these awards, they have to pin the medal on some particular person, but the work is really an accumulation of contributions by many people. The description logic community, the semantic web community, and so on: it took all of us working together to achieve the result.
Your work, and the work of those around you, is just incredible. I think the rest of us agree, it’s more than worthy of this accolade.
Was there a moment in the process when you realised the significance of reasoning technology? Did you realise how important it would be, how integral it would be to so much of what we use today?
To be honest, I don’t think there was really a moment. It’s hard to see these things when you’re on the inside. I remember that for a long time it was a huge uphill struggle to convince people that knowledge representation and reasoning could be useful to them. In the early days I worked a lot in medical informatics, and many of the medics we were working with back then were very much of the older generation, so persuading them to move away from pencil and paper was a struggle in itself. Electronic patient records and so on were in their infancy, and many practising medics didn’t really like the idea. So, persuading them not only to enter data into the computer but then to use a reasoning system to, as they saw it, ‘do their job for them’ was really an uphill struggle. Knowledge representation, and reasoning in particular, had a bad reputation from the old days of early AI, when ludicrous promises were made about what might be possible, so we were pushing against that for a long, long time. Then suddenly you wake up one day and, ‘Oh!’ It turns out everyone seems to be using it now.
In more recent times, I’ve really been amazed that when I go and talk to people in industry, for example, I often discover they’re using semantic technology and it’s not even a big deal for them; it’s just the standard thing that people do these days. You’d never even know about it, because they can download things from the web that just work and solve problems for them; they don’t need to come to us and complain about their problems.
So, long story short, there wasn’t a moment. It was hard at the beginning, now it seems to be easy. Somewhere in the middle there, there must have been a moment when it flipped over but I’m not quite sure when that was exactly.
The standardization of OWL was a huge thing, because it repaired the fragmentation of the space and got industry interested, though it wasn’t until later that we realised just how big a difference it had made. Before that, every university group that was interested in AI invented its own language and had its own system, so it was a really fragmented landscape.
It’s interesting that you bring up the challenges you faced because almost one year ago to the day we asked you what you thought were the biggest challenges for semantic reasoning. You said:
“Scalability in the sense of going beyond a single machine, using cluster technology or offline storage.”
Do you still agree with that? Do you think any progress has been made?
That is still a challenge, but I’ve changed my mind a little bit in the meantime, because with today’s large-scale machines, running most applications on a single machine doesn’t really seem to be such a huge limitation. I think the bigger problem that we’ve run into in practice, and it is a long-standing problem with KR [knowledge representation], is knowledge capture. How do you actually get not just data but more complex knowledge, expert knowledge? How do you capture that and transform it into a formalized language like OWL that actually enables machines to reason over it? It’s always hard. I think one of the reasons KR has become more accepted in recent years is because there have been huge advances in that area as well: natural language processing, the [world wide] web, and the sheer availability of information that’s out there now, compared to the early days, when it was all just typed in by human beings specifically for the KR project in question. Now you can go out on the web and, well, you even get a bit swamped by information, but at least you’ve got some grist for the mill. It’s still challenging, and getting knowledge represented at really good quality, so you can really rely on it, is still quite difficult. I would say that’s one of the big challenges that lie ahead, and a lot of people are working on it, including our group. Hopefully we’ll make some more progress over the next several years.
This award was presented to you in recognition of your outstanding academic work over the last few decades, evidently an area in which you excel.
So why did you decide to spin-out of the university and create RDFox? Did you see there was a need for it, or did you just have an incredible idea and a desire to see it through?
A bit of both, really. Within the group, and for me personally, seeing the theories and algorithms that we developed go all the way through to practical systems that solve real problems people have will always be a big motivation for the work. And we’ve had some successes in the past with systems being used: Boris’s (Professor Boris Motik’s) system is quite widely used, and the ELK system is used a lot in large-scale medical terminology development and maintenance. With RDFox, we could see the potential for a system with much broader appeal for industry applications, and we got quite a few people in industry trying it out. But we could also see that there was no way they could develop a critical-path application that depended on a university-supported system; they never know when we might just get interested in something different and leave it to fall by the wayside with no real support. So we realized that if we were going to go the extra mile and see it used in really serious applications, we’d need a company with a team of engineers to build enterprise-level infrastructure around the core engine and offer ongoing support over the years ahead. That’s why we decided OST [Oxford Semantic Technologies] was the way to go.
With those foundations, you could have done so many things, and there are competitors to RDFox who have chosen a different path.
Why did you decide RDFox needed to be the way that it is? What makes RDFox special in your eyes?
I think the big difference is that RDFox started out with the theory, the algorithms. What are the problems we’re trying to solve? What algorithms do we need for that? Let’s design them properly and prove that they really do what they’re supposed to do. Then we build a system based on those algorithms, and I think that really comes out in the product. No software, or at least no software as complicated as RDFox, is totally bug-free, but when we do have bugs, they’re minor things. They’re not fundamental algorithmic problems where some bug a user finds makes us suddenly realise, ‘oh my God, the algorithm is totally wrong.’ It’s always just minor things. I think that’s the huge difference.
A lot of the other systems out there were built in a completely different way, more like traditional software engineering. People come up with relatively ad hoc algorithms where they can’t prove that the algorithm is doing the right thing, then they build the system, then they find the bugs. They try to plug all the holes, and it’s an ongoing battle. I think it’s really hard to build a reliable reasoning system that way because it’s so complicated: everything’s connected to everything else, and there are so many moving parts. You’ve seen some of the third-party evaluations that our partners have done, where they tested a range of graph databases and knowledge representation systems. The other systems are just unbelievably full of bugs, giving wrong answers to queries, crashing left, right, and center, and so on. RDFox just doesn’t do that, and that’s because we went theory first, then built the system.
In terms of performance, again, a lot of that comes down to careful design at the beginning: optimization and data structures, mixed with incredibly detailed thinking about the engineering. If you look at some of the data structures that Boris designed, they were there from the beginning. He knew that he wanted to use multi-threading to exploit modern processor architectures, so the data structures were designed to support parallel inserts without locking large parts of the structure. In fact, most of the data structures allow for lock-free insertion, so all of these threads can run in parallel. As you know, that really bears fruit in the end. I know Peter [Crocker, OST CEO] often says he gets great satisfaction from running RDFox on some kind of large materialization task and sitting there watching the processor utilization stats as RDFox runs: all of the cores are just flat out, running at maximum capacity. That’s a pretty impressive achievement for parallelizing the algorithm. The full gamut, from the theory, the algorithms, and the data structures, right through to the engineering and making sure that you don’t have the bottlenecks that cause thread interference and so on.
Ian, thank you so much for your time and for answering my questions so thoroughly. It has been an absolute pleasure and an honor talking to you. Congratulations once again to you and those who you have worked with along this journey, what an achievement this is.
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).