Hello and welcome to another episode of this introductory series into RDFox.
In this episode, we're going to be covering another set of SPARQL queries, this time focused on inner SELECT queries. Now, this episode does rely directly on content that we've covered before, specifically negation and counts, so if you're not familiar with those, go check out those episodes before this one. If you haven't covered the foundations of SPARQL in general or RDFox, those episodes are also available to you before this one so please do check those out first.
So let's have a look at our first query. What we're trying to do here is count the number of races for each of our drivers who have never finished on a podium, relying on that negation to find drivers without a podium and our counts to find each of their corresponding race counts.
The beginning of our pattern looks pretty simple. We're just finding our drivers and some information about them, using the known relationships driver forename and driver surname to find our forename and surname. Then we're using the same FILTER NOT EXISTS that we've seen before to exclude any drivers who have finished on a podium, and then finally we reach our inner SELECT query.
Now you might notice that this is just the same SELECT query that we saw earlier on. This is just the SELECT that counts the number of races that each of our drivers has driven. However, you'll notice here that it is within a larger SELECT query and that we've done this by opening up a set of curly brackets within the wider SELECT. Logically, the inner SELECT is evaluated on its own, giving one row per driver with their race count, and those rows are then joined with the rest of the wider pattern on the shared driver variable. In this case it's just a count, so it's going to be nice and efficient.
So once we've found our new race count and passed this back to our wider query in the output of this select, we can simply use it as we would any other value. Here we're ordering by descending race count and returning the forename, the surname and the race count on our output, so this is going to be our drivers who have never finished on a podium and their corresponding race count.
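For anyone reading rather than watching, the query just described might look roughly like the sketch below. The prefix and the property names (`:driver_forename`, `:driver_surname`, `:result_driver`, `:result_positionOrder`) are assumptions made for illustration and may not match the workshop data exactly, and a podium is assumed to mean an integer position order of 3 or less.

```sparql
# Hypothetical namespace and property names, for illustration only.
PREFIX : <https://example.org/f1/>

SELECT ?forename ?surname ?raceCount
WHERE {
    # Find each driver and their name.
    ?driver :driver_forename ?forename ;
            :driver_surname  ?surname .

    # Negation: drop any driver with at least one top-three finish.
    FILTER NOT EXISTS {
        ?podium :result_driver        ?driver ;
                :result_positionOrder ?position .
        FILTER(?position <= 3)
    }

    # Inner SELECT: one row per driver with their number of results.
    # Only ?driver and ?raceCount are visible to the outer query.
    {
        SELECT ?driver (COUNT(?result) AS ?raceCount)
        WHERE {
            ?result :result_driver ?driver .
        }
        GROUP BY ?driver
    }
}
ORDER BY DESC(?raceCount)
```

The inner SELECT projects `?driver` so that its rows can be joined with the outer pattern; `?result` stays private to the subquery.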
So if we click ‘Run’ and reveal the results table, we can see a list of our drivers and their race counts. In this data, none of these people have ever finished on a podium. Some of you who are familiar with Formula One will know that Nico Hulkenberg famously had the longest streak ever of races without finishing on a podium. But in 2025 he broke this dry streak and has now actually finished on a podium, so he would no longer be in this list if we were using today's data. So congratulations to Nico, but anyone who's following along with the workshop will know that our data is only valid up until the year 2020 and there's a very good reason for that: you'll have to follow along with the rest of the workshop to figure out why.
So from here, let's go to our next query, Query 11. We're leaving podiums behind and now just focusing on race wins, specifically the percentage of races that each of our drivers has won: their win percentage.
This time we're going to use two inner selects to help us calculate this and even a bind function to form the calculation itself. So initially, just as we have time and time again, we are finding some properties about our drivers, crucially their forename and surname and then we're straight into the inner SELECTs.
The first is the exact same race count query that we've seen many times at this point, but the second is almost the same query again, this time with the additional constraint that the result position order must be '1'. So this is a slightly tighter pattern that we're describing, ensuring that any result found here corresponds to a win. This time we count those results and create this new variable, race wins.
Now, once we have our race count and race wins returned to this larger query by providing them on the output of the inner selects, we can then use them in a bind. Now, what a bind does is it calculates any expression, whether that's a mathematical expression or a function, and it saves the result as a new variable. So bind is an incredibly powerful, incredibly versatile function. In this case, we're just trying to calculate our win percentage so we simply do race wins divided by race count and we save the result as percentage as should be familiar.
We also order by something sensible so that we can see a nice list at the end, so here we choose to order by percentage and of course we'll return a whole bunch of stuff on the output being the forename, surname, race count, race wins and win percentage. If we click ‘Run SPARQL’ and expose our results table, we can see a list of our race winners here alongside their win percentage.
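As a rough sketch of Query 11 as described, the two inner SELECTs and the BIND could be written as follows. The prefix and property names are hypothetical, the position order is assumed to be stored as an integer, and the descending sort is an assumption based on the winners appearing at the top of the results.

```sparql
# Hypothetical namespace and property names, for illustration only.
PREFIX : <https://example.org/f1/>

SELECT ?forename ?surname ?raceCount ?raceWins ?percentage
WHERE {
    ?driver :driver_forename ?forename ;
            :driver_surname  ?surname .

    # First inner SELECT: every result for the driver.
    {
        SELECT ?driver (COUNT(?result) AS ?raceCount)
        WHERE { ?result :result_driver ?driver . }
        GROUP BY ?driver
    }

    # Second inner SELECT: only results where the driver finished first.
    {
        SELECT ?driver (COUNT(?result) AS ?raceWins)
        WHERE {
            ?result :result_driver        ?driver ;
                    :result_positionOrder 1 .
        }
        GROUP BY ?driver
    }

    # BIND evaluates the expression and stores it in a new variable.
    BIND(?raceWins / ?raceCount AS ?percentage)
}
ORDER BY DESC(?percentage)
```

Dividing two integers in SPARQL yields a decimal, so the result here is a fraction between 0 and 1 rather than a truncated whole number.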
Now there is a slight problem with this list, and it is revealed if we scroll all the way down to the bottom: the last person in our list still has one race win. There are lots and lots and lots of drivers who have never won a race, and none of them appear. A win percentage of zero is very interesting, particularly if you're tracking this over time and you want to see if the value goes from zero to non-zero or perhaps goes from non-zero to zero. These are interesting metrics.
If we have a look at our query we can see why this happened. In our second inner SELECT we required the race result position order to be 1, and this excluded anyone who did not match this pattern from the results. It's not that they got a count of 0, it's that they got no count. They were not considered at all.
In our final query we can address that. As I said, this is our final query for the section, so let's run through it quickly as most of this is actually the same. You will see we're still finding our driver; we're still performing the same inner select on our race count.
In fact, our second inner select query here for our race wins is also the same, but the key difference is that we have used the OPTIONAL keyword in front of the curly brackets that open this up. This simply means that this part of the query is optional. As RDFox is going through this finding the results, if a particular entity matches this pattern then great, find a value for race wins and return that value. If, however, some values have matched the rest of the query but don't match this optional part, then don't worry about it. Usually this would mean that those results were excluded, but this time, because it is optional, we keep them and simply leave race wins undefined. However, in our situation here, we're looking to calculate a win percentage, so it's no good having an undefined variable.
So, the first thing we have to do is to use another BIND and this time use the function COALESCE. What this does is it takes a variable and it looks at the value of this variable. If it has a well-defined value, it simply returns that value, and so the same well-defined value is saved as race wins final and that's it. However, if the value of this variable is undefined, it instead returns its second argument. We know the only way that race wins remains undefined is if the driver has had no wins, and so we provide this value here to be 0, and so COALESCE will return 0 whenever race wins is undefined. So race wins final is either 0 or the actual count of wins for this driver.
Just like we did before, we then have to actually calculate the percentage, so we do race wins final divided by race count and save that as percentage. This time if we run our SPARQL and scroll up to see our results table, you'll see the top of this list looks exactly the same. That's because for these drivers who have race wins, nothing has changed.
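Putting the OPTIONAL and COALESCE steps together, the final query might look something like this sketch. As before, the prefix and property names are hypothetical, and the variable name `?raceWinsFinal` is simply a guess at how "race wins final" is spelled on screen.

```sparql
# Hypothetical namespace and property names, for illustration only.
PREFIX : <https://example.org/f1/>

SELECT ?forename ?surname ?raceCount ?raceWinsFinal ?percentage
WHERE {
    ?driver :driver_forename ?forename ;
            :driver_surname  ?surname .

    {
        SELECT ?driver (COUNT(?result) AS ?raceCount)
        WHERE { ?result :result_driver ?driver . }
        GROUP BY ?driver
    }

    # OPTIONAL keeps drivers with no wins; ?raceWins is left unbound for them.
    OPTIONAL {
        SELECT ?driver (COUNT(?result) AS ?raceWins)
        WHERE {
            ?result :result_driver        ?driver ;
                    :result_positionOrder 1 .
        }
        GROUP BY ?driver
    }

    # COALESCE returns its first argument that evaluates without error,
    # so an unbound ?raceWins falls through to 0.
    BIND(COALESCE(?raceWins, 0) AS ?raceWinsFinal)
    BIND(?raceWinsFinal / ?raceCount AS ?percentage)
}
ORDER BY DESC(?percentage)
```

Drivers with no results at all never match the first inner SELECT, so `?raceCount` is always at least 1 and the division is safe.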
But this time, if we scroll all the way down to the bottom, we will just see hundreds and hundreds of drivers who have never finished a race in first place, each of them with a win percentage of 0.
This is the final episode in the SPARQL section of this workshop for now. We may be bringing additional SPARQL tutorials in the future, so do look out for those, but for now, if you'd like to learn more about reasoning and how to take your queries to the next level, check out the Datalog and the OWL episodes to come.
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).