How to count and aggregate in SPARQL with a Knowledge Graph

Thomas Vout


Hello and welcome to another episode of the introductory series on RDFox.


In this episode, we're going to be running through additional SPARQL queries, this time focused on the COUNT function.


If you're not sure how we've got to this stage, check out our earlier videos that go through the foundations of SPARQL and RDFox.


So let's have a look at our query here. You can see from the output next to SELECT that we're using a COUNT function.


We'll come on to the details of this once we've looked at the WHERE clause, but ultimately what we're trying to do here is to count the number of races that Lewis Hamilton has raced in.


So let's have a look at the WHERE clause. The first thing we're going to do is to find our variable driver that has the properties driver forename Lewis and driver surname Hamilton. This is enough for us to specify that this subject here is our Lewis Hamilton node.

From there, though, we're going to start a new part of this pattern that connects the races to the drivers. Because of the shape of our data, we have to go via an intermediary node, so we first look at the result and its relationship to our driver. Again, we're using the same driver variable as the one we've just declared here, because of course we've already done the hard work of ensuring it really does represent Lewis Hamilton.


That's not quite enough yet, though. From the result, we also have to find the associated race, which we can do with the property result race and the variable race.

From there though, we have all of the information that we need. We have Lewis and we have all of the races that he has raced in.


So all that's left for us to do is to actually count them. To do this, we head straight to the output of our SELECT and, in brackets, we simply put COUNT and then the variable that we would like to count.


So this counts the instances of the variable race, and we save the count of those instances as a new variable, race count, which is ultimately the one that will be returned by the SELECT.
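
The query described so far might look something like the sketch below. The query text itself is not reproduced in this transcript, so the prefix IRI is a placeholder and the property names (`:driver_forename`, `:driver_surname`, `:result_driver`, `:result_race`) are assumptions based on the names spoken aloud; substitute whatever your own data uses.

```sparql
# Placeholder namespace: an assumption, not taken from the video.
PREFIX : <https://example.org/f1/>

# Return a single row holding the number of matched races.
SELECT (COUNT(?race) AS ?raceCount)
WHERE {
    # Pin ?driver to the Lewis Hamilton node (assumed property names).
    ?driver :driver_forename "Lewis" ;
            :driver_surname "Hamilton" .
    # Go via the intermediary result node to reach each race.
    ?result :result_driver ?driver ;
            :result_race ?race .
}
```

Note that `COUNT(?race)` counts one per matching result, so if the data could hold more than one result for the same driver and race, `COUNT(DISTINCT ?race)` would count each race only once.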


So if we click Run, we can see that we have a single result here: race count is 266. Now this value is accurate up until the year 2020, which will make sense if you've followed the rest of this series, so I highly recommend that you do exactly that.


So that's our first count query, but we have one more to look at. That was the count of a specific property of a specific entity. Now let's get a little bit more general and say we would like the race count for each of our drivers, for all of them. This actually simplifies things in the WHERE clause, because we no longer have to go to the effort of defining or narrowing down our driver node. We can simply leave it as a totally free variable with no constraints whatsoever.


So we just have the two patterns, result, result driver, driver and result, result race, race, to connect each driver with a race. We're then using the same COUNT race AS race count function, but we're also returning the drivers. This here is enough information to form our list, but we need to give RDFox some additional context. It doesn't know what a driver is nor what a race is.


So just by providing this information, it will count all of the races that have ever been raced by anybody. And that might be what we want, but in this case, we want to separate our race counts out per driver. So we have to add this additional line, GROUP BY driver, to tell RDFox how to group, or how to separate, the counts. In this case, we want one count per driver.


And finally, we're going to add this final line just to give us a more human-readable list: ORDER BY descending race count, which will order our list starting from the highest race count down to the lowest.
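
Putting the per-driver pieces together, the second query might look roughly like this. As before, this is a sketch rather than the exact query from the video: the prefix IRI is a placeholder and the property names are assumed from the spoken description.

```sparql
# Placeholder namespace: an assumption, not taken from the video.
PREFIX : <https://example.org/f1/>

# One row per driver, each with that driver's race count.
SELECT ?driver (COUNT(?race) AS ?raceCount)
WHERE {
    # ?driver is left unconstrained, so every driver matches.
    ?result :result_driver ?driver ;
            :result_race ?race .
}
# Compute a separate count for each distinct ?driver.
GROUP BY ?driver
# Highest race count first.
ORDER BY DESC(?raceCount)
```

In standard SPARQL 1.1, a plain variable such as `?driver` can only appear in the SELECT alongside an aggregate if it is also listed in the GROUP BY, which is why the two go together here.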


So if we click Run SPARQL, we will see the list of our results here, starting with Raikkonen at the top with a race count of 332.


If you'd like to find out more about SPARQL and the additional functions that we can use, check out our other videos on the topic.

