What is a constraint?

In knowledge graphs, and databases more generally, a constraint ensures that data conforms to a certain structure, called a shape.

For instance, when storing data about persons, some fields might always be required (social security number (SSN), date of birth, etc.), while others may or may not apply (their spouse’s social security number, their children’s dates of birth, etc.), since a person may have neither a spouse nor children.

We can impose a constraint to ensure that no person is stored without a social security number, and we can tell our database to reject any data that violates the constraint.

In knowledge graphs, the most widely used language for constraints is SHACL (Shapes Constraint Language). See the W3C definition for a technical outline of SHACL.

SHACL essentially lets us specify shapes that the data should conform to, and then run a check to verify that it does. Here is an example:

:PersonShape a sh:NodeShape ;
    sh:targetClass :Person ;
    sh:property [
        sh:path :hasSocialSecurityNumber ;
        sh:maxCount 1 ;
        sh:minCount 1
    ] .

This tells us that every member of the class :Person must have exactly one object for the :hasSocialSecurityNumber predicate.

We can go further, however, with a shape like this one, which also specifies the datatype of the SSN and what it should look like:

:PersonShape a sh:NodeShape ;
    sh:targetClass :Person ;
    sh:property [
        sh:path :hasSSN ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:datatype xsd:string ;
        sh:pattern "^\\d{3}-\\d{2}-\\d{4}$"
    ] .

That is, the pattern requires three digits at the start, followed by a hyphen, then two digits, then another hyphen, and finally four digits at the end.
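To get a feel for what the pattern accepts, it can be tried outside of SHACL. The following is a minimal sketch in Python, assuming that Python's `re` module behaves closely enough to the SHACL regular expression dialect for this simple pattern. The function name `looks_like_ssn` and the sample values are made up for illustration.

```python
import re

# The same regular expression as in sh:pattern. In Turtle the backslashes
# are doubled ("\\d"); in a Python raw string a single backslash suffices.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def looks_like_ssn(value: str) -> bool:
    """Return True if the value has the form ddd-dd-dddd (hypothetical helper)."""
    return SSN_PATTERN.search(value) is not None


# Illustrative sample values, not taken from any real data set.
for candidate in ["123-45-6789", "12345-88", "123-45-67890"]:
    print(candidate, looks_like_ssn(candidate))
```

Only the first value matches. The second has the wrong grouping, and the third has one digit too many, which the closing `$` anchor rejects.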

If we validate the data against these shapes, the validation report also tells us which constraints were violated.
If Alice has no SSN, the report says so.
Similarly, if Bob has two SSNs, the report flags this as the reason.
And if Charlie has exactly one SSN but it looks wrong, e.g. “12345-88”, this is also clear from the validation report.
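The three cases can be mimicked with a small hand-rolled check. This is only a sketch of the logic behind the second shape, not a SHACL engine. The `people` dictionary, the `check_person` function and the extra person Dana are assumptions made for illustration. A real validator works on RDF data and returns a richer report.

```python
import re
from typing import Dict, List

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def check_person(ssns: List[str]) -> List[str]:
    """Return the names of the constraints violated by one person's SSN values."""
    violations = []
    if len(ssns) < 1:
        violations.append("sh:minCount")
    if len(ssns) > 1:
        violations.append("sh:maxCount")
    # The pattern is checked for each value individually.
    for value in ssns:
        if SSN_PATTERN.search(value) is None:
            violations.append("sh:pattern")
    return violations


# Hypothetical data: each person mapped to the SSN values stored for them.
people: Dict[str, List[str]] = {
    "Alice": [],                               # no SSN at all
    "Bob": ["123-45-6789", "987-65-4321"],     # two SSNs
    "Charlie": ["12345-88"],                   # one SSN, wrong format
    "Dana": ["111-22-3333"],                   # conforms
}

report = {name: check_person(ssns) for name, ssns in people.items()}
for name, violations in report.items():
    print(name, "conforms" if not violations else violations)
```

In an actual SHACL validation report, each result names the violated constraint through `sh:sourceConstraintComponent`, for example `sh:MinCountConstraintComponent` for Alice's case.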
