The search feature of a website requires the creation of an index of the searchable fields. Creating an index requires adding some structure to the fields so that they can be easily categorised. The categories then enable the search results to be filtered.
For example, e-commerce websites provide a structure to help customers browse by category. However, the searchable categories tend to be fixed so the search results can’t be filtered with elements beyond the ones initially provided.
Amazon’s grocery section, for instance, lists popular categories. Selecting one provides a new set of categories to help narrow the results.
This process works well because the categories have been predefined. However, category filtering can be more restrictive when a customer performs a search, because there is a virtually unlimited number of terms they can search for.
For example, searching for “coffee” provides a new set of broad and specialised categories to filter the results with.
However, searching for “healthy apple biscuits” provides fewer categories and no specialised categories to choose from because the search is less frequent and harder to place in one of the predefined categories.
The current approach for most e-commerce websites is therefore to define the categories in depth for the most searched terms. However, this does not scale well and reduces the quality of the user experience when customers search for less frequent items.
Similarly, e-commerce sites often return poor results when categories are selected. Anyone who shops online will be familiar with searches that bring up items in the wrong size or style, items that are out of stock, or results that omit relevant items altogether.
A more efficient and scalable approach is to use a technique called faceted search, which this article illustrates with RDFox.
Faceted search allows users to narrow down their search results using the data about the listed items themselves, instead of a predefined schema into which the data must fit. Users can then refine the results as far as this data allows.
RDFox is a high-performance, in-memory knowledge graph and semantic reasoning engine. As it is an RDF triple store, data is stored as triples: three linked pieces of data in the form subject-predicate-object. The subject and object are often referred to as nodes (the data points) and the predicate as an edge (the relationship between them).
Using this e-commerce example, information about the coffee would be modelled as triples. For example, a possible description of Nespresso’s Ispirazione Roma coffee could be:
Here I use two namespaces, prop: and type:, to talk about the properties and classes in my ontology (i.e. the data model), respectively. I also use the default namespace, :, to talk about the entities in the graph, i.e. the data itself. Lastly, I use some others that are industry standards, such as rdf: and rdfs:.
Using separate namespaces is not strictly necessary, but can help with keeping a tidier knowledge graph.
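To make the triple model concrete, here is a minimal sketch in plain Python rather than RDFox itself. It assumes triples are written as (subject, predicate, object) tuples and reuses the prop:, type:, rdf: and rdfs: prefixes described above. The specific property names (prop:hasRoasting, prop:hasNote) and all values are hypothetical and chosen only for illustration. Grouping the prop: values by predicate shows how facets can fall out of the data without a predefined category tree.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object) tuples.
# Property names and values are illustrative assumptions, not real product data.
TRIPLES = [
    (":ispirazioneRoma", "rdf:type", "type:Coffee"),
    (":ispirazioneRoma", "rdfs:label", "Ispirazione Roma"),
    (":ispirazioneRoma", "prop:hasRoasting", "dark"),
    (":ispirazioneRoma", "prop:hasNote", "cereal"),
    (":ispirazioneRoma", "prop:hasNote", "woody"),
    (":colombia", "rdf:type", "type:Coffee"),
    (":colombia", "prop:hasRoasting", "light"),
    (":colombia", "prop:hasNote", "fruity"),
]


def facets_for(triples, item_type):
    """Group the prop: values of every item of the given type by predicate."""
    items = {s for s, p, o in triples if p == "rdf:type" and o == item_type}
    facets = defaultdict(set)
    for s, p, o in triples:
        if s in items and p.startswith("prop:"):
            facets[p].add(o)
    return dict(facets)


# Each predicate becomes a facet; its observed values become the filter options.
FACETS = facets_for(TRIPLES, "type:Coffee")
```

Adding a triple with a new predicate is enough for a new facet to appear, which is the property the rest of the article relies on.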
We could then provide similar descriptions of each of the other coffees available. The faceted search would then let us filter on roasting, aroma and notes, price, blend etc. as initial categories to narrow down the search:
A user could then refine each category and see, in real time, the number of available Nespresso coffees that match all the filters:
The number of items displayed on the right is automatically updated whenever a new filter is ticked.
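The counting behaviour can be sketched as follows. This is a simplified illustration, not how RDFox computes it. The catalogue, the facet names and the function names are all hypothetical, and it assumes that ticking several values means an item must have all of them.

```python
from collections import Counter

# Hypothetical catalogue: each coffee maps a facet name to its set of values.
COFFEES = {
    "coffee_a": {"roasting": {"dark"}, "notes": {"cereal", "woody"}},
    "coffee_b": {"roasting": {"light"}, "notes": {"fruity", "cereal"}},
    "coffee_c": {"roasting": {"dark"}, "notes": {"fruity"}},
}


def matches(item, selected):
    """An item matches when it has every ticked value of every facet."""
    return all(values <= item.get(facet, set()) for facet, values in selected.items())


def search(catalogue, selected):
    """Return the matching item names and the per-value counts among them."""
    hits = {name: item for name, item in catalogue.items() if matches(item, selected)}
    counts = Counter()
    for item in hits.values():
        for facet, values in item.items():
            for value in values:
                counts[(facet, value)] += 1
    return sorted(hits), counts


# Ticking "dark" narrows the results and recomputes every displayed count.
hits, counts = search(COFFEES, {"roasting": {"dark"}})
```

Because the counts are recomputed from the current matches, a value whose count drops to zero can be hidden, so the user is never offered a filter that leads to an empty result.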
Furthermore, a user could refine the search by going deeper into the data, i.e. not just looking at the direct properties of the coffee itself, but also at those of the blend components.
Here we see Nespresso coffees made with blends that contain Arabica and have cereal and fruity notes:
The results can be narrowed down right to the last one by then selecting the rainy environment of the Arabica beans. This would enable coffee enthusiasts to find that perfect coffee!
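Filtering on the properties of a blend component amounts to following more than one edge in the graph. The sketch below illustrates this in plain Python under stated assumptions. The predicate names (prop:hasBlendComponent, prop:hasVariety, prop:hasEnvironment), the entities and the helper functions are all hypothetical. In RDFox itself this would typically be a query over the graph rather than hand-written traversal code.

```python
# Hypothetical triples: coffees link to blend components, which have their own properties.
TRIPLES = [
    (":coffee1", "prop:hasNote", "cereal"),
    (":coffee1", "prop:hasNote", "fruity"),
    (":coffee1", "prop:hasBlendComponent", ":arabicaRainy"),
    (":coffee2", "prop:hasNote", "cereal"),
    (":coffee2", "prop:hasNote", "fruity"),
    (":coffee2", "prop:hasBlendComponent", ":arabicaDry"),
    (":coffee3", "prop:hasNote", "woody"),
    (":coffee3", "prop:hasBlendComponent", ":robusta1"),
    (":arabicaRainy", "prop:hasVariety", "Arabica"),
    (":arabicaRainy", "prop:hasEnvironment", "rainy"),
    (":arabicaDry", "prop:hasVariety", "Arabica"),
    (":arabicaDry", "prop:hasEnvironment", "dry"),
    (":robusta1", "prop:hasVariety", "Robusta"),
]
COFFEES = [":coffee1", ":coffee2", ":coffee3"]


def follow(triples, subject, path):
    """Follow a chain of predicates from a subject and return the values reached."""
    frontier = {subject}
    for predicate in path:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return frontier


def select(triples, subjects, filters):
    """Keep the subjects that satisfy every (path, value) filter."""
    return sorted(
        s for s in subjects
        if all(value in follow(triples, s, path) for path, value in filters)
    )


ARABICA = (("prop:hasBlendComponent", "prop:hasVariety"), "Arabica")
RAINY = (("prop:hasBlendComponent", "prop:hasEnvironment"), "rainy")

# Blends containing Arabica, with cereal and fruity notes.
step_one = select(TRIPLES, COFFEES, [ARABICA, (("prop:hasNote",), "cereal"), (("prop:hasNote",), "fruity")])
# Adding the environment of the beans narrows the results to a single coffee.
step_two = select(TRIPLES, COFFEES, [ARABICA, RAINY])
```

One simplification to note: each path is checked independently, so for a blend with several components this sketch does not guarantee that the Arabica component is the same one grown in a rainy environment. A graph query would bind both conditions to the same node.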
The graph representation of Colombia can also be visualised with RDFox’s console:
Triples are expressive and, when combined with faceted search, give users filters that are derived directly from the data.
Because the categories are generated automatically from the triples, very little development effort needs to go into maintaining the categories. This also reduces the risk of customers applying filters that yield no results or invalid results, e.g. items which are out of stock.
In short, from the customer perspective, this could drastically improve the user experience on e-commerce sites, with knock-on effects for customer satisfaction and return rates.
From the point of view of the data engineer, maintenance of the search interface and the data is much simpler. New triples, and therefore new categories, can also be added very easily using rules. For example, the single origin coffee above can be defined as a coffee whose blend has only one component. See an article on rules here.
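The effect of such a rule can be illustrated with a small Python sketch. This is not RDFox rule syntax, and it only mimics the outcome: a new triple is derived for every coffee that has exactly one blend component. The class name type:SingleOriginCoffee, the predicate prop:hasBlendComponent and the data are hypothetical.

```python
# Hypothetical data: one coffee with a single blend component, one with two.
TRIPLES = [
    (":coffee1", "rdf:type", "type:Coffee"),
    (":coffee1", "prop:hasBlendComponent", ":componentA"),
    (":coffee2", "rdf:type", "type:Coffee"),
    (":coffee2", "prop:hasBlendComponent", ":componentB"),
    (":coffee2", "prop:hasBlendComponent", ":componentC"),
]


def blend_components(triples, coffee):
    """Return the set of blend components linked to a coffee."""
    return {o for s, p, o in triples if s == coffee and p == "prop:hasBlendComponent"}


def apply_single_origin_rule(triples):
    """Return the input triples plus a derived type for single-component coffees."""
    coffees = {s for s, p, o in triples if p == "rdf:type" and o == "type:Coffee"}
    derived = [
        (coffee, "rdf:type", "type:SingleOriginCoffee")
        for coffee in sorted(coffees)
        if len(blend_components(triples, coffee)) == 1
    ]
    return triples + derived


RESULT = apply_single_origin_rule(TRIPLES)
```

Because the derived triple is ordinary data, the new class would show up as a filter option in the same way as any other facet, with no change to the search interface.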
Furthermore, since RDFox is a graph database, it is perfectly suited to generating recommendations for users based on their view and purchase histories. We will be publishing an article on RDFox and recommendation engines on our Medium publication soon.
Faceted search with RDFox is therefore the perfect way to power a dynamic search feature on e-commerce websites. To find out how RDFox can power a great faceted search for your business, contact us at firstname.lastname@example.org, or click here to request an evaluation license.