How does a retailer make sense of a large amount of customer purchase data? Collaborative filtering is one technique that mathematically segments data into like or “collaborative” groupings. Examples for online retail recommendations include filters such as “customers who bought this also bought that” or “customers who looked at this item also looked at these other items.” Amazon originally promoted a simplistic form of collaborative filtering in which a simple SQL query shows what other shoppers have looked at or bought in connection with any given SKU. With plentiful data, these associations can be generated effectively for thousands of products.
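As a rough illustration of the “customers who bought this also bought that” idea, the sketch below counts how often pairs of SKUs appear in the same order. It is a minimal co-occurrence count, not Amazon's actual method; the order data, SKU names, and the function names `build_co_purchase_counts` and `also_bought` are all made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(orders):
    """For each SKU, count how often every other SKU appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        # set() so a SKU bought twice in one order is only counted once
        for sku_a, sku_b in permutations(set(order), 2):
            counts[sku_a][sku_b] += 1
    return counts


def also_bought(counts, sku, top_n=3):
    """Return the SKUs most often bought together with `sku`, most frequent first."""
    return [other for other, _ in counts[sku].most_common(top_n)]


# Hypothetical order history: each inner list holds the SKUs in one order.
orders = [
    ["grinder", "filters", "beans"],
    ["grinder", "filters"],
    ["beans", "mug"],
    ["grinder", "beans", "filters"],
]

counts = build_co_purchase_counts(orders)
print(also_bought(counts, "grinder"))  # ['filters', 'beans']
```

The same counts could come from a self-join on an orders table in SQL. The point is only that the simplest form of this filter is a tally of what appears together.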

What is collaborative filtering?

“Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.” (Wikipedia)

Think of a coffee filter. Most adults take one out of the cupboard every morning to make their coffee, much as you might take out your iPad to browse online. The water passes through millions of coffee grounds, and the filter turns them into a stream of coffee. The water is like your web traffic, and the coffee grounds are the SKUs on your website, all waiting to be touched. Just as the filter reduces a mass of grounds to a fine stream of coffee, a collaborative filter reduces a mass of data to a stream of products or content and delivers only those relevant to what the web visitor is looking for.

The challenge with e-commerce data is the same as with coffee: sometimes there is not enough water (shoppers) to reach all of the coffee (SKUs). Just as it is possible to lack enough water to brew a decent cup of coffee, it is possible to have too little data from which to infer product recommendations. This happens when there are not enough SKUs, when traffic to a particular item is too light or non-existent, or when overall site traffic is simply too light. In these situations, a more robust filter (or algorithm) is needed to create the associations between products and shopper behavior.

How much data does a recommendation system require?

What sort of data is required to deliver personalized product or content recommendations on an e-commerce site? Clicks are not always an indicator that the user is looking for a product, and click data can be more “noise” than a signal of valuable engagement with a product. However, with enough click-path data, a collaborative machine learning filter will eventually find patterns and associations from which it can create groupings. For many retailers, though, 80% of their catalogue is rarely seen by web shoppers. For these long-tail items, sparse data and sparse associations make it difficult to bring them into the personalized recommendations mix. In this case a more sensitive filter, or algorithm, is required to find patterns in sparse data. Another option is to hand-merchandise these products or to put them into a grouping of their own, such as a sale items collection. This last route is hard for merchandisers to maintain when catalogues are large, so some level of automation is crucial.
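One simple way to picture the sparse-data problem is a minimum-support rule: an association counts only if it has been seen often enough, and any empty slots are filled from a hand-merchandised list. The sketch below is only an illustration under assumed values. It is not the “more sensitive” algorithm mentioned above, and the threshold `MIN_SUPPORT`, the function `recommend`, and the sample data are hypothetical.

```python
from collections import Counter

MIN_SUPPORT = 2  # assumed threshold: ignore pairs seen fewer times than this


def recommend(co_counts, sku, fallback, top_n=3, min_support=MIN_SUPPORT):
    """Return up to top_n SKUs: well-supported associations first, then fallback items."""
    neighbours = co_counts.get(sku, Counter())
    picks = [other for other, n in neighbours.most_common() if n >= min_support]
    picks = picks[:top_n]
    # Fill any remaining slots from the hand-merchandised list.
    for item in fallback:
        if len(picks) >= top_n:
            break
        if item != sku and item not in picks:
            picks.append(item)
    return picks


# Hypothetical co-occurrence counts; "tamper" is a long-tail SKU with little data.
co_counts = {
    "grinder": Counter({"filters": 3, "beans": 2, "mug": 1}),
    "tamper": Counter({"mug": 1}),
}
hand_picked = ["kettle", "scale", "beans"]  # e.g. a curated or sale collection

print(recommend(co_counts, "grinder", hand_picked))  # ['filters', 'beans', 'kettle']
print(recommend(co_counts, "tamper", hand_picked))   # ['kettle', 'scale', 'beans']
```

For the long-tail item, every recommendation comes from the curated list, which is why some automation is needed once catalogues grow large.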

The final word on collaborative filtering

In its simplest form, collaborative filtering works when data from multiple sources comes together and is sorted into categories. It is a must these days for any e-commerce site striving to deliver a basic level of website personalization. But as we discussed, it is hard to rely on this one technique alone, especially if your site lacks the volume of data that sites such as Walmart, Overstock or Amazon have. If the aim is to offer a personalized shopping experience, retailers need personalization systems that can provide solutions in both data-rich and data-sparse situations.