Here at Baynote we process many terabytes of data each day, putting us squarely in the domain of “big data.” This data comes from embedding tags on our customers’ web sites. These tags enable our customers to send us data from all of their users’ interactions with their site. We use this data to build predictive models of the users’ behavior, and then use these models to determine which recommendations to show.
One paradoxical aspect of the data we collect to build these predictive models (and this is true of any data collected to decide which recommendations to show) is that, despite its massive volume, the data is sparse. Let me explain.
In a typical day, a large e-commerce site generates over 16 million interaction records from 250,000 unique site visitors. A site may have a catalogue of several thousand items, each of which is a potential recommendation on each product page. So the number of product pages times the number of potential recommendations is on the order of several million, the same order of magnitude as the data itself. Now, of course, some of the potential product-recommendation combinations are unlikely to be useful (no one wants to see a recommendation for a shower curtain when they’re looking for a suit), but we have to determine the useful combinations algorithmically.
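The arithmetic above can be sketched in a few lines. The catalogue size below is hypothetical (the article says only "several thousand"); the point is that the number of page-recommendation pairs rivals the number of daily records, leaving well under one observation per pair on average.

```python
# Back-of-the-envelope arithmetic for data sparsity.
CATALOGUE_SIZE = 4_000       # hypothetical: "several thousand" items, one page each
DAILY_RECORDS = 16_000_000   # interaction records per day (from the article)

# Every item is a potential recommendation on every product page.
potential_pairs = CATALOGUE_SIZE * CATALOGUE_SIZE  # 16,000,000 page-rec pairs

# On average, less than or about one record per pair -- and records cluster
# heavily on popular pages, so most pairs get no observations at all.
records_per_pair = DAILY_RECORDS / potential_pairs
print(f"{potential_pairs:,} pairs, {records_per_pair:.2f} records per pair on average")
```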
Typically, internet advertisements are selected based on the click-through rate: the proportion of users who were shown an ad and actually clicked on it. But this approach doesn’t help much for recommendations. Only a few recommendations are shown on each product page, so there is no click-through data for most potential recommendations on most product pages.
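To make the problem concrete, here is a minimal sketch of the naive click-through-rate estimator. It works fine for a pair that has been shown, but for the vast majority of pairs, which have zero impressions, it gives us nothing at all.

```python
def click_through_rate(clicks: int, impressions: int):
    """Naive CTR estimate: the fraction of impressions that were clicked."""
    if impressions == 0:
        return None  # no data at all -- the common case for most page-rec pairs
    return clicks / impressions

# A pair that has been shown: the estimate is well-defined.
print(click_through_rate(3, 100))  # 0.03
# A pair that has never been shown: the naive estimator is undefined.
print(click_through_rate(0, 0))    # None
```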
So, how do you produce good recommendations when you have zero data for most of the things you are interested in? This is where 18th-century mathematics comes in.
Thomas Bayes (1701-1761) was an amateur mathematician whose most famous result, Bayes’ Theorem, quantifies the process of learning. He formalized the mathematics of taking a prior belief and updating it when relevant data arrives. What this means for recommendations is that we can use data other than straightforward click-throughs to build models of how likely a user is to click on a recommendation that we’ve never shown to anyone before. Using these models, we show recommendations that are of high quality, and which only get better as we collect more data on how often users go on to click on them. And that’s the value of 18th-century wisdom.
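A minimal sketch of this prior-plus-data idea is the classic Beta-Binomial update (the article does not describe Baynote's actual model, and the prior parameters below are hypothetical). We start with a Beta(alpha, beta) prior belief about a recommendation's click-through rate; after observing clicks and impressions, Bayes' Theorem gives a Beta(alpha + clicks, beta + impressions - clicks) posterior. A never-shown recommendation falls back on the prior, and the estimate smoothly approaches the observed rate as data accumulates.

```python
def posterior_mean_ctr(clicks: int, impressions: int,
                       alpha: float = 1.0, beta: float = 50.0) -> float:
    """Posterior mean click-through rate under a Beta(alpha, beta) prior.

    The Beta distribution is conjugate to the Binomial likelihood, so the
    posterior is Beta(alpha + clicks, beta + impressions - clicks), whose
    mean is (alpha + clicks) / (alpha + beta + impressions).
    """
    return (alpha + clicks) / (alpha + beta + impressions)

never_shown = posterior_mean_ctr(0, 0)        # no data: prior mean, ~0.02
some_data = posterior_mean_ctr(5, 100)        # prior pulled toward the data
lots_of_data = posterior_mean_ctr(500, 10_000)  # approaches the empirical 0.05
print(never_shown, some_data, lots_of_data)
```

The hypothetical prior here encodes the belief that a typical recommendation is clicked roughly 2% of the time; with enough impressions, the data dominates the prior, which is exactly the "only get better as we collect more data" behavior described above.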