As I mentioned in the first part of this two-part post, “Intelligence is the ability to generalize and learn useful patterns from data and/or experiences.”  Another common mistake when training machines against big data sets is what’s known as over-fitting.  To explain this one, I need to take you back to high school algebra for a brief minute.  You may remember that any two points on a graph can be connected with a line.  You may also remember that any number of points scattered across a graph (as long as no two share the same x-value) can be fit perfectly by a polynomial curve if it’s allowed to be arbitrarily complex.

In machine learning, the goal is to find the simplest curve that not only fits the current data points but, more importantly, predicts where new, as-yet-unseen data points will fall.  If the machine chooses a curve that fits all of the current data points well (usually by employing an overly complex model) but then does a bad job of predicting new data, it has succumbed to over-fitting.
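To make the idea concrete, here is a small Python sketch (an illustration of mine, not taken from any production system) that fits the same four noisy points two ways: with a straight line, and with a cubic curve that passes through every point.  The data, the underlying trend y = 2x + 1, and all function names are assumptions invented for the example, and the held-out points are placed exactly on the trend to keep the arithmetic simple.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 through every point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mean_squared_error(model, xs, ys):
    """Average squared gap between the model's predictions and the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: the trend y = 2x + 1 with made-up noise of
# +0.5, -0.5, +0.5, -0.5 added to the four points.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.5, 2.5, 5.5, 6.5]

# Hypothetical unseen data, placed exactly on the underlying trend.
test_x = [0.5, 1.5, 2.5]
test_y = [2.0 * x + 1.0 for x in test_x]

line = fit_line(train_x, train_y)
curve = fit_interpolating_polynomial(train_x, train_y)

line_train_mse = mean_squared_error(line, train_x, train_y)
curve_train_mse = mean_squared_error(curve, train_x, train_y)
line_test_mse = mean_squared_error(line, test_x, test_y)
curve_test_mse = mean_squared_error(curve, test_x, test_y)

print("line : train error", line_train_mse, "unseen error", line_test_mse)
print("curve: train error", curve_train_mse, "unseen error", curve_test_mse)
```

In this toy setup the curve has zero error on the points it was trained on, while the line misses each of them a little.  On the unseen points the ranking flips: the line's error is several times smaller than the curve's, because the curve has bent itself around the noise.  Real data sets are messier, so treat the numbers as an illustration of the pattern rather than a benchmark.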

I can’t tell you how many times I have heard people run through scenarios in their heads like the following: “So you mean if I knew that Fred had searched for flowers and that his mother’s birthday is next week, then I could predict that he’s looking for a gift for his mom?”  Probably not.  The problem is that the model you have constructed in your head perfectly fits the one or two examples you have in mind, but when you apply that same generalization to all of the other examples you’re not thinking of, it fails more often than it works.  You have succumbed to over-fitting, and the result will be some good experiences and a lot of bad ones.

The lesson learned is to strive for intelligent generalization in your personalization strategy, not omniscience.