The Enliven team and I had the pleasure of competing in GlobalHack I in early 2014. Overall, it was a fun experience that I’ll remember as my first successful implementation of a perceptron machine learning algorithm. Although the math may seem daunting to the average developer who isn’t used to dealing with things like dot products, the basic perceptron algorithm is actually quite understandable.

My goal for this post is to cut to the core of the issue: how does one implement a quick, easy-to-use perceptron algorithm for one’s own, custom needs? I will go through concepts first and code second. The code presented at the end of this post was written from 4am-7am, so it may not be the most efficient and could certainly be expanded upon with ideas we’ll discuss in later posts. However, it should be a great resource for anyone looking to learn about this classic machine learning algorithm.

At its core, a perceptron answers a yes or no question based on past experience. In GlobalHack, we were tasked with predicting whether a sale would close based on some inputs. I’ll focus on four very simplified inputs for the sake of this post: pain (of current situation I presume), fit of product, success of demo, and dollar amount of project (specifically, how close the project is to our typical project). Each input will be ranked from 0% to 100% and will be represented by \rho, f, s and a respectively.

Weighted Averages

So let’s back up for a second and talk about a weighted average. I’m going to assume that we’re all familiar with a typical average:

\mu =\frac{(x_1+x_2+x_3+x_4)}{4}

For each equation I give, I’m going to also give a generic equivalent in code (C#-ish with some prettified math symbols). In this case:

average=\frac{sumOfInputs}{numberOfInputs}

If you remember your factoring from school, you’ll recognize that another way of writing the above equation is:

\mu =\frac{1}{4}(x_1+x_2+x_3+x_4)

average=\frac{1}{4}(firstInput+secondInput+thirdInput+fourthInput)

Distributing gives us:

\mu =(\frac{1}{4}x_1+\frac{1}{4}x_2+\frac{1}{4}x_3+\frac{1}{4}x_4)

average=(\frac{1}{4}firstInput+\frac{1}{4}secondInput+\frac{1}{4}thirdInput+\frac{1}{4}fourthInput)

Let’s go back to our example (closing a sale). If we think all inputs are equally important, then we should have something like this:

\mu =(\frac{1}{4}\rho+\frac{1}{4}f+\frac{1}{4}s+\frac{1}{4}\alpha)

average=(\frac{1}{4}pain+\frac{1}{4}fit+\frac{1}{4}success+\frac{1}{4}amount)

But what is the average in this case? If we rate pain, fit, success, and amount from 1 to 100, then it would be the average rating of the key inputs. It’s not too much of a jump to see that we may be able to make a prediction based off of this score.

Now what happens if we decide pain really isn’t that important and we’ve underestimated how important fit is? Well, we could make the coefficient (number in front of each input) of \rho equal to and the coefficient of f equal to \frac{2}{4}. We could now re-write each equation as follows:

\mu =(\frac{0}{4}\rho+\frac{2}{4}f+\frac{1}{4}s+\frac{1}{4}\alpha)

average=(\frac{0}{4}pain+\frac{2}{4}fit+\frac{1}{4}success+\frac{1}{4}amount)

Of course, fractions can be rewritten as percentages:

\mu =(0\%\times\rho+50\%\times f+25\%\times s+25\%\times\alpha)

average=(0\%\times pain +0\%\times fit +0\%\times success +0\%\times amount)

We now have a weighted average where the weights are 0%, 50%, 25%, and 25%. The way we’d represent this in typical machine learning parlance would be: w_1=0\%, w_2=50\%, w_3=25\%, w_4=25\%, if x_1=\rho, x_2=f, x_3=s, x_4=\alpha:

\mu =(w_1\times\rho+w_2\times f+w_3\times s+w_4\times\alpha)

predictive~rating =( painWeight \times pain + fitWeight \times fit + successWeight \times success + amountWeight \times amount)

Predictive Perceptron Algorithm

There are two main algorithms when it comes to the perceptron: a predictive algorithm and a training algorithm. Let’s consider the predictive algorithm first:

y_j=f[w(t) \cdot x_j] = f[w_0(t)+w_1(t)x_{j,1}+...+w_n(t)x_{j,n}]

*from Wikipedia

Yikes! That seems intense. But hold on - let’s break it down.

First of all, y_j means that we are naming the prediction y, and we are predicting for the jth time. For instance, if this were our fifth prediction we’ve ever made with this perceptron, j would be 5:

predictionOfFifthSale = f[w_0(t)+w_1(t)x_{5,1}+...+w_n(t)x_{j,n}]

Next, w(t) \cdot x_j is called a dot product. It means we sum each weight multiplied by its respective input. So in our example, the pain weight gets multiplied by the pain rating, and the fit weight gets multiplied by the fit rating, etc. Then we add them all up:

f[w(t)\cdot x_j]=f[w_0(t)+w_1(t)x_{5,1}+...+w_n(t)x_{5,n}]

f[w(t)\cdot x_j]=f[painWeight\cdot pain + fitWeight\cdot fit + successWeight\cdot success + amountWeight\cdot amount]

f[whatever] is a bit more complicated. First of all, remember that the prediction has to be a yes or no answer. The f and the opening and closing brackets in that expression tell us that this is a function that we’re going to nickname f. We could just as easily call this a, g or function. But f is more traditional in math. In code, we’d probably call this something like “getPrediction”:

fifthPredictiveInput=painWeight*painOfFifthProject+fitWeight*fitOfFifthProject+successWeight*successOfFifthProject+amountWeight*amountOfFifthProject

predictionOfFifthSale=getPrediction(fifthPredictiveInput)

A Note on \bf{w_o}

The getPrediction function typically needs a threshold above which it concludes that the sale is likely. This threshold is represented by w_0 in our original equation. We want this threshold to evolve just like every other part of our algorithm, so we have to include it as a weight:

public~bool~getPrediction(double~fifthPredictiveInput,double~threshold){

return(fifthPredictiveInput > threshold);

}

The reason w_0 is typically included as an input in our original predictive algorithm is so that the entire w(t) vector (basically, a list in C#) can train with the rest of the weights. More on that below.

Training Perceptron Algorithm

This is the meat and potatoes of the perceptron algorithm. Without training, you would just have a plain ol’ dull algorithm, not one that can learn.

The following is the predictive equation:

w_i(t+1)=w_i(t)+\propto (d_i-y_i)x_{i,j}

*from Wikipedia

Again: yikes! That’s a lot of math that some of you may not be comfortable with.

Conceptually, we’re looking at a whole big set of training data. We take this data one observation at a time, and for each observation, update the weights one at a time. The variables are as follows:

i: the index of the weight we’re currently updating.
w_i(t+1): the next incarnation of the ith (pain, for example) weight.
w_i(t): the current incarnation of the ith (pain, for example) weight.
d_j: the jth observation (for example, whether or not we actually closed the fifth sale). It’s represented by either a 1 (yes) or a 0 (no).
y_j: the jth predicted observation (for example, whether our predictive algorithm would predict we would close the fifth sale given the current weights). It’s also represented by either a 1 (yes) or a 0 (no).
x_{i,j}: the appropriate input for the jth observation. For example, when updating the pain weight for the fifth sale, we would look at the pain rating for the fifth sale.
\propto: the “training velocity”. This basically allows us to throttle how fast the algorithm is learning. If it’s too small relative to your weights, then your weights won’t end up getting to their optimal levels. On the other hand, if it’s too large relative to your weights, then your weights will constantly over-correct, and won’t stay on their optimal levels. The training velocity must be a positive number between 0 and 1.

Let’s update based on this new information:

w_i(t+1)=w_i(t)+\propto (d_i-y_i)x_{i,j}

Becomes (for the training for the entire fifth observation):

nextPainWeight = currentPainWeight + trainingVelocity*(wasSaleMade-wasSalePredicted)*painOfFifthProduct

And we’d continue to do this for each weight. The threshold weight is a special case (its rating is always 1):

threshold=currentThreshold+trainingVelocity*(wasSaleMade-wasSalePredicted)*1

Outcome Discussion

Let’s look at some possible outcomes and consequences of (wasSaleMade-wasSalePredicted).

If the sale was made and was also predicted, then this term evaluates to 1-1=0, so:

nextPainWeight=currentPainWeight

If the sale was not made and was also not predicted, then this term evaluates to 0-0=0, so:

nextPainWeight=currentPainWeight

If the sale was made, but not predicted, then this term evaluates to 1-0=1, so:

nextPainWeight=currentPainWeight+trainingVelocity*1*painOfFifthProduct

Basically, we end up deciding our weights aren’t aggressive enough, so we increment them by some fraction (decided by trainingVelocity) of the appropriate input, in this case painOfFifthProject.

If the sale was not made, but was predicted, then this term evaluates to 0-1=-1, so:

nextPainWeight=currentPainWeight+trainingVelocity*(-1)*painOfFifthProduct

This time, we basically end up deciding our weights are too aggressive so we decrease them by some fraction (decided by trainingVelocity) of the appropriate input, in this case painOfFifthProject.

Putting It All Together

Below is my code (modified for clarity and consistency) from the Hackathon. I’ve replaced most “var” declarations with its appropriate type for clarity.

public void Learn(Opportunity opportunity)
{
int actualOutcome = opportunity.IsWon ? 1 : 0;
List(double) weightList = Repository.GetWeightList();
List(double) ratingList = GetRatingList(opportunity);
List(double) newWeights = Train(actualOutcome, weightList, ratingList);
SaveNewWeights(newWeights, weightList);
}

private List(double) GetRatingList(Opportunity opportunity)
{
var ratingList = new List(double);
ratingList.Add(opportunity.Probability); ratingList.Add(opportunity.CompetitorRating);
ratingList.Add(opportunity.DemoRating);
ratingList.Add(opportunity.PainRating);
ratingList.Add(opportunity.ProductFitRating);
return ratingList;
}

private int GetPredictedOutcome (List(double) inputList, List(double) weightList)
{
if ((DotProduct(weightList, inputList)) == 0)
{
return 1;
}
return 0;
}

private double DotProduct(List(double) list1, List(double) list2)
{
return list1.Zip(list2, (d1, d2) => d1 * d2).Sum();
}

private List(double) Train(int actualOutcome, List(double) weightList, List(double)
inputList)
{
for (int i = 0; i < weightList.Count; i++)
{
var weight = weightList[i];
var input = inputList[i];
weightList[i] = (UpdateWeight(weight, actualOutcome,
GetPredictedOutcome(inputList, weightList), input));
}
return weightList;
}

private double UpdateWeight(double weight, int actualOutcome, int predictedOutcome,
double input)
{
return weight + _learnRate*(actualOutcome - predictedOutcome)*input;
}

I hope that was helpful! Please don’t hesitate to ask questions in the comments section below.

Chris has spent about half of his life in the South (mostly in Houston), but was born in and currently resides in St. Louis. Chris is interested in all manner of things, but at the top of the list are cool applications of statistics, well-written code, and Euro Truck Simulator 2. Chris has both a BSE and MBA from Washington University in St. Louis, where he learned to solve complex, ill-defined problems through appropriate framing and application of critical thinking skills.    When not at work, you can find Chris coding for various side projects or dominating trivia at his local pub.
  • Solid post, just a heads up- looks like there's a problem with your code parse/escaper thing though, the code box has html code mixed in with your c code. (putting it all together)

  • George Spofford

    Just a heads-up: the section that starts "Yikes! That seems intense. But hold on - let’s break it down." could be interpreted as making it sound like j is the training iteration when it's actually the training example index, and the t is not explained at all. If you're breaking down, may as well break it all down.