For machine learning success, detail matters

By Yigit Yildirim, Ph.D., VP of Data Science, Emailage

Unless you’ve been in a bunker for the last couple of years, you can’t have failed to notice the booming business interest in artificial intelligence (AI) and machine learning (ML). The unprecedented events of the first quarter of 2020, and the rapid transformation of business, healthcare and human engagement that followed, demand an even more urgent understanding of these technologies. And detail matters.

Conceptual understanding of these technologies has moved beyond the initial AI mystique. Machine learning, a subfield of AI, helps humans process large amounts of high-dimensional data and make determinations based on historical patterns. Today there are active discussions about how AI can help fight the current pandemic by putting critical, personalized data into the hands of people who can offer real, viable solutions.

In business, the quest in recent years has been to use AI/ML to scale processes, create efficiencies and improve outcomes. But implementation is not always straightforward. Just like an algorithm thirsty for training data, organizations need to learn much more about ML in order to make the best deployment decisions. So, what do you need to know?

Why use machine learning?

Machine learning doesn’t simply enable automation, by which I mean performing tasks humans already do, but at greater scale and speed. It can also excel where human expertise does not exist (say, navigating Mars), where the solution changes over time (you have just watched Tiger King and loved it, so what should Netflix recommend to you next?) or where the solution must adapt to particular cases (say, recognizing an individual user in biometric security).

Use cases are extremely broad: ecommerce fraud detection, self-driving cars, recommendation systems and software testing, to name a few. In machine learning, you rely on data as your domain expert, so clean and relevant data is the key. The model builds complex knowledge from the data itself.
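
As a minimal sketch of what “clean” can mean in practice, the Python below drops transactions with missing fields or an unknown outcome, rejects an obviously invalid value, and removes records ingested twice. The `Transaction` fields and the cleaning rules are hypothetical illustrations, not a prescribed pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record layout, for illustration only
    txn_id: str
    email_age_days: Optional[int]  # None if the field failed to load
    via_proxy: Optional[bool]
    is_fraud: Optional[bool]       # None if the outcome is not yet known


def clean(rows: list[Transaction]) -> list[Transaction]:
    """Keep complete, valid, de-duplicated rows in their original order."""
    seen_ids: set[str] = set()
    kept: list[Transaction] = []
    for row in rows:
        # Incomplete rows (missing feature or unknown outcome) cannot teach the model reliably
        if row.email_age_days is None or row.via_proxy is None or row.is_fraud is None:
            continue
        # A negative email age is an obvious data error
        if row.email_age_days < 0:
            continue
        # The same transaction ingested twice would be over-weighted in training
        if row.txn_id in seen_ids:
            continue
        seen_ids.add(row.txn_id)
        kept.append(row)
    return kept


rows = [
    Transaction("t1", 3, True, True),
    Transaction("t1", 3, True, True),       # duplicate
    Transaction("t2", None, False, False),  # missing feature
    Transaction("t3", 400, False, None),    # unknown outcome
    Transaction("t4", -5, False, False),    # invalid value
    Transaction("t5", 400, False, False),
]
print([t.txn_id for t in clean(rows)])  # ['t1', 't5']
```

Real pipelines add many more checks (relevance of each field to the task, consistency across sources), but the principle is the same: decide explicitly what the model is allowed to learn from.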

What’s your flavor?

Not all machine learning is equal. Just as people learn different skills in different ways, so do algorithms. Critical to the success of your machine learning project is deciding which flavor to use.

In “supervised” learning, computers use historical events and their known outcomes as training data, learning the characteristics of the process so they can predict outcomes for new data. Use supervised learning when you know how past events were resolved, want to make determinations about future events based on historical patterns, and need to do so at scale.

For example, in ecommerce fraud detection, if you know whether past transactions turned out to be fraudulent or legitimate, your model can learn from them whether newly created email addresses are riskier, or whether IP addresses originating from a proxy server indicate higher fraudulent activity. We can then have the model look for those patterns in future data, repeatedly and at scale. In effect, we are defining the boundaries a computer should use to reach its conclusions.
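
To make this concrete, here is a minimal sketch of supervised learning on that fraud example: a logistic regression fitted by plain gradient descent to two binary features. The features (`new_email`, `via_proxy`), the toy labeled history and the hyperparameters are all hypothetical; a production system would use a tested library and far richer data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one labeled historical transaction
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def fraud_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical labeled history: [new_email, via_proxy] -> 1 if later confirmed as fraud
X = [[1, 1]] * 4 + [[1, 0]] * 4 + [[0, 1]] * 4 + [[0, 0]] * 4
y = [1, 1, 1, 0] + [1, 1, 0, 0] + [1, 1, 0, 0] + [1, 0, 0, 0]

w, b = train_logistic(X, y)
for features in ([1, 1], [1, 0], [0, 0]):
    print(features, round(fraud_probability(w, b, features), 2))
```

With this toy data both learned weights come out positive, and the fitted probabilities land near 0.75, 0.5 and 0.25, the fraud rates observed for each feature combination. The model has learned the boundaries from labeled history rather than from hand-written rules.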

“Unsupervised” learning is, in one sense, the opposite. There is no explicit ground truth for the model to learn from. Instead, we train a model to find groups of similar items and, in certain cases, to look for outliers that fit none of those groups. For instance, by analyzing the attributes of images, a model might group similar regions together without being told in advance what those groups represent. Clustering is the most familiar unsupervised technique; it is often paired with dimensionality reduction, which strips away extraneous dimensions so that similarity becomes easier to measure.
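
A minimal sketch of the clustering-plus-outliers idea, using k-means (one common clustering algorithm, chosen here for illustration) on hypothetical 2-D attributes. The deterministic farthest-first seeding and the `max_dist` outlier rule are simplifying assumptions; library implementations typically use randomized seeding such as k-means++.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[Point]:
    """Plain k-means with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def outliers(points: list[Point], centroids: list[Point], max_dist: float) -> list[Point]:
    """Flag points that sit far from every cluster centre."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > max_dist]


# Hypothetical attributes: two tight groups plus one point that fits neither
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0),
          (5.0, -6.0)]
centres = kmeans(points, k=2)
print(outliers(points, centres, max_dist=3.0))  # [(5.0, -6.0)]
```

No labels were supplied: the two groups emerge from similarity alone, and the stray point is flagged because it sits far from both group centres.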

Your choice of machine learning model should be guided by the kind of task you are working to accomplish. So think hard before you commit.

Ensuring AI robustness

The decisions you take early on will dictate the likelihood of your technical and business success with ML.

  1. Design responsibly: Never has the computing phrase “Garbage In, Garbage Out” been more true than in ML. Learning algorithms can only learn the boundaries of the data they are given. Supplying dodgy data is dangerous, and there are plenty of high-profile examples: a search for “AI failures” returns more than 60 million results. The primary takeaway from those results: machine learning knows no ethics; that’s your job.
  2. Ensure transparency: There’s a big debate raging over the need to explain AI decisions. But you can already understand an ML model’s decisions by tracking how heavily it weights each of the criteria it was trained on and, crucially, making those weightings visible to users in the output. AI need not be a black box.
  3. Monitor effectiveness: Your ML model should give consistent results for as long as the process you are modeling does not change. However, change is constant, so it is of the utmost importance to detect process changes immediately and respond accordingly.
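
The transparency point can be sketched for the simplest case, a linear scoring model, where each criterion’s contribution is just its learned weight times the observed value. The feature names, weights and bias below are hypothetical; for non-linear models, attribution methods such as SHAP play a similar role.

```python
def explain_score(weights: dict[str, float], bias: float,
                  features: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Return a linear risk score plus each feature's contribution, largest first."""
    # Contribution of each criterion = its learned weight times the observed value
    contributions = {name: w * features.get(name, 0.0) for name, w in weights.items()}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


# Hypothetical learned weights and one incoming transaction
weights = {"via_proxy": 1.1, "address_mismatch": 0.9, "email_age_years": -0.3}
score, reasons = explain_score(weights, bias=-1.0,
                               features={"via_proxy": 1, "address_mismatch": 0, "email_age_years": 2})
for name, contribution in reasons:
    print(f"{name:>16}: {contribution:+.2f}")
print(f"{'total score':>16}: {score:+.2f}")
```

Showing the ranked contributions next to the score tells a reviewer not only that a transaction looks risky, but which criteria pushed it there and which pulled it back.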

Adopt two essential principles: 

  1. proactively normalize your variables against seasonality 
  2. react immediately if the process changes completely. 

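For example, thresholds learned for velocity-based attributes (how often an entity such as an email address transacts within a time window) might not hold during the holiday season, when legitimate purchase volume spikes. Normalize these variables against macro-level signals and consider training separate models for special occasions.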
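
Both principles can be sketched in a few lines, assuming you keep a history of the same velocity attribute for the same season in previous years. The z-score normalization, the `process_changed` rule and its `limit=3.0` are illustrative choices, not a standard; dedicated drift statistics are common in production monitoring.

```python
from statistics import mean, pstdev


def seasonal_zscore(value: float, same_season_history: list[float]) -> float:
    """How unusual a value is relative to the same season in past years."""
    mu = mean(same_season_history)
    sigma = pstdev(same_season_history)
    return 0.0 if sigma == 0 else (value - mu) / sigma


def process_changed(recent_z: list[float], limit: float = 3.0) -> bool:
    """Flag a wholesale shift: recent values sit persistently far from the seasonal norm."""
    return abs(mean(recent_z)) > limit


# Hypothetical daily transactions per email address during past holiday seasons
holiday_history = [12, 14, 13, 15, 11]

# 14 might trip an off-season threshold, but is ordinary for the holidays
print(round(seasonal_zscore(14, holiday_history), 2))  # 0.71

recent = [seasonal_zscore(v, holiday_history) for v in (21, 22, 20)]
print(process_changed(recent))  # True
```

Normalizing first keeps seasonal swings from triggering false alarms, so that when the change detector does fire it is more likely to reflect a genuine change in the process.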

Assemble your team

These are difficult concepts to grasp. Machine learning can deliver significant efficiencies, but applying predictive mathematical engines to solve problems requires the support of a talented team.

Find the ML champion in your team: someone who understands machine learning and can demonstrate and communicate its effectiveness, along with the considerations laid out above.

Add a data scientist who can make sense of the data itself and knows how best to interpret the results. And add a great engineer, one who can make your system run at top capacity.

It’s worth finding an excellent designer and communicator to support these efforts too. Perhaps your ML champion, data scientist or engineer can fill this role, but if not, add someone with this skill set to your team to showcase what the outcomes mean and how your organization can use the intelligence.

After all, not all of your intelligence should be artificial.