
Give Your Predictive Models a Lift with XGBoost

XGBoost (Extreme Gradient Boosting) is a machine learning algorithm that was first introduced in 2014. Engineers often choose it for its speed, accuracy, versatility, and ability to mitigate the risks of overfitting.

XGBoost's popularity in Kaggle competitions took off around 2015 and has grown steadily ever since. Many data scientists now consider it the go-to algorithm both in competitions and in practical applications, a rise driven by its ability to produce highly accurate results across a wide range of problem domains.

There are several reasons why data scientists choose XGBoost. First, its calculations are efficient and scalable, making it suitable for large datasets. Very large XGBoost models can be trained in the cloud and then deployed as lightweight versions on mobile devices. Most importantly, both the full-sized and lightweight versions can deliver impressive accuracy across many types of tasks. This is made possible in part through careful tuning and testing of XGBoost's many hyperparameters, which are adjusted to suit the nature of a specific dataset.
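
As a concrete, hedged illustration of that workflow, the sketch below trains a small XGBoost classifier with a handful of common hyperparameters and then saves the fitted model so a lighter-weight service could load it elsewhere. The synthetic dataset, parameter values, and file name are assumptions chosen for demonstration, not tuned recommendations.

# Minimal sketch: train an XGBoost classifier, then save it for lightweight deployment.
# The synthetic data, hyperparameter values, and file name are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(
    n_estimators=300,     # number of boosted trees
    max_depth=4,          # limits the complexity of each tree
    learning_rate=0.1,    # shrinks each tree's contribution
    subsample=0.8,        # row sampling, one of several overfitting controls
)
model.fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Persist the trained model; the saved file can be loaded by a lighter-weight service.
model.save_model("xgb_model.json")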

The Gradient Boosting Framework

The name XGBoost comes from its use of a gradient boosting framework. Recall that XGBoost uses an ensemble, or collection, of models to fit the problem space. Gradient boosting describes how it sequentially adds new models to that ensemble, each one correcting the errors left by the models before it. The process is iterative: at every round the algorithm fits a new model to the remaining prediction error, where "optimal" is defined as the combination that minimizes prediction error measured against the training set. This framework allows XGBoost to adapt to large and complex datasets.
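
To make the additive, residual-fitting idea concrete, here is a minimal from-scratch sketch of gradient boosting with squared-error loss, where each new tree is fit to the residuals of the current ensemble. This is a simplified illustration of the framework rather than XGBoost's actual implementation, and the depth, learning rate, and number of rounds are arbitrary assumptions.

# Simplified gradient boosting sketch (squared-error loss), not XGBoost's internals.
# Each round fits a small tree to the residuals of the current ensemble's predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    baseline = y.mean()                        # start from a constant prediction
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction             # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                 # new tree models what the ensemble still gets wrong
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees

def gradient_boost_predict(X, baseline, trees, learning_rate=0.1):
    prediction = np.full(X.shape[0], baseline)
    for tree in trees:                         # sum the scaled contributions of every tree
        prediction += learning_rate * tree.predict(X)
    return prediction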

Like a random forest, XGBoost's basic building block is the simple decision tree. But there are a number of differences between XGBoost and random forests, the most important being the gradient boosting framework. Engineers also appreciate XGBoost's detailed settings, which offer much finer control over a model's complexity.
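
As one hedged illustration of that finer-grained control, the sketch below fits a random forest and an XGBoost model on the same data and points out a few of the XGBoost hyperparameters that directly constrain complexity. The specific values shown are assumptions for demonstration only.

# Comparison sketch: a random forest versus an XGBoost model with explicit
# complexity controls. Parameter values are illustrative, not tuned recommendations.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Random forest: trees are built independently and their votes are averaged.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

# XGBoost: trees are added sequentially, with several knobs that limit complexity.
xgb = XGBClassifier(
    n_estimators=300,
    max_depth=4,          # cap on individual tree depth
    learning_rate=0.05,   # shrinkage applied to each new tree
    reg_lambda=1.0,       # L2 regularization on leaf weights
    min_child_weight=5,   # minimum weight required to keep splitting
).fit(X_train, y_train)

print("Random forest accuracy:", accuracy_score(y_test, rf.predict(X_test)))
print("XGBoost accuracy:      ", accuracy_score(y_test, xgb.predict(X_test)))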

It’s worth noting how random forests and XGBoost algorithms both illustrate a powerful machine learning design principle: composability.

Composability describes how simple, reusable components or building blocks can be structured into complex and sophisticated models or systems. By combining many copies of the relatively humble decision tree, ensemble tree-based algorithms can produce models capable of impressively powerful results.

Building an XGBoost Algorithm for Your Business?

Contact Stonewright and we’ll be happy to lend a hand!

About the author